"HTML 4.01 Specification",
D. Raggett, A. Le Hors, I. Jacobs,
Editors. World Wide Web Consortium (W3C) Recommendation,
24 December 1999. This version of the HTML 4.01 Specification is available at http://www.w3.org/TR/1999/REC-html401-19991224.
The latest version of the HTML 4.01 Specification is always available at http://www.w3.org/TR/html401/.
Lloyd, Ian.
Build Your Own Web Site The Right Way Using HTML & CSS.
United States of America:
SitePoint,
2006.
Print.
Pixels are especially common because sizing is not browser-dependent
Downside: cannot be resized on older browsers, which presents an accessibility issue with text
Ems - em
Originally (in typography) one em was the size of the capital letter M.
In CSS an em is essentially a percentage, where 1em means 100% of regular font size
Exs - ex
Units most commonly used with borders, margins, etc.:
Picas - pc (= 12 pt)
Points - pt (= 1/72 in)
Inches - in
Centimeters - cm
Millimeters - mm
Note: above units sometimes used with text on printer style sheets
Block-level vs. Inline Elements
Inline elements
Can never contain block-level elements
Can contain other inline elements
Examples:
span
em
strong
cite
a
img (even though it is an empty tag)
Styling options:
Text and background colors
Any and all font properties
display: block; allows block-level formatting to be applied to inline elements
Block-level Elements
Create breaks before and after
Can contain both other block-level elements and/or inline elements
Examples:
h1, h2, h3, etc.
p
div
blockquote
ul and ol
form
Styling options:
Setting a fixed width or height for a block of text
Setting a padding effect
Moving to any position on a web page, regardless of the position in which the markup appears
Styling Text (Primarily...)
Font Properties
font-style
italic
oblique (usually rendered the same as italic)
normal (to reset an inherited font-style)
font-variant
small-caps
normal (to reset an inherited font-variant)
font-weight
Keyword properties
bold
bolder
lighter
normal (to reset an inherited font-weight)
Can also use numeric values
Range: 100 to 900, in increments of 100
normal = 400, and bold = 700
font-size
Available units:
All measurement units, as listed in the Size section
Note 1: Standard font size for non-headings text is generally 16 pixels
Note 2: Unless the user has tweaked the browser's default font settings, 1.5em should work out to be 24 pixels tall (2.4em gives 24 pixels only when the base size has been reduced to 10 pixels)
Absolute size: 7 values, from xx-small through xx-large
Relative size: larger or smaller (2 values only)
Percentages
Note 1 (on using percentages and ems):
Because of inheritance, setting font-size to, say, 90% or .9em when an ancestor already has the same style will result in a net font-size of 81% or .81em.
This especially arises when using percentages and ems with nested lists. Without taking care with selectors, text in the nested lists will automatically shrink (or grow) by the same factor at every level of nesting.
Note 2 (a trick with percentages):
Since base font is 16 pixels, setting body {font-size: 62.5%;} changes the base font to 10 pixels
It then becomes much easier to work with ems or multiples to adjust font sizes
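The arithmetic behind both notes can be checked with a short sketch. It assumes a 16-pixel browser default; the function names are made up for illustration and are not part of CSS.

```python
BASE_PX = 16.0  # assumed browser default for body text


def nested_font_size(base_px, factor, depth):
    """Size in pixels after `factor` (0.9 for 90% or .9em) is inherited `depth` times."""
    return base_px * factor ** depth


def em_to_px(em, body_percent=100.0, base_px=BASE_PX):
    """Convert an em value to pixels for a given body font-size percentage."""
    return em * base_px * body_percent / 100.0


# Each level of nesting multiplies the size by the same factor again.
for depth in range(1, 4):
    print(depth, round(nested_font_size(BASE_PX, 0.9, depth), 2))

# With body {font-size: 62.5%;} the base is 10px, so ems become simple multiples.
print(em_to_px(1.0, body_percent=62.5))
print(em_to_px(2.4, body_percent=62.5))
```

Two levels of 90% give 81% of the base, which matches Note 1; the 62.5% base makes 2.4em come out at about 24 pixels.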
font-family
Can declare multiple, comma-separated values
Common combinations:
Arial, Helvetica, sans-serif
"Times New Roman", Times, serif
"Courier New", Courier, monospace
Georgia, "Times New Roman", Times, serif
Verdana, Arial, Helvetica, sans-serif
Geneva, Arial, Helvetica, sans-serif
Tahoma, "Lucida Grande", Arial, sans-serif
"Lucida Console", Monaco, monospace
"Marker Felt", "Comic Sans MS", fantasy
"Century Gothic", "Gill Sans", Arial, sans-serif
Shorthand notation - i.e., font:
Order of values:
font-style
font-variant
font-weight
font-size
line-height (preceded by a "/")
font-family
Except for multiple font-family values, other values not comma-separated
Flanagan, David.
JavaScript: The Definitive Guide. 5th Ed.
United States of America:
O'Reilly,
2006.
Print.
Powers, Shelley.
Learning JavaScript.
United States of America:
O'Reilly Media, Inc.,
2007.
Print. This is a perfectly dreadful book. See my review of this monstrosity on amazon.com.
Problem: there is no "public registry" to make them unique
Practice: include a URI as a "backup" public system identifier
Syntax: PUBLIC "unique public identifier"
"back-up public identifier"
Example: (inside a full document type declaration) <!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
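As a rough check of how the two identifiers are read, the sketch below parses the declaration with Python's standard `xml.dom.minidom` and prints the public identifier and the backup system identifier. The empty `<html/>` root is added only so that the string is a complete document; the parser does not fetch the DTD.

```python
from xml.dom import minidom

# The DOCTYPE from the example, followed by a minimal root element.
DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" '
    '"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">'
    "<html/>"
)

doctype = minidom.parseString(DOC).doctype

print(doctype.name)      # name of the document element
print(doctype.publicId)  # the "unique public identifier"
print(doctype.systemId)  # the URI used as the backup system identifier
```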
Declarations (general)
Pieces of information needed to assemble & validate XML document
External declarations: provided by DTD Identifier
Internal declarations: those contained within the square brackets of a document type declaration
Internal declarations are read before external ones (so, where both declare the same entity or attribute, the internal declaration wins)
Various kinds - for validation, defining entities, style sheets, etc.
Entity names can be thought of as either shorthand notation or as placeholders
Named value
Can be anything from a single character to an entire file
A piece of XML that can be inserted anywhere inside the document
Note – syntax for using an entity reference: &EntityName;
Entities can be nested - i.e., a referenced entity can itself contain references to other entities
Unless predefined all entities must be declared before being referenced
Note: with nested entity declarations one can put the parser in an infinite loop if such declarations do not appear in the proper order (i.e., before the relevant entity reference)
Mixed-content (i.e., anything beyond a single-character entity)
General
No limit on length
Can include both markup and text/content
Internal
Replacement text defined in the entity declaration
Generally used for oft-repeated words & phrases
Can improve 'quality control'
As a rule do not use any of the 5 predefined entity references inside the entity value - i.e., simply include, say, the < or > characters themselves
Exception: If the entity value is delimited with single quotation marks then you must use &apos; to escape any single quotation marks within the entity value itself. (Same logic & approach for double quotation marks, using &quot;.)
External
Replacement text in an external file
Useful for large amounts of text
Useful where entity references may appear in multiple documents
Useful for 'breaking down' large documents into component parts
Best Practice: if using external entities to create a master document/sub-documents arrangement, have each sub-document contain only 1 document element of its own
External text is inserted at the time of parsing
Unparsed
An entity that is neither text nor XML - e.g., an image
Can only be used from within an attribute value
Syntax
Example: <!ENTITY mypic SYSTEM "images/somepic.gif" NDATA GIF>
NDATA keyword indicates that entity is not to be parsed
Notation identifier specifies data format
Syntax:
<!ENTITY someEntityName someIdentifierOrValue>
Identifier can be either a system or a public one
Syntax for the identifier portion of an entity declaration: same as that used for DTD Identifiers
3 types of identifiers
System
Public
Explicit
Examples:
With SYSTEM identifier: <!ENTITY xmlKb SYSTEM "myXmlKnowledgeBase.xml">
With explicit identifier: <!ENTITY crSpan '<span property="dcterms:creator">'>
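A small sketch of internal entities in action, using Python's standard `xml.etree.ElementTree` (which expands entities declared in the internal subset). The entity names and values here are invented for illustration: `sig` nests a reference to `co`, and `creator` holds a complete, balanced piece of markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: entity names and values are made up.
DOC = """<!DOCTYPE memo [
  <!ENTITY co "Example Widgets Ltd.">
  <!ENTITY sig "Regards, &co;">
  <!ENTITY creator '<span property="dcterms:creator">A. Author</span>'>
]>
<memo><body>&sig;</body><byline>&creator;</byline></memo>"""

memo = ET.fromstring(DOC)

# The nested reference &co; is resolved when &sig; is expanded.
body_text = memo.find("body").text
print(body_text)

# An entity whose value contains markup becomes real elements in the tree.
creator_span = memo.find("byline/span")
print(creator_span.get("property"), creator_span.text)
```

Note that the value containing double-quoted attributes is delimited with single quotation marks, which avoids any need to escape the inner quotes.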
Document element (aka, the "root element")
XML document = a collection of nested elements
XML document must have one - but only one - document/root element
Multiple files can comprise an XML document
Implication: an XML document is a logical, not a physical, entity
Any one element can contain multiple, whitespace-delimited namespace declarations
Syntax
When including a namespace prefix (usually a short abbreviation of some sort): xmlns:someNsPrefix="someURI"
When omitting a namespace prefix (thus making the declared namespace the default namespace): xmlns="someURI"
Note: Using xml as a namespace prefix will lead to tears
Prefixes
Advantages
As tokens or replacements for the full namespace names, prefixes are much shorter and more readable
Namespaces may well contain characters not permitted in XML names
Names governed by rules for XML non-colonized names
Common namespace prefixes
xsd & xs for XSDL, the XML Schema namespace
xsl for Extensible Stylesheet Language
xsi for the XML Schema Instance namespace
xlink for XLink
Default namespace
You are allowed one
Define by omitting a prefix
Default namespace declarations do not apply to attributes
Empty string namespaces
You can map the default namespace to an empty string
Rationale: you want to disable an inherited default namespace declaration
You cannot, however, map prefixes to an empty string
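The default-namespace rules can be observed with Python's standard `xml.etree.ElementTree`, which reports namespaced names as `{URI}localName`. The URI and element names below are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI and element names.
DOC = """<menu xmlns="http://example.org/menu" id="m1">
  <item>pie</item>
  <note xmlns="">plain</note>
</menu>"""

menu = ET.fromstring(DOC)

print(menu.tag)     # the element picks up the default namespace
print(menu.attrib)  # the unprefixed attribute does not

# <item> inherits the default namespace; <note> disables it with xmlns="".
child_tags = [child.tag for child in menu]
print(child_tags)
```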
URIs in namespace declarations
Common Practice: use URL subset of URIs
Drawback (of using URLs): URLs can change
In Practice: XML does not, in fact, actually look up the information at the URL
purl.org
purl - acronym for persistent uniform resource locator
One may register a URL on purl.org and then have the purl.org URL 'point to' the site where a resource is actually located
One may then move the resource, but so long as the pointer is updated within purl.org's account information the purl.org-URL will continue to function without any change noticed by end-users
purl.org URLs used in a number of notable semantic web vocabularies
Including a namespace element or attribute
When referring to the default namespace - essentially, nothing needs to be done
A fully qualified name
Must be used for namespaces other than the default namespace
Syntax: namespacePrefix:namespaceLocalName
Prefix assigned to a namespace can be overridden with a new namespace declaration
Overriding a previously-declared namespace: <desert:name
xmlns:desert="http://njdiners.org/deserts/">
apple pie
<desert:presentation
xmlns:desert="http://fauxfrenchterms.org/cooking/">
a la mode
</desert:presentation>
</desert:name>
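To see the override take effect, the same markup (with the attribute spelled `xmlns`) can be parsed with Python's standard `xml.etree.ElementTree`; the same prefix resolves to a different namespace URI on the inner element.

```python
import xml.etree.ElementTree as ET

DOC = """<desert:name xmlns:desert="http://njdiners.org/deserts/">
  apple pie
  <desert:presentation xmlns:desert="http://fauxfrenchterms.org/cooking/">
    a la mode
  </desert:presentation>
</desert:name>"""

outer = ET.fromstring(DOC)
inner = outer[0]

# ElementTree shows each name as {namespaceURI}localName.
print(outer.tag)
print(inner.tag)
```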
Namespaces & validating XML
Generally XML with non-implicit namespaces will fail to validate
Use/existence of multiple namespaces in a document violates the constraints of the document's DTD
The XML namespace itself
Whitespace
Many parsers will normalize whitespace
Strips out whitespace in element-only content
Removes whitespace from the beginning & end of mixed content
Collapses a sequence of whitespace characters to a single space
xml:space attribute
setting value to preserve (i.e., xml:space="preserve") is supposed to (obviously) preserve whitespace
Note: the xml prefix does not require an explicit namespace declaration
Conceptual organization of XML elements
Trees
XML documents have an upside-down tree structure
Branches & leaves
Genealogy
Parent & child relationships
Ancestor & descendants
Siblings
Declaration
Stylesheet declaration
CSS
Others (esp. XSLT)
Processing Instruction
Best Practice: avoid processing instructions wherever possible
2 components
Target keyword
Data
Syntax
<?someKeyword someData?>
Neither target keyword nor data can contain the closing delimiter (i.e., ?>)
A data string is optional (data string typically omitted when inserting, say, line- or page-break instructions)
Target = keyword designed to be recognized by a specific XML processor
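A brief sketch of the target/data split, using Python's standard `xml.dom.minidom`. The `page-break` target is a hypothetical instruction with no data string; `xml-stylesheet` is the familiar stylesheet instruction.

```python
from xml.dom import minidom
from xml.dom import Node

# One instruction before the document element, one inside it.
DOC = (
    '<?xml-stylesheet type="text/css" href="notes.css"?>'
    "<notes><?page-break?></notes>"
)

doc = minidom.parseString(DOC)
prolog_pi = doc.firstChild
inline_pi = doc.documentElement.firstChild

for pi in (prolog_pi, inline_pi):
    # Everything after the target keyword is handed over as one data string.
    print(pi.nodeType == Node.PROCESSING_INSTRUCTION_NODE, pi.target, repr(pi.data))
```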
Comment
Syntax
Same as HTML - i.e., <!-- someComment -->
Comment itself cannot contain two consecutive dashes (this implies that you cannot nest comments)
Placement
Cannot appear before XML declaration line
Cannot appear inside element tags (i.e., within a start or end tag)
Can appear inside element content
CDATA section
Essentially tells parser that there is no markup in the section
Among the optional elements #PCDATA must be listed first
Asterisk (*) is required - it indicates zero or more of whatever came before
Cannot replace asterisk (*) with a plus (+) to require that an element with mixed content contain character data
Attributes
Basic syntax (to define the attribute(s) of a given element):
<!ATTLIST elementname
attributename1 attributeTypeKeyword requirementKeyword
attributename2 attributeTypeKeyword requirementKeyword
…
Attribute type keywords
CDATA - character data
ID - unique ID
Parsers will check for uniqueness, but only within a given document
Uniqueness will be enforced across all element names
As with XML names, an ID value may only begin with a letter or underscore
IDREF, IDREFS - the ID(s) of (an)other element(s)
NMTOKEN, NMTOKENS - (a) valid XML name token(s)
ENTITY, ENTITIES
NOTATION
xml: - a predefined xml value
Attribute requirement keywords
#REQUIRED
#IMPLIED
Means that there is an enumerated list of usable/valid values
The parenthetical, vertical-bar-delimited set of allowable values must immediately precede the #IMPLIED keyword
Note 1: when working with prefixed attributes (e.g., XLINK-) where the namespace defines available choices, use #IMPLIED as a stand-alone keyword
Note 2: means the attribute is not required
#FIXED
#FIXED keyword followed by some value delimited in quotes
Attribute must/can only take the value inside the quotation marks
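Even a non-validating parser applies attribute defaults that are declared in the internal subset. The sketch below uses Python's standard `xml.parsers.expat` directly, since by default it reports defaulted attributes alongside specified ones. The `order` element and its attributes are invented, and the enumerated `status` attribute is shown with a literal default value rather than a requirement keyword.

```python
from xml.parsers import expat

# Hypothetical element and attributes, declared in the internal subset.
DOC = """<!DOCTYPE order [
  <!ATTLIST order
      id       ID               #REQUIRED
      status   (open | closed)  "open"
      version  CDATA            #FIXED "1.0"
      note     CDATA            #IMPLIED>
]>
<order id="o1"/>"""


def attributes_seen(document):
    """Return {element name: attributes} exactly as the parser reports them."""
    seen = {}

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen


reported = attributes_seen(DOC)
print(reported["order"])
```

Only `id` appears in the instance, yet `status` and `version` are reported with their declared values; the `#IMPLIED` attribute `note` has no default and stays absent.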
Parameter entities
General
Note: parameter entities apply only in DTDs
Represent/hold recurring parts of declarations
In external declarations can be used to hold
Element groups
Content models
Attribute definitions for attribute list declarations
In internal subset can only hold complete declarations, not fragments
Syntax
They are preceded by a percentage sign (%) both when declared and referenced
Example (declaration): <!ENTITY % common.atts "
id ID #IMPLIED
class CDATA #IMPLIED"
>
Note: modules can be "turned off" for debugging purposes
Group declarations by function
By block & inline
By hierarchical elements
By tables, lists, etc.
Whitespace (use liberally)
Comments (use liberally)
Version tracking - use!!
Attributes versus elements
Keep elements specific
Use a hierarchical structure
Elements should hold content that is part of the document
Attributes should modify the behavior of an element
Modularization (2 methods)
Import modules from external sources
Drawbacks
Referencing an external parameter entity incorporates the entire file
There is no concept of scope when it comes to DTDs - i.e., local declarations do not override declarations in an externally referenced parameter entity
Drawback workarounds
In the internal subset "predeclare" an entity to override it (1st declaration takes precedence)
This works because the internal subset is read before external subset, & 1st declarations take precedence
This only works for attribute declarations, however; it is a validity error to declare an element more than once
Use conditional sections to override an external element declaration
Conditional sections
Can only be used with external subsets
Syntax is exactly like that of CDATA sections
Two keywords are INCLUDE and IGNORE (used in place of the CDATA keyword)
The 'trick' to conditional sections is to set the INCLUDE/IGNORE keyword with a parameter entity
"…there is only one possible tree configuration for any given document…" (Ray, p. 100)
Ergo, there is a unique path from document root to any point inside document
XPath describes that route
Node types
Root
Special kind of node - it is not an element!
Contains:
Document element
Any comments &/or processing instructions surrounding document element
Element
Root and elements are the only nodes which can contain other nodes
Element can contain any other type of node except the root node
An empty element is called a leaf node
Attribute
Treating as a node allows you to access an element's attribute(s)
Like an element node which contains only text
Text
When uninterrupted treated as a leaf node
Always the child of an element node
Note: an element can contain more than one text node; text can be broken up by elements or other node types
Comment
Processing instruction
Namespace
Note that a namespace declaration is not treated as simply another attribute node
Because namespace declarations affect all child elements, a namespace is a region of a document, not of an element
Notes on DTDs:
DTDs are not treated as node types!
XPath assumes all entity references are resolved prior to XPath looking in document
This approach is required since entities might resolve to XML mark-up
Trees & subtrees
Think of document as collection of subtrees
Note: a node tree must contain all appropriate opening & closing XML tags
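The node types can be inspected with Python's standard `xml.dom.minidom`. The DOM is not the XPath data model, so this is only an approximation, but it shows the root node as distinct from the document element and one element holding two separate text nodes.

```python
from xml.dom import minidom
from xml.dom import Node

doc = minidom.parseString("<p>one <b>two</b> three<!-- c --></p>")
p = doc.documentElement

# The root (document) node is not an element; it contains the document element.
print(doc.nodeType == Node.DOCUMENT_NODE, p.nodeType == Node.ELEMENT_NODE)

# The text of <p> is broken into two text nodes by the <b> element.
child_names = [child.nodeName for child in p.childNodes]
print(child_names)
```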
Finding nodes
General
A location path represents a chain of location steps which allow you to move around in a document
3 parts of a location step
axis - the direction to travel
node test - specifies what kind of nodes to travel through
predicates - optional boolean tests
Context node = 'current' node
Node axes
Keywords
Ancestor - All ancestors of context node
Ancestor-or-self
Attribute - attributes of the context node
Child - children of the context node
Descendant - all generations
Descendant-or-self
Following
Nodes that follow at any level
Does not include context node's own descendants
Does include descendants of other (following) nodes
Following-sibling
Like following, but does not include any descendant nodes
Only includes sibling-level (i.e., having same parent) nodes
Namespace - all the namespace nodes of an element
Parent
Preceding
Does not include ancestors of the context node
Does include prior siblings and their descendants
Preceding-sibling - no descendants
Self - i.e., the context node
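Python's standard `xml.etree.ElementTree` does not implement the named axes, so the sketch below computes a few of them by hand from document order, purely to make the definitions concrete. It covers element nodes only, and the helper names are made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/><d/></b><e><f/></e><g/></a>")
order = list(root.iter())  # every element, in document order
parent_of = {child: parent for parent in order for child in parent}


def ancestors(node):
    """ancestor axis: parent, grandparent, ... up to the document element."""
    found = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def following(node):
    """following axis: later in document order, excluding the node's descendants."""
    own_subtree = set(node.iter())
    return [n for n in order[order.index(node) + 1:] if n not in own_subtree]


def preceding(node):
    """preceding axis: earlier in document order, excluding the node's ancestors."""
    skip = set(ancestors(node))
    return [n for n in order[:order.index(node)] if n not in skip]


def following_sibling(node):
    """following-sibling axis: later children of the same parent."""
    if node not in parent_of:
        return []
    siblings = list(parent_of[node])
    return siblings[siblings.index(node) + 1:]


def tags(nodes):
    return [n.tag for n in nodes]


b, e = root.find("b"), root.find("e")
print(tags(following(b)), tags(following_sibling(b)))
print(tags(following(e)), tags(preceding(e)), tags(ancestors(e.find("f"))))
```

With `b` as the context node, `following` reaches `e`, `f`, and `g`, while `following-sibling` stops at `e` and `g` because `f` is a descendant of a sibling rather than a sibling.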
Syntax
axisname::nodetest[predicate]
More often than not…
Simply use the name of a node
Node type is then inferred from the axis
Within the attribute axis node is assumed to be an attribute
Within the namespace axis node is assumed to be a namespace
In all other cases node is assumed to be an element
"In the absence of a node axis specifier, the axis is assumed to be child and the node is assumed to be of type element." [Emphasis mine - WHW] (Ray, p. 100)
Node tests
/
Again, not synonymous with the root element
Contains the root element plus all comments or processing instructions preceding root element
node()
Matches any node
Example: attribute::node() selects all attributes of the context node
*
In the attribute axis any attribute
In the namespace axis any namespace
In all other axes any element
foo
In the attribute axis the foo attribute of the context node
In the namespace axis the foo namespace
In all other cases the element foo
text() - any text node
processing-instruction() - any processing instruction
processing-instruction('for-web') - any processing instruction with a for-web target
comment() - any comment node
Location path shortcuts
//
Starting at the root node - i.e., anywhere in the document
//foo ≡ any foo element anywhere in the document
.//
Selects all matching nodes which are descendants of context node, regardless of how many generations below
≡ ./descendant-or-self::node()/
. - the current (aka context) node, ≡ self::node()
.. - parent of the current node
@
Use to select attributes
E.g., @foo ≡ attribute::foo
/* - the document element
Remember that / is an absolute path & matches the root node
* then matches any element, or, in this case the document element
../*
Matches all sibling elements
Also includes current element if context node is an element
| - use to join paths together as a union (i.e., an OR: nodes matching either path are selected)
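Several of these shortcuts can be tried with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset: relative paths with `.`, `..`, `*`, and `//`, but no named axes, no attribute selection outside predicates, and no union operator. The catalog markup is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used only to exercise the abbreviated paths.
catalog = ET.fromstring("""<catalog>
  <section name="xml"><book id="b1"/><book id="b2"/></section>
  <section name="css"><book id="b3"/></section>
</catalog>""")

# .//book : book elements at any depth below the context node
all_books = [b.get("id") for b in catalog.findall(".//book")]

# * : every child element of the context node
children = [c.tag for c in catalog.findall("*")]

# .//book/.. : the parents of those books (each parent reported once)
book_parents = [s.get("name") for s in catalog.findall(".//book/..")]

# . : the context node itself
context = catalog.find(".")

print(all_books, children, book_parents, context.tag)
```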
Predicates
A boolean expression enclosed within square brackets (i.e., [ ])
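A few predicate forms, again as a sketch with Python's standard `xml.etree.ElementTree`. It supports only a small set of predicates (attribute tests, child-element tests, and positions), and the catalog markup is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used only to exercise predicates.
catalog = ET.fromstring("""<catalog>
  <section name="xml"><book id="b1"/><book id="b2"/></section>
  <section name="css"><book id="b3"/></section>
  <section/>
</catalog>""")

# [@id='b2'] : keep only books whose id attribute equals b2
by_id = [b.get("id") for b in catalog.findall(".//book[@id='b2']")]

# [@name] : keep only sections that have a name attribute at all
named = [s.get("name") for s in catalog.findall("section[@name]")]

# [book] : keep only sections that have at least one book child
with_books = len(catalog.findall("section[book]"))

# [1] : the first book within each section
first_books = [b.get("id") for b in catalog.findall("section/book[1]")]

print(by_id, named, with_books, first_books)
```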
"Extensible Markup Language (XML) 1.0",
T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, F. Yergeau,
Editors. World Wide Web Consortium (W3C),
26 November 2008. This version of the XML 1.0 Recommendation is available at http://www.w3.org/TR/2008/REC-xml-20081126/. The latest version
of this Recommendation is always available at http://www.w3.org/TR/xml/.
Goldberg, Kevin Howard.
Visual QuickStart Guide: XML. 2nd Ed.
United States of America:
Peachpit Press,
2009.
Print.
Ray, Erik T.
Learning XML. 2nd Ed.
United States of America:
O'Reilly,
2003.
Print.
Walmsley, Priscilla.
Definitive XML Schema.
Upper Saddle River, NJ 07458:
Prentice Hall PTR,
2002.
Print.
Watt, Andrew H.
Sams Teach Yourself XML in 10 Minutes.
United States of America:
Sams Publishing,
2003.
Print.
All prefixes, local names, & unprefixed names are NCNames
Scope of namespace declarations
Applies to children, grandkids, etc.
Standard Practice: place namespace declarations in your root element's start tag
Overriding namespace declarations (can be done)
Attributes & namespaces
"Attributes with prefixed names are also known as global attributes." (Walmsley, p. 45)
Un-prefixed attribute names
Attributes are not affected or governed by default namespaces
Therefore, un-prefixed attributes are technically not in any namespace
(It can be argued that un-prefixed attributes are in the namespace of the containing element)
Relationship between namespaces & schemas
Many-to-many relationship
A namespace can be defined by 0+ schemas
Multiple schemas with overlapping declarations within a single namespace
In & of itself this is perfectly legit
The restriction is that a single instance cannot access these overlapping schemas simultaneously (at least it certainly cannot incorporate a declaration that can be found in more than 1 referenced schema)
A single schema can declare names for 0+ namespaces
Using namespaces in XSDL
Target namespaces
There can be only 1 target namespace defined by a schema
Every global declaration in a schema is associated with that target namespace
Must appear at the beginning (as is the case with import & redefine)
When using an include 1 of the following must hold:
Container & contained schema have same target namespace
Neither schema document has a target namespace
Container schema document has a target namespace while included schema document does not
"…all components of the included schema document take on the namespace of the including schema document." (Walmsley, p. 67)
The included components are called chameleon components, because their operative namespace depends on the document into which their own document is included
Can employ multiple includes in a schema document
Can also have "nested" includes
redefine
Similar to include, yet allows you to, well, redefine select components in the included schema document
Used when referring to components from other namespaces
Difference vs. include
include is typically used to pull together schema documents that were designed to be used in tandem
Import is used to "record a dependency on another namespace, not necessarily another schema document." (Walmsley, p. 70)
Attributes (all optional)
id
namespace
The namespace being imported
Imported namespace cannot be the same as the importing namespace
Do not specify this if the components being imported are not in a namespace
If, however, the importing schema document itself has no target namespace then it can only import schema documents which themselves have a namespace attribute
schemaLocation
As with include, if you include this attribute it must be a resolvable reference
On the other hand the processor is not required to attempt to resolve it
You can have multiple imports of the same namespace
Looping references among imported schema documents
Declare an attribute whose type has a restriction element
base attribute of that restriction element has a value "xsd:NOTATION"
The restriction element contains xsd:enumeration elements
The value attribute of each xsd:enumeration element refers to the name of a globally-declared xsd:notation element
Note: Values of "xsd:NOTATION"-based attributes are qualified names. I.e., must either be:
Prefixed
In the scope of the default namespace declaration
Notations & unparsed entities
Notations can be used to identify format of an unparsed general entity (e.g., graphics data embedded directly in XML instance)
See Walmsley, Example 6-16, p. 115
Schema Components - Basic
Element Declarations
Global & local element declarations
Global element declarations
These must be direct children of schema element
As such, they appear at the top of the schema doc
Attributes (only name is a required attribute) & attribute type
id (ID)
name (NCName)
type (QName) - to specify data type
default (string) - default value
fixed (string) - fixed value
nillable (boolean)
Controls whether xsi:nil can be used in the instance
Default = "false"
abstract (boolean) - default = "false"
substitutionGroup (QName)
block - allowable values:
"#all", or:
List of: "substitution", "extension", or "restriction"
Defaults to blockDefault of schema
final - allowable values:
"#all", or:
List of: "extension" or "restriction"
Defaults to finalDefault of schema
Allowable content
annotation?
(simpleType | complexType)?
(key | keyref | unique)*
Element constraints - i.e., minOccurs & maxOccurs
Place in element reference
These do not occur in element declarations
Local element declarations
Appear entirely within complex type definitions
Substitution-related attributes are not valid within local element declarations:
substitutionGroup
final
abstract
Can declare as children elements of:
all
choice
sequence
Attributes (only name is a required attribute) & attribute type
id (ID)
name (NCName)
form
Use to set whether element-type name must be qualified in the instance
Allowable values
"qualified"
"unqualified"
Default value
Value of elementFormDefault of schema
Note: that attribute in turn defaults to "unqualified"
type (QName) - to specify data type
minOccurs (nonNegativeInteger) - defaults to "1"
maxOccurs (nonNegativeInteger)
Can also use a value of "unbounded"
Defaults to "1"
default (string) - default value
fixed (string) - fixed value
nillable (boolean)
Controls whether xsi:nil can be used in the instance
Default = "false"
block - allowable values:
"#all", or:
List of: "substitution", "extension", or "restriction"
Defaults to blockDefault of schema
Allowable content
annotation?
(simpleType | complexType)?
(key | keyref | unique)*
Using global vs. local element declarations
Use global element declarations when:
"The element declaration could ever apply to the root element during validation." (Walmsley, p. 125)
You want to use the element across complex types
You want to use the element in a substitution group
Use local element declarations when:
"You want to allow unqualified element-type names in the instance." (Walmsley, p. 126)
In this case the only global element you should have would be the root element
Reminder: In an instance global element-type names are always qualified
You want to (re-)use an element name in different locations, varying the element by data type or by some other property
E.g., if making a schema that will include shirts & pants might want to create a size element that can be used (appropriately) in both
Would not create a global size element here because sizes for shirts might be S, M, L, XL, etc., while for pants would be 32, 34, 36, etc.
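The shirts and pants case can be sketched as a schema with one global declaration and two local `size` declarations of different types. The schema itself is an illustration, not taken from Walmsley, and the Python below (standard `xml.etree.ElementTree`) only inspects the schema document; it does not validate anything.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Illustrative schema: "order" is global, everything else is declared locally.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="shirt">
          <xs:complexType><xs:sequence>
            <xs:element name="size" type="xs:token"/>
          </xs:sequence></xs:complexType>
        </xs:element>
        <xs:element name="pants">
          <xs:complexType><xs:sequence>
            <xs:element name="size" type="xs:integer"/>
          </xs:sequence></xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema = ET.fromstring(SCHEMA)
element_tag = "{%s}element" % XS

# Global declarations are direct children of xs:schema; the rest are local.
global_decls = schema.findall(element_tag)
local_decls = [e for e in schema.iter(element_tag) if e not in global_decls]

global_names = [e.get("name") for e in global_decls]
size_types = [e.get("type") for e in local_decls if e.get("name") == "size"]
print(global_names, size_types)
```

Because both `size` elements are local, the same name can carry a token type inside `shirt` and an integer type inside `pants`.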
Declaring the data types of elements (3 options)
Employ the type attribute in your element declaration
Must set value to a named data type
Value can either be a user-derived type or a built-in type
Define an anonymous type - i.e., specify either a simpleType or complexType child
Do neither of the above - this 'defaults' to an anyType data type
Content can then be children and/or content
Element can also then take any attributes
Default & fixed values
General
Pertains to treatment of empty elements
Schema will insert default or fixed values
Differing treatment of elements vs. attributes with default or fixed values
If an attribute with a default or fixed value is not even present the schema processor will effectively add the attribute (& value) to the appropriate element
With elements nothing happens if the element itself is not present
Again, default & fixed values only come into play with (present, but) empty elements
With elements the default & fixed attributes are mutually exclusive
Elements in which default & fixed attributes may be used:
Simple types
Complex types with simple content
Complex types with mixed content but only if all children are optional
Specification of default & fixed values is independent of occurrence constraints
If an element has minOccurs > 1 you can leave the element empty & allow element content to be populated by parser from fixed or default values
Default values
As a reminder, default values come into play only where an element has empty content
While empty elements with default values will have their content populated, certain data types allow an empty data string:
string
normalizedString
token
Any user-defined types that derive from one of these types & which allow an empty string
Unrestricted list types also allow empty values
Issues with empty elements which allow an empty string
If element has xsi:nil attribute set to "true" the default value will not be inserted
Otherwise default value will be inserted
Whitespace
Whitespace is typically not synonymous with an empty string
For string data, then, whitespace will not be replaced with default values
For data types like integer, however
whitespace facet of this data type is collapse
Whitespace handling occurs before default-processing handling
Therefore, when an integer-type element with a default value is populated only with whitespace, the whitespace is collapsed to an empty string and the element is populated with the default value
Again, this differs from a string-type element, where whitespace would be read as valid content
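The interplay of whitespace handling, xsi:nil, and default values can be modelled with a few lines of Python. This is a toy model of the rules as described in these notes, not a schema processor; the function and parameter names are invented.

```python
def collapse(text):
    """Model of the 'collapse' whitespace facet: trim and squeeze runs of whitespace."""
    return " ".join(text.split())


def effective_content(text, whitespace, default=None, nil=False):
    """Return the content a processor would see for a present element.

    whitespace: "preserve" (as for string) or "collapse" (as for integer).
    Returns None for a nilled element, where no default is inserted.
    """
    if nil:
        return None
    # Whitespace handling happens before default processing.
    value = collapse(text) if whitespace == "collapse" else text
    if value == "" and default is not None:
        return default
    return value


print(repr(effective_content("", "preserve", default="N/A")))     # empty string type
print(repr(effective_content("   ", "preserve", default="N/A")))  # whitespace is content
print(repr(effective_content("   ", "collapse", default="0")))    # integer-like type
print(repr(effective_content("", "collapse", default="0", nil=True)))
```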
Fixed values
Somewhat surprisingly, you may want to supply content to an element even where the element has a fixed value
Example: you have an element with an integer data type and a fixed value of 1
Must prefix the namespace (generally with xlink) when declaring it
The namespace declares several global attributes (as global, these attributes must be prefixed):
Functional
type
href
show
actuate
label
Not synonymous with id attribute
Specifically, different XLink elements can share the same label value
from
to
Semantic
role - indicates a property that the entire link has
arcrole
title - human-readable description of an entire link
'Defining' an XLink element
XLink 1.0: this occurs with the presence of an xlink:type attribute
XLink 1.1: including either xlink:type or xlink:href identifies an element as an XLink element
Available values (XLink 1.1) for the xlink:type attribute
"simple"
"extended"
"locator"
"arc"
"resource"
"title"
Note: if an XML element contains an xlink:href attribute but no xlink:type attribute then the element is treated as if it has an xlink:type attribute with a value of "simple"
All other attributes & elements defined in the namespace are reserved
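The XLink 1.1 rule for identifying an XLink element can be written as a small function. This is a sketch using Python's standard `xml.etree.ElementTree`; the function name and the sample markup are invented.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"


def xlink_type(element):
    """Return the effective xlink:type of an element, or None if it is not an XLink element."""
    explicit = element.get("{%s}type" % XLINK)
    if explicit is not None:
        return explicit
    # XLink 1.1: xlink:href without xlink:type is treated as a simple link.
    if element.get("{%s}href" % XLINK) is not None:
        return "simple"
    return None


sample = ET.fromstring(
    '<r xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<a xlink:href="x.xml"/>'
    '<b xlink:type="extended"/>'
    '<c href="x.xml"/>'
    "</r>"
)
effective_types = [xlink_type(child) for child in sample]
print(effective_types)
```

The unprefixed `href` on `<c>` does not count, since the XLink attributes are global and must be prefixed.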
XLink Attribute Usage Patterns
Note: an XML element with, say, an attribute/value of xlink:type="resource" is referred to as a resource-type element
Value chosen (or implied) for type attributes drives which other XLink attributes can or must be used
Unless flagged as mandatory these attributes are those which can be used within a given XLink-type element:
simple-type element:
href
role
arcrole
title
show
actuate
extended-type element
role
title
locator-type element
href (required)
role
title
label
arc-type element
arcrole
title
show
actuate
from
to
resource-type element
role
title
label
title-type element: no other attributes (other than type) can be used
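The usage patterns can be captured as a lookup table and a checker. This is a sketch: the table is transcribed from the list in these notes, the title-type entry assumes that title-type elements take no XLink attributes other than type, and the function name is invented.

```python
# Allowed XLink attributes (local names) for each element type.
ALLOWED = {
    "simple": {"href", "role", "arcrole", "title", "show", "actuate"},
    "extended": {"role", "title"},
    "locator": {"href", "role", "title", "label"},
    "arc": {"arcrole", "title", "show", "actuate", "from", "to"},
    "resource": {"role", "title", "label"},
    "title": set(),  # assumption: nothing besides xlink:type
}
REQUIRED = {"locator": {"href"}}


def check_xlink_attributes(xlink_type, attribute_names):
    """Return a list of problems for the XLink attributes used on one element."""
    if xlink_type not in ALLOWED:
        return ["unknown xlink:type %r" % xlink_type]
    names = set(attribute_names) - {"type"}
    problems = []
    for name in sorted(names - ALLOWED[xlink_type]):
        problems.append("xlink:%s is not allowed with xlink:type=%r" % (name, xlink_type))
    for name in sorted(REQUIRED.get(xlink_type, set()) - names):
        problems.append("xlink:%s is required with xlink:type=%r" % (name, xlink_type))
    return problems


print(check_xlink_attributes("simple", ["type", "href", "show"]))
print(check_xlink_attributes("locator", ["type", "role"]))
print(check_xlink_attributes("extended", ["type", "href"]))
```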
XLink Element Type Relationships (i.e., which other XLink element types a given XLink element type can have as children)
simple-type element (no XLink children allowed):
extended-type element can have the following XLink elements
locator
arc
resource
title
locator-type element can have only a title-type XLink element as a child
arc-type element can have only a title-type XLink element as a child
resource-type element (no XLink children allowed):
title-type element (no XLink children allowed):
Attribute Value Defaulting
With XLink it's easy to wind up dealing with a large number of attributes
Where appropriate you may supply default values in a DTD or a schema
With DTDs be careful about putting these default values in an external subset - not all browsers will read that info!
Not all schema languages allow you to specify default attribute values
Integrating XLink with Other Markup
Essentially there are no issues
If so inclined you can use DTDs or Schema to tighten, say, which XLink elements can be children of which XLink parent elements
Integrating XLink with Legacy Markup
XLink requires the use of prefixed attributes
Therefore, XHTML 1.0's a element is not a conforming XLink construct, despite having an href attribute
XLink Elements & Attributes
General (Two Kinds of Links)
Extended links
Full XLink functionality
Example: inbound links, third-party arcs, & arbitrary number of participating resources
Can point to remote resources, specify arc traversal rules, etc.
Structure can get fairly complex
Simple links
Used for an outbound link with 2 participating resources
One resource must be local, the other remote
As an outbound link, the arc must go from local to remote
Same functionality as HTML A & IMG elements
Extended Links (extended-Type Element)
General
Definition
A link that associates an arbitrary number of resources
Those resources can be any combination of local & remote
Characteristics
The only link that can have inbound & third-party arcs
Note: an extended link can itself contain a local resource
Can have fewer than 2 resources
This obviously makes the link un-traversable, but it is not an error
Might do this to associate properties with a single resource
Might also create as a to-be-populated placeholder
Deployment
"Typically, extended linking elements are stored separately from the resources they associate (for example, in entirely different documents)." (W3 Recommendation, section 5.1)
This deployment pattern makes extended links useful in the following situations:
Participating resources are read-only
Easier/cheaper to modify a separate linking element than it is to modify participating resources themselves
Resources have no native support for embedded links (e.g., media formats)
Markup
A simple- or extended-type element placed inside a parent extended-type element has no XLink-specified meaning
locator-, arc-, & resource-type elements have XLink meaning only when they are direct children of an extended-type element
Allowable attributes (both semantic) of an extended-type element: role & title
Traversal rules for an extended link (arc-type element)
Content
Allowable
Only title-type element has any XLink significance
Allowable attributes
Traversal
from
to
These 2 attributes take as their values the label values of appropriate resources
Further, these values specify how the link may start & end
Omitting values for a from or a to attribute
The arc looks to all labels in the locator-type elements of the enclosing extended-type link
The empty to and/or from attributes then default to referring to that whole collection of labels
Omitting values for both attributes means that you get n-squared traversals, where n = number of locator-type elements
Omitting values for one or both attributes can mean that resources wind up having themselves as both source & destination - this is not an error
When multiple resources share the same label (permitted)
These are understood to be individual resources, not a single resource in aggregate
Using multiple-resource labels as values for from and/or to attributes results in a Cartesian-product set of traversals
Not identifying all possible traversal pairs
Certainly allowable
XLink does not impute anything here
May see cases where a higher-level app does provide some default behavior
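The traversal rules above amount to a Cartesian product over labels. Here is a minimal sketch, assuming every participating resource carries a label; the function name arc_traversals and the sample resources are hypothetical.

```python
from itertools import product


def arc_traversals(resources, from_label=None, to_label=None):
    """Expand one arc into (start, end) resource pairs.

    resources: list of (label, resource_id) pairs from one extended link.
    A missing from/to label stands for every label in the link.
    """
    starts = [rid for label, rid in resources
              if from_label is None or label == from_label]
    ends = [rid for label, rid in resources
            if to_label is None or label == to_label]
    return list(product(starts, ends))


# Hypothetical resources: two of them share the label "student".
resources = [("student", "pat.xml"), ("student", "sam.xml"), ("course", "cs101.xml")]
```

With both labels omitted, three resources give nine traversals, including pairs in which a resource is both source and destination.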
How an XLink app is supposed to handle a linkbase arc:
Keep track of starting resource
After traversing a link to a linkbase app should extract any extended links from inside linkbase
Assuming extracted portion is a valid XML file (or portion thereof) app should make available those extended links completely contained inside that extracted portion
Timing of linkbase arc traversal - depends on value of actuate attribute
"onLoad": linkbase is loaded & links extracted when starting resource is loaded
Note: "show" value must be ignored because presentation in this case is nonsensical
Chaining linkbases
Can be done
Consuming app may choose to limit the number of steps it will process in a linkbase chain
Misc. notes about consuming apps
Should maintain a list of extended links retrieved after processing a linkbase
Should not retrieve duplicate resources or links (this would arise in cases of cyclic dependencies)
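The linkbase handling described above (track what has been loaded, cap the chain length, skip duplicates) might be sketched as follows. The fetch callable, the max_depth parameter, and the in-memory FAKE_WEB are assumptions made for illustration; a real application would retrieve and parse XML documents instead.

```python
def load_linkbases(start_uri, fetch, max_depth=3):
    """Collect extended links from a chain of linkbases.

    fetch(uri) is a hypothetical callable returning
    (extended_links, further_linkbase_uris) for one linkbase.
    """
    seen = set()       # linkbases already retrieved (guards against cycles)
    collected = []     # extended links made available to the application
    queue = [(start_uri, 0)]
    while queue:
        uri, depth = queue.pop(0)
        if uri in seen or depth > max_depth:
            continue
        seen.add(uri)
        links, next_uris = fetch(uri)
        for link in links:
            if link not in collected:   # skip duplicate links
                collected.append(link)
        queue.extend((u, depth + 1) for u in next_uris)
    return collected


# Hypothetical in-memory "web": a.xml and b.xml point at each other.
FAKE_WEB = {
    "a.xml": (["link-1"], ["b.xml"]),
    "b.xml": (["link-2", "link-1"], ["a.xml"]),
}
```

Calling load_linkbases("a.xml", FAKE_WEB.__getitem__) terminates despite the cycle because each linkbase is fetched at most once.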
Simple Links (simple-Type Element)
Designed to provide shorthand for uncomplicated, outbound extended links
Essentially rolls the following elements into one:
extended-type
locator-type
arc-type (direction implied)
resource-type
Traversal arc is outbound, by implication
Abilities you 'lose' with simple links:
Supplying more than 1 remote & local resource (each)
Creating in-bound arcs
Specifying a title with the arc
Specifying a role or title with the local resource
Specifying a role or title with the link as a whole
Content
Allowable
"The simple-type element itself, together with all of its content, is the local resource of the link, as if the element were a resource-type element." (W3 standard, 5.2 Simple Links (simple-Type Element))
Absence of content
Allowable
Typically seen where it is assumed the consuming app will give the user some way to traverse the link
Note: use of the href attribute is not required
Obviously link is then un-traversable
Might want to employ this construct if all you want to do is to associate properties with a resource
XLink Element Type Attribute (type)
Again, this attribute determines the XLink type of the element
Constraint: type Value
Except in case of simple-type element 'defined' by existence of href attribute, a value for the type attribute must be supplied
Note: you can supply "none" as a value.
If you do so the element has no XLink significance.
Might do this if you want an XLink app not to bother checking for the presence of an href attribute
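The type rules above (explicit value, the "simple" default implied by href, and the "none" opt-out) can be sketched with the standard library's ElementTree. The function name effective_xlink_type and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"
XLINK_TYPES = {"simple", "extended", "locator", "arc", "resource", "title", "none"}


def effective_xlink_type(element):
    """Return the XLink type of an element, or None if it has no XLink significance."""
    declared = element.get(f"{{{XLINK_NS}}}type")
    if declared is not None:
        if declared not in XLINK_TYPES:
            raise ValueError(f"invalid xlink:type value: {declared!r}")
        return None if declared == "none" else declared
    if element.get(f"{{{XLINK_NS}}}href") is not None:
        return "simple"   # implied when only xlink:href is present
    return None


doc = ET.fromstring(
    '<doc xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<a xlink:href="http://example.org/"/>'
    '<b xlink:type="none" xlink:href="http://example.org/"/>'
    '<c/>'
    '</doc>'
)
```

In this sample only the a element counts as a link; b opts out explicitly and c carries no XLink attributes at all.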
Locator Attribute (href)
May be used on simple-type elements
Must be used with locator-type elements
Note: both XLink 1.0 and 1.1 are explicit about not making any syntactical demands of the value of the href attribute
Can use relative references
Can use fragment identifiers
Semantic Attributes (role, arcrole, & title)
The values supplied for role or arcrole attributes cannot be relative
Where the title attribute is used its value should be a descriptive, human readable string
Behavior Attributes (show & actuate)
General
The two attributes are never required
With arc-type elements in linkbase lists apps must assume:
show="none"
actuate="onLoad"
This applies even if different values were in fact specified
show attribute
Used to specify how you want your ending resource to be presented
Constraint: show values (required)
"new" - i.e., new window, frame, etc.
"replace" - i.e., in existing window, frame, etc.
"embed"
Assuming that the starting resource is not the entire document, this is akin to loading a *.gif file in an HTML page
The presentation of embedded resources is determined by the consuming app
"other" - use this if your directions on how to present the resource are placed elsewhere in your markup
"none"
actuate attribute
Use to set the timing of arc traversal
Constraint: actuate values (required)
These values give direction on how a consuming XLink app should behave
"onLoad" - i.e., traverse the arc as soon as the starting resource is loaded
"onRequest"
Could set trigger to be user clicking on starting resource
Could set trigger to be a timer countdown
"other" - use when your directions are provided elsewhere in the markup
"XML Linking Language (XLink) Version 1.1 Recommendation",
S. DeRose, E. Maler, D. Orchard, N. Walsh,
Editors. World Wide Web Consortium (W3C) Recommendation,
06 May 2010. This version of the XLink 1.1 Recommendation is available at http://www.w3.org/TR/2010/REC-xlink11-20100506.
The latest version of the XLink Recommendation is always available at http://www.w3.org/TR/xlink11/.
Syntax/Example (for embedding the instructions within the XML document): <?xml version="1.0"?>
<?xml-stylesheet href="docbook/html/docbook.xsl"
type="text/xsl"?>
<?xml-stylesheet href="docbook/wap/docbook.xsl"
type="text/xsl" media="wap"?>
Advantage of embedding in this fashion is that, as here, can specify different stylesheets to be used depending on type of browser or client type
Disadvantage: generally want to avoid embedding processing instructions within data document
Parsing the stylesheet - processor first reads the stylesheet
Parsing the transformee - processor builds a tree view of xml document
Processing steps
Set any specified properties (usually a stylesheet specifies an output method)
See if there are any nodes (in transformee document) to process
This is always determined by the context - & context changes constantly
At the start the context is always the root of the XML document
Very first step is simply to identify the next node
Once next node is found
See if any <xsl:template> elements match that node
The element <xsl:template match="/"> matches the root node - i.e., the first or top node in an XML document
Note: The <xsl:template match="/"> element is analogous to the main method in C# - i.e., the usual starting point for the program (if it is omitted, the built-in rules start the processing instead)
If a <xsl:template> matches the next node…
Process the contents of the <xsl:template> element
Typically…
The contents of a <xsl:template> element include at least one <xsl:apply-templates> element
The nodes chosen by the select attribute of the <xsl:apply-templates> element are then handled by whichever <xsl:template> element has a match attribute that fits them
Note: if more than one <xsl:template> matches the next node the processor utilizes whichever is more specific
If no template matches the processor applies some built-in rules
Recursion
Once processor finds the appropriate <xsl:template> element to match a document node that element may invoke another
Invoked <xsl:template> element can yet again invoke another, etc.
This is the recursion process
Note: if more than one document element matches a <xsl:template>:
The elements are processed in order
This is referred to as document order
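The processing loop described above (find a template for the node, otherwise fall back to a built-in rule, and recurse through children in document order) can be imitated in a few lines. This is a toy model and not an XSLT processor: templates are keyed by element name as a stand-in for match patterns, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, templates):
    """Process one element: use a matching template, else the built-in rule."""
    template = templates.get(node.tag)
    if template is not None:
        return template(node, templates)
    return builtin_rule(node, templates)


def builtin_rule(node, templates):
    """Built-in behaviour: copy text through and keep recursing into children."""
    out = [node.text or ""]
    for child in node:                       # children come in document order
        out.append(apply_templates(child, templates))
        out.append(child.tail or "")
    return "".join(out)


# Hypothetical templates keyed by element name.
def title_template(node, templates):
    return f"<h1>{builtin_rule(node, templates)}</h1>"


def para_template(node, templates):
    return f"<p>{builtin_rule(node, templates)}</p>"


TEMPLATES = {"title": title_template, "para": para_template}

doc = ET.fromstring(
    "<article><title>Hi</title><body><para>One</para><para>Two</para></body></article>"
)
result = apply_templates(doc, TEMPLATES)
```

No template exists for article or body, yet the para elements are still reached because the fallback rule keeps the recursion going.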
Generating HTML documents
If (because!) you do not specify a namespace in your XSLT stylesheet for html elements, the processor passes through html tags unaltered
However, the processor will at least think of them as xml tags
Therefore, although html itself does not require this your style sheet must specify closing tags for elements like <p>, <br>, etc.
Think of each <xsl:template> element as analogous to a rule in CSS
Stylesheet structure
The <xsl:stylesheet> element
Must include the version attribute (either "1.0" or "2.0")
Must, of course, include the namespace declaration: xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
(Namespace declaration is the same for both versions of XSLT)
The <xsl:output> element
method attribute - values are:
"xml"
"html"
"text"
"xhtml" - XSLT 2.0 only
Based on value chosen there are other attributes which can be used in conjunction
The <xsl:template> element (with match attribute)
Built-in template rules
General
Essentially the rules allow you to reach nodes in a document without having to wade through each parent node
In other words, allows recursion to continue
Specified templates always override built-in templates, though
See Tidwell, p. 32, for the exact templates
For element & document nodes
For modes
For text & attribute nodes
For comment & processing instruction nodes - though, per Tidwell, this default template does nothing
For namespace nodes - though, per Tidwell, this default template does nothing
Top-level elements
General
These are any elements which go at the top of an XSLT stylesheet
Aka, any element which takes <xsl:stylesheet> as its parent
<xsl:include> & <xsl:import>
These are used to refer to other stylesheets
<xsl:import> element
Anything in imported stylesheet has lower priority than calling stylesheet
In object-oriented terms you are sub-classing the imported sheet
Can only appear in beginning of sheet - i.e., can only take <xsl:stylesheet> as its parent
<xsl:include> must also take <xsl:stylesheet> as its parent, but can appear anywhere among the top-level elements
<xsl:strip-space> & <xsl:preserve-space>
Both take elements attribute
Values of elements attribute are space-delimited list of elements to which the attribute applies
Can declare both elements so that you, say, strip space from all elements except those specified in <xsl:preserve-space>
<xsl:key> (similar to declaring keys in a database)
<xsl:variable>
Used for variable declaration
Any variable declared as a top-level element is global to the entire stylesheet
<xsl:param>
Others
Other approaches
There are typically a variety of ways to write a stylesheet while getting the same result/output
If you have header or footer information you probably want that inside the <xsl:template match="/"> element
Although you can often shorten/compact templates, it's often easier from standpoints of readability, debugging, & reuse to use multiple, simple templates
XPath
General
Nodes & trees
"…there is only one possible tree configuration for any given document…" (Ray, p. 100)
Ergo, there is a unique path from document root to any point inside document
XPath describes that route
Trees & subtrees
Think of document as collection of subtrees
Note: a node tree must contain all appropriate opening & closing XML tags
XPath & DOM
XPath is designed to be embedded in a host language, which in these notes is XSLT
DOM is an API for general-purpose programming languages (e.g., JS)
"XPath is designed to be used inside an attribute in an XML document." (Tidwell, p. 45)
XPath works with parsed version of an XML document
Entity references are resolved by processor before XSLT instructions are applied
CDATA sections are converted to text
No way of using XPath to get pre-parsed info on XML document
XPath 2.0 vs. XPath 1.0
No functionality from 1.0 is lost in 2.0
XPath 2.0 is built on the concept of sequences rather than node-sets
Sequences can contain nodes
Biggest difference between sequences & node-sets is that sequences can also contain atomic values
Atomic values
An xs:integer value of 444 is an atomic value
Atomic values are defined in XML schema
Atomic value is one which cannot be broken down into smaller parts
XPath 2.0 supports all XML schema built-in data types
Note: schema-aware processors will also recognize custom schema-defined types
The XPath data model
Node types
Root
Special kind of node - it is not an element!
Contains:
Document element
Any comments &/or processing instructions surrounding document element
Unlike all other nodes - it has no parent
Always has at least document element as a child
Using <xsl:value-of select="/" /> gives you the text of all the root node's descendants, mashed together without spaces in between
Element
Root & elements are the only nodes which can contain other nodes
An element can contain any other type of node except the root node
An empty element is called a leaf node
Getting the name of an element rather than its text content:
name() - returns the element's qualified name, including any namespace prefix
Other functions exist to provide local name & the namespace
Attribute
Treating as a node allows you to access an element's attribute(s)
Like an element node which contains only text
Complications with attribute nodes:
Parent/child relationship
An element node is the parent of its attribute nodes
However, an attribute node is not the child of its element node
(An element's children are its text, other elements, comments, etc.)
Attributes with DTD or schema-defined default values
These attributes do not have to appear in the XML document
What is more, XML parsers are not required to read an external DTD
In that case the attributes will not even exist
Usually must create a <xsl:if> or <xsl:choose> hack to handle this
xml:lang & xml:space attributes are unique
The values of these attributes apply to all descendant elements
However, XPath only sees these attributes within the element in which they are declared
"…the only attribute nodes an element node contains are those tagged in the document and those defined with a default value in the DTD." (Tidwell, p. 49)
Text
When uninterrupted treated as a leaf node
Always the child of an element node
Note 1: an element can contain more than one text node; text can be broken up by elements or other node types
Note 2: there are no such things as CDATA nodes; text inside a CDATA section is treated as a simple text node
Comment - the node has a value but no name
Processing instruction
Includes the string value
Can obtain its name (via name() function)
Namespace
A namespace declaration is not treated as simply another attribute node
Because namespace declarations affect all child elements, a namespace is a region of a document, not of an element
Namespace nodes are almost never used in XSLT stylesheets
Notes on DTDs:
DTDs are not treated as node types!
XPath assumes all entity references are resolved prior to XPath looking into a document
This approach is required since entities might resolve to XML mark-up
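A few of the data-model points above can be checked with Python's ElementTree. It is not an XPath data model implementation (it has no separate text or attribute node objects), so treat this as a rough analogue; the sample documents are made up.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring('<book id="b1"><title>XSLT</title><year>2008</year></book>')

# Analogue of <xsl:value-of select="/"/>: all descendant text, run together.
string_value = "".join(doc.itertext())

# Children are elements (and text); attributes are reached separately.
child_names = [child.tag for child in doc]
attributes = dict(doc.attrib)

# A CDATA section arrives as ordinary text once the document is parsed.
note = ET.fromstring("<note><![CDATA[1 < 2]]></note>")
cdata_text = note.text
```

The two element texts come back as "XSLT2008" with no separator, which is the "mashed together" behaviour noted for the root node.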
Node Tests
XPath 1.0 (4 node tests)
The 4 node tests (these look like functions):
node()
Matches any node
Example: attribute::node() selects all attributes of the context node
text() - any text node
comment() - any comment node
processing-instruction()
Obviously, returns any processing instruction
processing-instruction('for-web') - any processing instruction with a for-web target
Other ways to retrieve/navigate nodes
*
In the attribute axis any attribute (e.g., attribute::*)
In the namespace axis any namespace
In all other axes any element
Example 1: child::* will match all the element children of a node
Example 2: foo:* will match all the elements in the foo namespace
/
Again, not synonymous with the root element
Contains the root element plus all comments or processing instructions preceding or following the root element
foo
In the attribute axis the foo attribute of the context node
In the namespace axis the foo namespace
In all other cases the element foo
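The wildcard and name tests can be tried out with the limited XPath subset in Python's ElementTree. That subset covers child steps, *, and //, but not node tests such as text() or comment(), so only the name-based cases are shown; the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library>"
    "<book lang='en'><title>A</title></book>"
    "<!-- a comment -->"
    "<book lang='fr'><title>B</title></book>"
    "<dvd><title>C</title></dvd>"
    "</library>"
)

all_children = [e.tag for e in doc.findall("*")]      # like child::*
books = doc.findall("book")                            # like child::book
titles = [t.text for t in doc.findall(".//title")]     # title elements at any depth
langs = [b.get("lang") for b in books]                 # like book/@lang
```

The comment does not show up in all_children, in line with * matching only elements on the child axis.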
XPath 2.0 node tests
Function-like tests
element()
Takes up to 2 arguments
Used without arguments it is the same as select="*" in XPath 1.0
Used with one argument it finds elements by name - e.g., element(foo) returns all <foo> elements
Use with 2 arguments to search by element name & data type - e.g., element(birthDate, xs:gYear)
Can also use to find all elements of a specific data type - e.g., element(*, xs:gYear)
schema-element()
This must be used with an argument
schema-element(foo) will return all schema-defined <foo> elements
attribute()
Operates just like element() node test
Used without arguments it is the same as select="@*" in XPath 1.0
Used with one argument it finds attributes by name - e.g., attribute(author) returns all author attributes
Use with 2 arguments to find by attribute name & data type - e.g., attribute(birthDate, xs:gYear)
Can also use to find all attributes that take values of a specific data type - e.g., attribute(*, xs:gYear)
document-node()
Returns document nodes
document-node(element(foo)) gives you a document node with a single-element child of <foo>
The document node will, of course, also include any comments and/or processing instructions
Note: your processor will throw an error if the so-called document node has more than 1 child element
Other node-retrieval approaches with XPath 2.0
*:NCName
Gives you all elements with the specified local name
Example: *:fName will give you <student:fName>, <author:fName>, & <fName> elements
Note: re: item()
You see item() in conjunction with variable declarations
This is not a node test
It is, instead, a datatype
Example: <xsl:variable name="foo" as="item()"> defines a variable named "foo" which can contain a single item (though that item can be any node or atomic value)
Sequences & atomic values [XSLT/XPath 2.0]
Sequences
Conceptually much like arrays
Can hold any kind of items - e.g., nodes or values
Sequences have both an order & a length
With XPath 2.0 functions can
Extract subsets of a sequence
Insert or delete items at a particular point
Example (a variable containing a sequence): <xsl:variable name="months" as="xs:string*"
select="('January', 'February', 'March',
'April', 'May', 'June',
'July', 'August', 'September',
'October', 'November', 'December')" />
Syntax notes:
as attribute defines datatype of variable
The asterisk in "xs:string*" signifies any number of values
Note: when no asterisk is present a variable can contain only a single value
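Extracting, inserting, and deleting sequence items correspond to the XPath 2.0 functions subsequence, insert-before, and remove. The following Python functions are a rough analogue of their behaviour on integer positions (positions are 1-based in XPath), written for illustration only.

```python
def subsequence(seq, start, length=None):
    """Like fn:subsequence: positions are 1-based."""
    begin = max(start - 1, 0)
    end = None if length is None else max(start - 1 + length, 0)
    return list(seq[begin:end])


def insert_before(seq, position, inserts):
    """Like fn:insert-before: out-of-range positions clamp to the ends."""
    index = min(max(position - 1, 0), len(seq))
    return list(seq[:index]) + list(inserts) + list(seq[index:])


def remove(seq, position):
    """Like fn:remove: drop the item at a 1-based position, if it exists."""
    if position < 1 or position > len(seq):
        return list(seq)
    return list(seq[:position - 1]) + list(seq[position:])


months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
second_quarter = subsequence(months, 4, 3)
```

Each function returns a new list and leaves its input alone, matching the way XPath expressions yield new sequences instead of modifying existing ones.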
Atomic values
Again, this is an XML Schema concept
When creating a variable with multiple atomic values can use <xsl:sequence> (with its select= attribute) to obtain those values
Can then
Use XPath expressions to obtain values
'Hard code' the desired values
Employ a combination of both
Syntax/example (assuming an XML file with doc element of rockGroups): <xsl:variable name="rockIdols" as="xs:string*">
<xsl:sequence select="rockGroups/brits/beatles"/>
<xsl:sequence select="('Townshend', 'Daltrey', 'Entwistle', 'Moon')"/>
<xsl:sequence select="('Page', 'Plant', 'Bonham', 'Jones' )"/>
</xsl:variable>
Note: the $rockIdols variable will be a sequence of 12 dead or aging British rock gods
The as attribute
In the $rockIdols example each item is of type xs:string
Had we defined the variable with as="item()*" then:
The Fab Four items would have been element nodes
The Who & Led Zeppelin entries would have been strings
When stepping through the items in a sequence can use the instance of operator to distinguish which items are which
See Tidwell, p. 54, for an example of this
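Telling nodes from atomic values in a mixed sequence can be pictured with Python's isinstance, which plays a role similar to testing an item with instance of. The data and the describe helper are hypothetical, and only two Beatles are included to keep the sample short.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<brits><beatles>Lennon</beatles><beatles>McCartney</beatles></brits>"
)

# A mixed sequence: element nodes followed by plain strings.
rock_idols = doc.findall("beatles") + ["Townshend", "Daltrey"]


def describe(item):
    """Rough analogue of testing an item against element() or xs:string."""
    if isinstance(item, ET.Element):
        return f"element:{item.text}"
    if isinstance(item, str):
        return f"string:{item}"
    return "other"


kinds = [describe(item) for item in rock_idols]
```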
Location paths
The context
General
Context is a critical concept in XPath
Context is akin to the current directory when working in DOS
In DOS, results you get from a dir *.* command will depend entirely on the current directory
XPath expressions will similarly vary based on context (i.e., current location)
The XPath 1.0 context
Context equivalent to "the node in the tree from which any expression is evaluated." (Tidwell, p. 55)
However, there are also environment variables to consider as part of context
Five components to context
Context node
Synonymous with current directory
All XPath expressions are evaluated from the context node
Context position & context size
These integer values are important when processing groups of nodes
Context size is the number of items being processed by a given XPath expression
Context position is the position of the particular item currently being processed within the group
The variables (both names & values) currently in-scope
All the functions currently available to XPath expressions
Functions typically defined by either XPath or XSLT
However, can also create functions within a stylesheet itself
All of the namespace declarations currently in-scope
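Context position and context size are what the XPath functions position() and last() report. A small Python analogue, with a hypothetical helper named with_context, shows the usual reason they matter: treating the last item differently.

```python
def with_context(items):
    """Yield (item, position, size); position is 1-based, as in position()."""
    size = len(items)
    for position, item in enumerate(items, start=1):
        yield item, position, size


# Put a comma after every item except the last one,
# the typical job of a position() != last() test.
names = ["Page", "Plant", "Bonham", "Jones"]
joined = "".join(
    name + (", " if position != size else "")
    for name, position, size in with_context(names)
)
```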
The XPath 2.0 context
General
Not surprisingly, there are more environment variables (i.e., context components) to consider with XPath 2.0
Best Practice: avoid using relative paths when declaring CURIEs
However, once an RDFa processor concatenates an "expanded" CURIE with the resource, processor will attempt to resolve any relative URIs by using the base declaration
RDFa Overview
RDF vs. RDFa
RDF apparently requires that attributes and values be contained within element tags
RDFa is designed to re-use existing markup as much as possible
Add markup to specify properties
However, can allow existing element content to be used as property values
This reduces mark-up - it also reduces chances for error
RDFa & Vocabularies
Unlike microformats, vocabularies are not defined in the markup
References to external vocabularies enhance standardization
Common Vocabularies
bibo: http://purl.org/ontology/bibo/
cc: http://creativecommons.org/ns#
dbp: http://dbpedia.org/property/
dbr: http://dbpedia.org/resource/
dc: http://purl.org/dc/elements/1.1/
dcterms: http://purl.org/dc/terms/
foaf: http://xmlns.com/foaf/0.1/
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
taxo: http://purl.org/rss/1.0/modules/taxonomy/
xhv: http://www.w3.org/1999/xhtml/vocab#
xsd: http://www.w3.org/2001/XMLSchema#
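Expanding a CURIE is a matter of concatenating the URI bound to the prefix with the part after the colon. A minimal sketch using a few of the prefixes listed above; the function name expand_curie is hypothetical, and the sketch ignores safe CURIEs (the bracketed form) and relative-URI resolution.

```python
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand 'prefix:reference' by joining the prefix URI and the reference."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"no namespace declared for {curie!r}")
    return prefixes[prefix] + reference
```

For example, expand_curie("foaf:name") yields http://xmlns.com/foaf/0.1/name.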
RDFa DTD & Related Specifications
XML Declaration - recommended
Doctype Declaration: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
Namespace Declaration(s):
Often placed within html tag
Example (using RDFa & 2 other namespace declarations): <html xmlns="http://www.w3.org/1999/xhtml"
version="XHTML+RDFa 1.0" xml:lang="en"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
Media-Type Declaration
Mark-up should include the following meta element: <meta http-equiv="Content-Type" content="application/xhtml+xml" />
Character sets
If specified in the (top-of-file) XML declaration, the charset definition should be repeated in the media-type declaration
Example (using the UTF-8 character set): <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
RDFa Attributes
Existing (i.e., HTML 4.0/XHTML 1.0) Attributes (& their values)
Assuming the following namespace declarations in parent/ancestor element(s)… xmlns:biblio="http://example.org/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
Would write: <span about="urn:ISBN:1234567890" typeof="biblio:book" property="dc:title">Some Book Title</span>
Attributes & Triples
Subjects - usually referenced using the about attribute
Predicates - usually referenced via the following attributes:
property
rel
rev
Objects - usually referenced via the following attributes:
When using a URI to define/set the object:
href
resource
src
When using text to define/set the object:
content
Or, simply, the content of the containing element in question
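The subject/predicate/object mapping can be illustrated on a single element that carries about and property. This is a sketch of the idea only, far from a conforming RDFa processor: it handles one element, one literal-valued property, and a hard-coded prefix table, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

PREFIXES = {
    "biblio": "http://example.org/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def expand(curie):
    """Join the URI bound to the prefix with the rest of the CURIE."""
    prefix, _, reference = curie.partition(":")
    return PREFIXES[prefix] + reference


def literal_triple(element):
    """Build (subject, predicate, object) for an element with about + property."""
    subject = element.get("about")
    predicate = expand(element.get("property"))
    # The content attribute wins if present; otherwise use the element's text.
    obj = element.get("content")
    if obj is None:
        obj = "".join(element.itertext())
    return subject, predicate, obj


span = ET.fromstring(
    '<span about="urn:ISBN:1234567890" property="dc:title">Some Book Title</span>'
)
triple = literal_triple(span)
```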
"RDFa Primer",
B. Adida, M. Birbeck,
Editors. World Wide Web Consortium (W3C),
14 October 2008. This version of the Working Group Note is available at
http://www.w3.org/TR/2008/NOTE-xhtml-rdfa-primer-20081014/. The latest version of this Primer is always available at http://www.w3.org/TR/xhtml-rdfa-primer/.
"RDFa in XHTML: Syntax and Processing.",
B. Adida, M. Birbeck, S. McCarron, S. Pemberton,
Editors. World Wide Web Consortium (W3C),
14 October 2008. This version of the RDFa in XHTML: Syntax and Processing Recommendation is available at http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/. The latest version
of this Recommendation is always available at http://www.w3.org/TR/rdfa-syntax.
Creator - can be a person, organization, or service
Date
Can be point of time or a time period
Best Practice: use an encoding scheme, such as the W3CDTF profile of ISO 8601: http://www.w3.org/TR/NOTE-datetime
Description - can be an abstract, a table of contents, a graphical representation, or free-text
Format
The file format, physical medium, or dimensions (size or duration) of the resource.
Best Practice: use a controlled vocabulary, such as the list of Internet Media Types [MIME]
Identifier
Should be unambiguous within context
Best Practice: Identify the resource by means of a string conforming to a formal identification system.
Language
Publisher
Relation - a related resource
Rights - typically a statement about various property (including intellectual) rights associated with the resource
Source - a related resource from which the described resource is derived (in whole or in part)
Subject - the topic of the resource
Typically use keywords, key phrases, or classification codes
Best Practice: use a controlled vocabulary
Title
Type
The nature or genre of the resource
Best Practice: Use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]
Note: URI for each term is either:
http://purl.org/dc/terms/ELEMENT_NAME for dcterms: (i.e., current) namespace
http://purl.org/dc/elements/1.1/ELEMENT_NAME for dc: (i.e., legacy) namespace
DCMI Metadata Terms [DCMI-TERMS]
The 'master' (i.e., largest) set of vocabularies and technical specifications
Includes:
Sets of resource classes, including the DCMI Type Vocabulary [DCMI-TYPE]
Vocabulary & syntax encoding schemes
Terms in DCMI vocabularies are intended to be used in combination with terms from other, compatible vocabularies in the context of application profiles
Use of DCMI vocabularies (and others) to be done on the basis of the DCMI Abstract Model [DCAM].
"FOAF Vocabulary Specification 0.97" 3rd Ed.,
D. Brickley, L. Miller,
1 January 2010.
This version (rdf, wiki) of the Namespace Document is available at http://xmlns.com/foaf/spec/20100101.html. The latest version is always available at http://xmlns.com/foaf/spec/.
Parent/child model (each child has one & only one parent)
Table is not a term used in hierarchical databases
Advantages
Can be extremely fast processing certain types of queries
Handles one-to-many relationships well
Limitations
Must have intimate knowledge of database structure to write queries
Must begin with a pointer to the root before drilling down to find an object
Structure itself is very inflexible
Can often be quite redundant
Cannot enter parent-less children
Cannot easily handle many-to-many relationships, or multiple-parent relationships
Queries are very complicated to write - generally requires programming
Network Databases
Similar to hierarchical databases, but with multiple parents allowed
Relationships among objects are called sets
Advantage vs. hierarchical database is that you do not have to begin at a root in order to navigate to a data element
Limitations
Very inflexible
Changes to the database structure often require rebuilding the entire database
As with hierarchical database, queries are very complicated to write
Object-Oriented Databases (these have receded in popularity)
Online Analytic Processing (OLAP)
Transforms data into a multidimensional object
Used for advanced analysis
Generally an add-on to RDBMS
XML
Relational Databases, Set Theory & Predicate Logic
Set Theory
A set is any collection (M) of definite & distinct objects (m), which themselves are called elements
The set itself is a single entity
Note: If the elements are not distinct then you are dealing with a multiset, or a bag
Tables in a database are supposed to represent a set
Rows in a table must therefore be distinct (enforced via key constraints)
The definition of a set is rather subjective, though, again, it should be some real-world or logical entity
Defining elements
Done so strictly via their attributes
Order of elements in the set is completely irrelevant
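The difference between a set and a bag, and the irrelevance of order, can be seen directly with Python's built-in set and collections.Counter. The sample rows are made up.

```python
from collections import Counter

rows = [("Tom", "Sales"), ("Dick", "IT"), ("Tom", "Sales")]

as_set = set(rows)       # duplicates collapse: elements are distinct
as_bag = Counter(rows)   # a multiset keeps a count per element

# Order of elements is irrelevant to set equality.
same = {("Dick", "IT"), ("Tom", "Sales")} == as_set
```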
Predicate Logic
A predicate is a property or expression that either holds or not - i.e., it is either true or false
Predicates used in relational model to:
Maintain the logical integrity of the data
Define its structure
Sometimes you define sets simply by enumerating the members - e.g. Tom, Dick, & Harry
Often, though, define a set by defining a property - e.g., all prime numbers
Example: in a table of employees might enforce a predicate "salary greater than zero"
Also use predicates when formulating queries - e.g., salary greater than $50,000
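Both uses of predicates, as an integrity rule and as a query filter, can be tried with Python's sqlite3 module. The table layout and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL NOT NULL CHECK (salary > 0))"   # predicate enforced on every row
)
conn.executemany(
    "INSERT INTO employees (name, salary) VALUES (?, ?)",
    [("Tom", 45000), ("Dick", 62000), ("Harry", 80000)],
)

try:
    conn.execute("INSERT INTO employees (name, salary) VALUES ('Bad', -1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# A predicate used in a query instead of in a constraint.
well_paid = [row[0] for row in conn.execute(
    "SELECT name FROM employees WHERE salary > 50000 ORDER BY name")]
```

The CHECK clause keeps the table consistent with the predicate at all times, while the WHERE clause only selects the rows for which its predicate holds.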
The Relational Model
General
Combination of set theory and predicate logic
Mathematical underpinning of RDBMSs
Keys
Candidate key
"…a set of one or more columns that has a unique value for every row in this table." (Kriegel and Trukhnov, p. 17)
Essentially by definition, a candidate key contains non-null values
Note: A table can have any number of candidate keys
Primary key
A particular case of a candidate key; the candidate key chosen to identify each record uniquely
A table can have only one primary key
Technically you are not required to have a primary key, though it is always a good idea to do so
Composite key - a key composed of more than one column
Foreign key - a field which contains entries from the primary key of a different table
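The four kinds of key can be declared and exercised in SQLite through Python's sqlite3 module. The schema below is hypothetical; note that SQLite only enforces foreign keys after the foreign_keys pragma has been switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE departments (
    dept_id   INTEGER PRIMARY KEY,        -- the chosen candidate key
    dept_name TEXT NOT NULL UNIQUE        -- another candidate key
);
CREATE TABLE enrollments (
    student_id INTEGER NOT NULL,
    course_id  INTEGER NOT NULL,
    dept_id    INTEGER NOT NULL REFERENCES departments(dept_id),  -- foreign key
    PRIMARY KEY (student_id, course_id)   -- composite key
);
INSERT INTO departments VALUES (1, 'Math');
INSERT INTO enrollments VALUES (10, 100, 1);
""")


def violates(sql):
    """Return True if the statement is rejected by a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


dup_primary = violates("INSERT INTO enrollments VALUES (10, 100, 1)")
dup_candidate = violates("INSERT INTO departments VALUES (2, 'Math')")
bad_foreign = violates("INSERT INTO enrollments VALUES (11, 100, 99)")
```

All three inserts are rejected: the first repeats a composite primary key, the second repeats a value in the other candidate key, and the third refers to a department that does not exist.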
Propositions, predicates, & relations
Relations
Relation = a representation of a set in set theory
In the relational model relation is "a set of related information, with the implementation of the database being a table." (Ben-Gan, p. 5)
A single relation/table should represent a single set
Note: a join between two or more tables produces a single relation
Attributes
The heading of a relation comprises a set of attributes
Note: Because those attributes are themselves a set their order does not matter
Two components of an attribute
Attribute name
Domain - i.e., type, or range of possible values
Example: for phone numbers would set domain to 10-digit entries that do not begin with 0 or 1 (in reality there are likely other constraints)
Note: can also use custom enumerations, such as, for example, Full-time, Part-time, or Temp
"A domain or type is one of the simplest forms of a predicate in our database because it restricts the attribute values that are allowed." (Ben‑Gan, p. 6)
Propositions
"A proposition is an assertion or statement that must be true or false." (Ben-Gan, p. 5)
To define a relation you create predicates out of propositions - i.e.,
This defines the structure of a relation
Basically, you decide on/define the attributes needed to define a relation
Missing values
Two-valued predicate logic: a proposition is either true or false
Three-valued predicate logic: true, false, or unknown
In SQL you flag unknown values with NULL
Constraints
Domain integrity
Selecting the type of an attribute
Also specifying whether it accepts NULL values
Many other constraints can be enforced
Candidate keys to enforce entity integrity
Foreign keys to enforce referential integrity
Minimums, maximums, other options
Normalization
"Normalization is a formal mathematical process to guarantee that each entity will be represented by a single relation." (Ben-Gan, p. 7)
1st normal form (1NF)
Two components:
Rows in the table must be unique
Attributes should be atomic
Note: this is essentially a tautology (i.e., it defines a set/relation)
Implementation
Uniqueness of rows achieved by defining a unique key in the table
"Atomicity of attributes is subjective in the same way that the definition of a set is subjective." (Ben-Gan, p. 8)
When storing names, for instance, subjective decision as to whether you need to store first & last names as separate fields
Sub-atomic attributes: having an address attribute that doesn't call for including city & state
Arrays
These do not ipso facto violate the 1st normal form
Example: a Yearly Sales relation with the attributes qty2009, qty2010, & qty2011 is simply a relation of 3 years of data
2nd normal form (2NF)
Rule 1: Data must meet 1NF
Rule 2: "For every candidate key, every nonkey attribute has to be fully functionally dependent on the entire candidate key." (Ben-Gan, p. 8)
In other words, you need the entire candidate key value to obtain a nonkey attribute value
Conversely, if you can obtain a nonkey attribute value given just part of a candidate key then you have violated the 2NF
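The 2NF rule above can be sketched as a check on sample data. This is only an illustration: the OrderLine rows, the column names, and the helper determines are all hypothetical, and a sample can refute a functional dependency but never prove one.

```python
def determines(rows, lhs, rhs):
    """Return True if, in this sample, the columns in lhs determine column rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        # A second, different rhs value for the same lhs value refutes the dependency.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Hypothetical OrderLine relation; its candidate key is (orderid, productid).
order_lines = [
    {"orderid": 1, "productid": "A", "qty": 2, "orderdate": "2011-01-05"},
    {"orderid": 1, "productid": "B", "qty": 1, "orderdate": "2011-01-05"},
    {"orderid": 2, "productid": "A", "qty": 7, "orderdate": "2011-02-10"},
]

# orderdate is obtainable from orderid alone (part of the key): a 2NF violation.
orderdate_from_part = determines(order_lines, ["orderid"], "orderdate")
# qty cannot be obtained from part of the key; it needs the entire key.
qty_from_part = determines(order_lines, ["orderid"], "qty")
qty_from_key = determines(order_lines, ["orderid", "productid"], "qty")
```

In this sample the usual fix would be to move orderdate into a separate relation keyed on orderid alone.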
Separates syntax items enclosed in brackets or braces
You can use only one of the items
[ ] - optional items
[,…n] - brackets
Indicates the preceding item can be repeated n number of times
The occurrences are separated by commas
[…n] - brackets
Indicates the preceding item can be repeated n number of times
The occurrences are separated by blanks
{ }
Required items
Sometimes also used to group items so that they can be marked with symbols such as [ ], ¦, etc.
* - Identifies items that can be repeated 0 or more times
( ) . , - punctuation that must be used
SQL Data Types
General
Must specify data type for each column in a table
SQL:2003 supports three categories of data types:
Predefined
Constructed
User-defined
Predefined data types
Strings
Character strings
General
Character sets
A predefined sequence of characters
"A 'character set' could be thought of as a table that assigns a unique binary number … to each character that belongs to the character set." (Kriegel & Trukhnov: 2nd Ed., p. 47)
ASCII is the simplest, & most common - 1 byte/character
Collation - the set of rules that governs comparisons between characters in a set
Typically defines 4 items
Language support - e.g., Latin1_General
Dictionary sorting
Normal: A & a < B & b
Binary: A < B < a < b
If a collation uses binary sorting it will contain BIN in its name
Case sensitivity
CI in name indicates "case insensitive"
If case insensitive then sorting regards a = A
Accent sensitivity
AS in name indicates "accent sensitive" (AI indicates "accent insensitive")
If accent sensitive then à <> ä
Levels of specification
For on-premises SQL Server
Instance - i.e., the entire server
Database
Column
Expression
For Azure SQL
Database
Column
Expression
Setting database collation
Use COLLATE clause to override the instance collation
Sets the default column collation
Sets collation for metadata objects
If case insensitive then you cannot have two tables T1 & t1
Note: collation for variables & parameters controlled by instance collation, not by database collation
Column & expression collation
Use COLLATE clause to define column collation
Instance collation
Case-insensitive example: -- The following will return a record for 'John Doe'
SELECT lastname
FROM HR.Employee
WHERE lastname = 'doe';
Overriding case-insensitive db collation example: -- The following will not return a record for 'John Doe'
SELECT lastname
FROM HR.Employee
WHERE lastname COLLATE Latin1_General_CS_AS = 'doe';
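The same override pattern can be tried without a SQL Server instance. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, so the collation names are SQLite's (NOCASE and BINARY) rather than SQL Server's Latin1_General names, and the one-column Employee table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOCASE is SQLite's built-in case-insensitive collation (ASCII letters only).
conn.execute("CREATE TABLE Employee (lastname TEXT COLLATE NOCASE)")
conn.execute("INSERT INTO Employee VALUES ('Doe')")

# Uses the column's case-insensitive collation, so 'doe' matches 'Doe'.
ci_rows = conn.execute(
    "SELECT lastname FROM Employee WHERE lastname = 'doe'"
).fetchall()

# An explicit COLLATE in the expression overrides the column collation.
cs_rows = conn.execute(
    "SELECT lastname FROM Employee WHERE lastname COLLATE BINARY = 'doe'"
).fetchall()
```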
Strings of zero length
Called an empty string
SQL:2003 specifically differentiates between empty strings & NULL
However, this is not the case with all RDBMS vendors!
Size
ASCII characters occupy 1 byte (8 bits)
Characters from other character sets might be larger
Fixed-length character strings
The system will allocate the requisite number of bytes in memory
Blank characters are padded to the end of any entries that are shorter than the specified length of the column
(Ergo, all data is made to be the fixed-length)
Use when you expect values to be roughly the same size
Syntax (2 variations):
CHARACTER(n), where n specifies the length of the string
CHAR(n)
If size is omitted the default is 1
Character strings of varying length
Must specify maximum length
System allocates memory/disk space dynamically
Use when you expect varying sizes
Syntax (3 variations):
CHARACTER VARYING(n), where n specifies maximum length
CHAR VARYING(n)
VARCHAR(n)
National character strings
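A 1-byte character set allows for at most 256 different characters (standard ASCII defines 128 of them)
Unicode
Standard, multi-byte character set; bytes per character depend on the encoding (e.g., 2 bytes for most characters in UTF-16, 4 bytes in UTF-32)
A 4-byte encoding has room for over 4 billion values, though Unicode itself defines about 1.1 million code points
Encompasses characters from virtually all written languages
When working with Unicode there are two types, which mimic ASCII data (i.e., fixed length and varying length)
Note: With a 4-byte encoding, a fixed length of 13 characters occupies 52 bytes (13 X 4); with a 2-byte encoding such as SQL Server's NCHAR it occupies 26 bytes (13 X 2)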
Fixed-Length Syntax (4 variations):
NATIONAL CHARACTER(n)
NATIONAL CHAR(n)
NCHAR(n)
CHARACTER[(n)] CHARACTER SET <char_set_name>
Variable-Length Syntax (4 variations):
NATIONAL CHARACTER VARYING(n)
NATIONAL CHAR VARYING(n)
NCHAR VARYING(n)
CHARACTER VARYING(n) CHARACTER SET <char_set_name>
Specifying that a string literal should be treated as National Character
Prepend the literal with the (capital) letter 'N'
Example: WHERE lastname = N'Doe'
Large objects
Sometimes need to store data that will exceed a vendor's character string type limit
Examples: Resumes, term papers, etc.
ASCII-data syntax (2 variations):
CHARACTER LARGE OBJECT
CLOB
Unicode syntax:
NATIONAL CHARACTER LARGE OBJECT
NCLOB
Character string literals
Enclosed in single quotation marks - i.e., apostrophes
Use 2 apostrophes to represent a single apostrophe within a string literal
To represent a national character set literal precede the string literal with a capital N
Example: N'Bill White'
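A quick way to confirm the doubled-apostrophe rule is to run a literal through an engine. The sketch assumes SQLite via Python's sqlite3 module; the name O'Brien is only sample data. The second query shows the alternative of binding the value as a parameter, which avoids escaping altogether.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two apostrophes inside the literal stand for one apostrophe in the value.
escaped = conn.execute("SELECT 'O''Brien'").fetchone()[0]

# A bound parameter needs no escaping at all.
bound = conn.execute("SELECT ?", ("O'Brien",)).fetchone()[0]
```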
Binary strings
General
Used to store binary information
Examples: a document in Word format, audio & video files, program executables
Syntax (2 variations):
BINARY LARGE OBJECT
BLOB
Binary string literals
Can technically define them (SQL Server & Oracle)
Rarely used in practice
Binaries typically accessed and manipulated via special programs & interfaces
Numbers
Exact numbers
General
2 Variations
Whole numbers
Decimals
Characteristics
Positive/negative
Precision - i.e., maximum total number of digits that can be stored (both before & after decimal)
Scale - i.e., maximum number of digits after the decimal point
Types & syntax
NUMERIC[(p[,s])]
Use for whole numbers and numbers with specific decimal component
Can optionally specify scale
DECIMAL[(p[,s])], or DEC[(p[,s])]
Similar to NUMERIC
However, the precision (but not the scale) used by a vendor-specific implementation can be greater than the precision used in the declaration
INTEGER, or INT
4-byte data
Range: -2.1B to +2.1B
SMALLINT
2-byte data
Range: -32.7K to +32.7K
BIGINT
8-byte data
Range: -9.2 quintillion to +9.2 quintillion
Literals for exact numbers
Precede with a plus (optional) or minus sign
Include a decimal point (optional)
Approximate numbers
General
The number pi, for instance, cannot be specified to exact precision
FLOAT[(p)]
Stores floating point numbers
Specifying precision is optional
REAL
Similar to FLOAT
However, precision is fixed
DOUBLE PRECISION
Very similar to REAL
However, greater precision
Literals for approximate numbers
Basically same as literals for exact numbers
However, use uppercase E or lowercase e
Examples:
+4.56E2
-3.345e1
7.876E-2
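The practical difference between approximate and exact numbers can be sketched outside a database. As an assumption for illustration, Python's float (a binary floating-point number) stands in for FLOAT/DOUBLE PRECISION and decimal.Decimal stands in for NUMERIC/DECIMAL; the three literals are the ones listed above.

```python
from decimal import Decimal

# The E-notation literals above parse the same way in Python.
literals = [float("+4.56E2"), float("-3.345e1"), float("7.876E-2")]

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly.
float_sum_exact = (0.1 + 0.2 == 0.3)

# A decimal (exact) type stores the digits as written.
decimal_sum_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

This is why monetary amounts are normally stored in exact types.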
Date & time
Complex data types
Any 'single' date/time entry actually contains several bits of information: year, month, etc.
In programming these kinds of data types are called complex & are usually created with the Structure keyword
Date & time implementations
Syntax:
Note (on UTC, or Coordinated Universal Time):
Essentially equivalent to Greenwich Mean Time (GMT)
Time zones are either positive or negative offsets to UTC
DATE (3 elements):
Year (4 digits)
Month & day (2 digits each)
TIME (3 elements):
Hour (2 digits, from 00 to 23)
Minute (2 digits)
Seconds (2 digits, 00 to 61 - to handle leap seconds)
TIME WITH TIME ZONE - adds UTC plus time zone information
TIMESTAMP[(p)]
A combination of DATE & TIME
Can optionally specify precision
TIMESTAMP[(p)] WITH TIME ZONE
INTERVAL (2 options):
Include YEAR & MONTH fields
Include DAY, HOUR, MINUTE, & SECOND elements
Date & time literals (implementation varies significantly by vendor)
XML data-type implementations
XML data is hierarchical
Implementation varies by vendor
Constructed & user-defined data types (implementation varies by vendor)
Other data types
BOOLEAN
Noteworthy because several major vendors do not implement
See Kriegel & Trukhnov: 2nd Ed., p. 77, for how to simulate
Vendors have introduced a variety of other, proprietary data types, e.g.:
ROWID, UROWID (Oracle)
TIMESTAMP (SQL Server 2008)
SQL_VARIANT (SQL Server 2008)
NULL
SQL:2003 standards state that each data type should contain a NULL value
Almost any operator that includes NULL as a term returns NULL (exceptions include IS NULL and logical expressions such as TRUE OR NULL)
This often requires special work-arounds
Example: if you have a calculation with 100 - NULL you probably don't want NULL as your result
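One common work-around is COALESCE, which substitutes a fallback value for NULL. The sketch below assumes SQLite via Python's sqlite3 module (where SQL NULL comes back as Python's None); other vendors offer the same function, sometimes alongside proprietary ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
raw, fixed, null_eq_null, null_is_null = conn.execute(
    """
    SELECT 100 - NULL,               -- NULL propagates through arithmetic
           100 - COALESCE(NULL, 0),  -- COALESCE replaces NULL with 0 first
           NULL = NULL,              -- comparing with = yields unknown (NULL)
           NULL IS NULL              -- IS NULL is the correct test
    """
).fetchone()
```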
CREATE TABLE dbo.Employee
(
-- EF looks for a primary key named 'ID'
ID INT PRIMARY KEY
,firstname VARCHAR(30) NOT NULL
,lastname VARCHAR(30) NOT NULL
,hiredate DATE NOT NULL
,mgrid INT NULL
,ssn VARCHAR(20) NOT NULL
,salary MONEY NOT NULL
,CONSTRAINT U_firstname_lastname
UNIQUE(firstname, lastname)
,CONSTRAINT fk_mgrid FOREIGN KEY (mgrid)
REFERENCES HR.Manager (ID)
ON DELETE CASCADE
ON UPDATE CASCADE
)
Temporary tables
Only the data in a temporary table is transitory; the table definition itself is permanent
Data exists only until user
Commits changes in another table
Logs off
Typically used:
When issuing SQL commands from other programs (i.e., embedded SQL)
You need intermediate results to perform complex calculations
Two types (types differ by data visibility)
GLOBAL - data can be accessed by any program or module within the session
LOCAL
Columns
Column definitions (in SQL:2003 can use domains rather than data types when defining columns)
Column constraints
Columns can have multiple constraints
Options:
NOT NULL (Ummm… NULL values are not permitted)
UNIQUE (Note: NULL values are permitted)
PRIMARY KEY (a combination of NOT NULL & UNIQUE)
REFERENCES (the column is a foreign key to the referenced table)
CHECK
Verifies that column values obey certain rules
Example: only positive numbers, or only certain/enumerated values
Naming constraints
Optional, though RDBMS will generate default name if you do not specify one
Conventions used when naming constraints (prefixes or suffixes, sometimes with 01, 02, etc. appended)
NOT NULL - nn
UNIQUE - uk, u
PRIMARY KEY - pk
REFERENCES - fk
CHECK - chk, c, or words like "UPPER", "POSITIVE", etc.
Column default values (specified via DEFAULT keyword within CREATE TABLE command)
Column collating sequence (allows you to specify non-standard string ordering rules)
Table constraints
General
Very similar to column constraints
Difference is that table constraints can operate on multiple columns
(Single-column constraints can also be declared as table constraints)
Options:
UNIQUE (similar to the column constraint but can be used to ensure the uniqueness of a combination of two or more columns)
PRIMARY KEY
If specifying multiple columns the combination of values must be unique
NULL values are not allowed
FOREIGN KEY (if multiple columns are identified, together they must reference the corresponding columns of a primary key or unique key in the referenced table)
CHECK
Syntax:
Similar to syntax for column constraints
However: following the keyword specifying the type of constraint (e.g., PRIMARY KEY) the column name(s) and/or rules must be enclosed in parentheses.
Example (adapted from Kriegel & Trukhnov: 2nd Ed., p. 667): CREATE TABLE Address(
addrid INTEGER NOT NULL,
address VARCHAR(60),
type VARCHAR(8),
city VARCHAR(18) NOT NULL,
state CHAR(2),
zip VARCHAR(10) NOT NULL,
CONSTRAINT CHK_Address_type
CHECK (type IN ('BILLING', 'SHIPPING')),
CONSTRAINT PK_Address PRIMARY KEY (addrid)
);
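To see these table constraints at work, the sketch below creates a trimmed version of the Address table in SQLite (via Python's sqlite3 module) and attempts a few inserts. SQLite and the sample rows are assumptions for illustration; error classes and messages differ by vendor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Address (
        addrid  INTEGER NOT NULL,
        type    VARCHAR(8),
        city    VARCHAR(18) NOT NULL,
        CONSTRAINT CHK_Address_type CHECK (type IN ('BILLING', 'SHIPPING')),
        CONSTRAINT PK_Address PRIMARY KEY (addrid)
    )
""")
conn.execute("INSERT INTO Address VALUES (1, 'BILLING', 'Reno')")
# A NULL type passes the CHECK: the predicate evaluates to unknown, not false.
conn.execute("INSERT INTO Address VALUES (2, NULL, 'Reno')")

rejected = []
bad_rows = [
    (3, "HOME", "Reno"),      # violates CHECK
    (1, "SHIPPING", "Reno"),  # violates PRIMARY KEY (duplicate addrid)
    (4, "BILLING", None),     # violates NOT NULL on city
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO Address VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

row_count = conn.execute("SELECT COUNT(*) FROM Address").fetchone()[0]
```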
Referential integrity constraints
Can optionally specify how the row in a child table is affected when a referenced row in a parent table is changed or deleted
Syntax: [
ON {DELETE ¦ UPDATE}
{NO ACTION ¦ CASCADE ¦ SET NULL ¦
SET DEFAULT ¦ RESTRICT}
]
Options:
NO ACTION
Default value
Means the user will receive an error message if attempting to delete a row or to update the primary key value of a referenced row
CASCADE
If a referenced row is deleted all child rows will be deleted!
If a referenced row's primary key is updated the foreign keys of child rows will be similarly updated
SET NULL
Child row's foreign key values will be set to NULL whenever a referenced row is deleted or its primary key is changed
Note: you will get an error message when running your CREATE TABLE command if the column with the foreign key is defined as NOT NULL
SET DEFAULT
Foreign key column(s) in a child row are set to their default values whenever a referenced row is deleted or the primary key of a referenced row is changed
Obviously, this assumes that the foreign key columns in a table with child rows have default values
RESTRICT
Similar to NO ACTION, in that an integrity constraint violation error is raised
Difference: NO ACTION checks are made at the end of an SQL statement
Therefore, NO ACTION allows referential constraints to be "temporarily" violated
RESTRICT, however, prohibits any such violations, even when transitory
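The effect of three of these options on child rows can be sketched with a small parent/child schema. The example assumes SQLite via Python's sqlite3 module (which enforces foreign keys only after PRAGMA foreign_keys = ON); all table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.executescript("""
    CREATE TABLE Customer (customerid INTEGER PRIMARY KEY);
    CREATE TABLE OrderHeader (
        orderid    INTEGER PRIMARY KEY,
        customerid INTEGER REFERENCES Customer (customerid) ON DELETE CASCADE
    );
    CREATE TABLE Note (
        noteid     INTEGER PRIMARY KEY,
        customerid INTEGER REFERENCES Customer (customerid) ON DELETE SET NULL
    );
    CREATE TABLE Invoice (
        invoiceid  INTEGER PRIMARY KEY,
        customerid INTEGER REFERENCES Customer (customerid)  -- NO ACTION by default
    );
    INSERT INTO Customer VALUES (1), (2);
    INSERT INTO OrderHeader VALUES (10, 1);
    INSERT INTO Note VALUES (20, 1);
    INSERT INTO Invoice VALUES (30, 2);
""")

# Deleting customer 1 cascades to OrderHeader and nulls the key in Note.
conn.execute("DELETE FROM Customer WHERE customerid = 1")
orders_left = conn.execute("SELECT COUNT(*) FROM OrderHeader").fetchone()[0]
note_fk = conn.execute("SELECT customerid FROM Note").fetchone()[0]

# Deleting customer 2 is refused because an Invoice row still references it.
try:
    conn.execute("DELETE FROM Customer WHERE customerid = 2")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True
```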
Deferrable constraints
Constraints can either be DEFERRABLE or NOT DEFERRABLE (default)
NOT DEFERRABLE constraints are checked after every DML statement
DEFERRABLE constraints have 2 options:
Immediate mode: checked after every INSERT, DELETE, or UPDATE statement
Deferred mode: Constraints are checked at the end of the transaction
Deferrable constraints are helpful when:
You might want to load data into a child table before loading data into a parent table
You want to load data which doesn't comply with CHECK constraints and then go in and "fix" that data
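The child-before-parent loading case can be sketched as follows. It assumes SQLite via Python's sqlite3 module, where a foreign key can be declared DEFERRABLE INITIALLY DEFERRED; the Parent and Child tables are hypothetical, and the connection is opened with isolation_level=None so that the sketch controls BEGIN and COMMIT itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Parent (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Child (
        id       INTEGER PRIMARY KEY,
        parentid INTEGER REFERENCES Parent (id) DEFERRABLE INITIALLY DEFERRED
    )
""")

# Deferred mode: the child row may arrive before its parent.
conn.execute("BEGIN")
conn.execute("INSERT INTO Child VALUES (1, 100)")  # parent 100 does not exist yet
conn.execute("INSERT INTO Parent VALUES (100)")    # supplied before the transaction ends
conn.execute("COMMIT")                             # the constraint is checked here
loaded = conn.execute("SELECT COUNT(*) FROM Child").fetchone()[0]

# If the violation is never repaired, the COMMIT itself fails.
conn.execute("BEGIN")
conn.execute("INSERT INTO Child VALUES (2, 999)")
try:
    conn.execute("COMMIT")
    commit_failed = False
except sqlite3.IntegrityError:
    commit_failed = True
    conn.execute("ROLLBACK")
```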
ON COMMIT clause
Used only with temporary tables
Specifies whether data are deleted at the end of a transaction or kept through the end of a session
Physical properties clause
Deals with how files are physically stored on disks
General idea:
Separate database objects by type
Spread objects across available disks to speed up database operations
As a rule, you would put table data on one disk, table indexes on another, etc.
Implementation varies by vendor
Identity clause
Used to generate unique, sequential values
Generally used for a primary key
Syntax: [
[START WITH <integer>]
[INCREMENT BY <integer>]
[MINVALUE <integer>]
[MAXVALUE <integer>]
[CYCLE ¦ NO CYCLE]
]
Creating a new table as a copy of another table
Can create from 1 table or from multiple tables
Can create new column names or keep existing ones
Syntax: CREATE TABLE <table_name>
[(
<col1_name>,
<col2_name>,
…
)]
AS
(
SELECT …
FROM …
WHERE …
)
[WITH NO DATA]
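A runnable variant of this statement, assuming SQLite via Python's sqlite3 module. SQLite's dialect differs from the generic syntax above in two ways: it takes new column names from aliases in the SELECT list rather than from a parenthesized column list, and it has no WITH NO DATA clause, so a predicate that is always false is used to copy the structure only. The Product table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (productid INTEGER PRIMARY KEY, name TEXT, price REAL);
    INSERT INTO Product VALUES (1, 'Widget', 5.0), (2, 'Gadget', 50.0);

    -- New column names come from the aliases in the SELECT list.
    CREATE TABLE CheapProduct AS
        SELECT productid AS id, name AS label
        FROM Product
        WHERE price < 10;

    -- Structure only: the predicate is false for every row.
    CREATE TABLE ProductShell AS
        SELECT * FROM Product WHERE 0;
""")

cheap = conn.execute("SELECT id, label FROM CheapProduct").fetchall()
shell_cols = [row[1] for row in conn.execute("PRAGMA table_info(ProductShell)")]
shell_rows = conn.execute("SELECT COUNT(*) FROM ProductShell").fetchone()[0]
```

Note that in SQLite the copy carries over column names and data but not constraints such as PRIMARY KEY.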
Indexes
General
Physical files; exist on a hard drive
Indexes speed data retrieval
Generally one does not explicitly refer to indexes within query statements
Essentially 2-column tables
Column 1: values from a column or from a group of columns in referenced table
Column 2: pointer to physical row location on disk
Note: these tables are sorted on "Column 1" values
Clustered indexes
Physically stores data rows in a table based on key values
There can be only 1 clustered index per table because the data rows themselves can be stored in only one physical order
Implementation
Typically implemented as B-tree indexes
B-tree algorithm
Designed to minimize number of hard disk reads
Uses a balanced, sorted tree (strictly speaking a B-tree is not a binary tree: each node can hold many keys and have more than 2 children)
In the simplified 2-child picture: if the value in the current node is greater than the value being sought you go to the "left-hand" child of the node
Otherwise you go to the "right-hand" child of the node
You continue until you find the desired value (or you reach a leaf node)
See Kriegel & Trukhnov: 2nd Ed., Fig. 4‑1, p. 116, for a good, explanatory diagram
Indexes can either be unique or non-unique
Indexes are implicitly unique when created on columns with a PRIMARY KEY or UNIQUE constraint
In fact, indexes on PRIMARY KEY or UNIQUE columns are created by default
Don't create for small tables (fewer than 50 rows)
With large tables, create only when queries on the column(s) are searching for fewer than roughly 15% of rows
Indexes are usually helpful on columns used in table joins
Indexes are useful on columns often used together in the WHERE clause of SELECT statements
Indexes slow down DML operations which involve indexed columns
This is because the index often has to be updated & re-sorted, and this often involves disk reads & writes
Ergo, minimize indexes on tables that are frequently updated
CREATE INDEX statements
Syntax/example (nonclustered index): CREATE INDEX myindex ON myschema.sometable(columnfoo);
Syntax/example (clustered index): CREATE CLUSTERED INDEX myindex ON myschema.sometable(columnfoo);
Syntax/example (nonclustered index with unique constraint & sort order): CREATE UNIQUE INDEX myindex ON myschema.sometable
(columnfoo DESC, columnbar ASC);
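Because queries never name an index, the way to confirm one is being used is to ask the engine for its plan. The sketch assumes SQLite via Python's sqlite3 module and its EXPLAIN QUERY PLAN statement (other vendors have their own plan viewers); sometable, columnfoo, and myindex are the placeholder names from the examples above, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sometable (id INTEGER PRIMARY KEY, columnfoo INTEGER, columnbar TEXT)"
)
conn.executemany(
    "INSERT INTO sometable (columnfoo, columnbar) VALUES (?, ?)",
    [(i % 100, str(i)) for i in range(1000)],
)

query = "EXPLAIN QUERY PLAN SELECT * FROM sometable WHERE columnfoo = 42"

# The last column of each plan row is a human-readable description.
plan_before = [row[-1] for row in conn.execute(query)]

conn.execute("CREATE INDEX myindex ON sometable (columnfoo)")
plan_after = [row[-1] for row in conn.execute(query)]

uses_index_before = any("myindex" in step for step in plan_before)
uses_index_after = any("myindex" in step for step in plan_after)
```

The query text is identical in both runs; only the plan chosen by the engine changes.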
Views
General
Often described as "virtual tables"
Unlike tables, views do not occupy disk space
"View definitions are stored in RDBMS as compiled queries that dynamically populate data to be used as virtual tables for users' requests." (Kriegel & Trukhnov: 2nd Ed., p. 120)
Typical uses:
Combining data from multiple tables into a more usable format
Enforce security rules by exposing only horizontal and/or vertical slices of data
If the column_name list is omitted, columns will be named based on the SELECT statement
Columns must be named in either of the following cases:
There would be ambiguity (from 2 or more columns having the same name)
The SELECT statement includes any kind of computed value but has not aliased the resulting column
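The second case (an unaliased computed value) can be sketched as follows, assuming SQLite via Python's sqlite3 module; the Employee table and the EmployeePay view are hypothetical. The view's column_name list supplies the names that the SELECT statement leaves undefined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        id INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT, salary REAL
    );
    INSERT INTO Employee VALUES (1, 'John', 'Doe', 60000);

    -- The computed columns have no aliases, so the view names them.
    CREATE VIEW EmployeePay (empid, fullname, monthly) AS
        SELECT id, firstname || ' ' || lastname, salary / 12
        FROM Employee;
""")

cursor = conn.execute("SELECT * FROM EmployeePay")
view_columns = [col[0] for col in cursor.description]
first_row = cursor.fetchone()
```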
The SELECT statement & updatable views
Restrictions on embedded SELECT statement (these are quite limited):
Cannot use the ORDER BY clause
The view definition cannot be circular - either directly or indirectly
Updatable views
Updatable views are those which can be used in DML statements
Views can be, but do not have to be, updatable
General rules for a view to be updatable:
One-to-one correspondence between rows of the view & rows of the underlying, physical table
One-to-one correspondence between columns of the view & columns of the underlying, physical table
Note: The converses need not hold, as rows in the base table are often filtered out (& columns omitted) when creating views
Specific restrictions on updatable views:
The query_expression does not contain any table joins - i.e., the view refers to one & only one table or view
If one view is based on another, the underlying view can only refer to 1 table for the outer view to be updatable
All of the underlying table's mandatory columns (i.e., NOT NULL columns) are included in the view definition
The underlying query does not contain any set operations - i.e., UNION, EXCEPT, or INTERSECT
The DISTINCT keyword is not allowed
No aggregate functions or expressions can be included in the query_expression
The underlying query cannot have a GROUP BY clause
View constraints
Not allowed
However, the CHECK OPTION is another way to skin that cat (at least partially)
The CHECK OPTION clause can only be used with updatable views
This requires that any DML statements executed on the view have no effects other than those which are visible through the view
Options:
CASCADED (default) - if a view is based on other views the underlying views are also checked
LOCAL - CHECK OPTION is not run against underlying view(s)
Creating complex views (see Kriegel & Trukhnov: 2nd Ed., pp. 127-9, for examples)
Aliases & synonyms (not part of SQL:2003 standards)
Schemas
General
Defined in SQL:2003 as a named group of related objects
Useful when you have circular references - i.e., two tables with foreign keys referring to the other table
CREATE SCHEMA statement
Note: You can grant privileges on the objects within a schema when creating the schema
Syntax: CREATE SCHEMA
{<schema_name> ¦
AUTHORIZATION <authorization_id> ¦
<schema_name> AUTHORIZATION <authorization_id>
}
[DEFAULT CHARACTER SET <character_set>]
[<schema_path_specification>]
[<create_object_statement> ¦
<grant_privilege_statement>…]
The AUTHORIZATION <authorization_id> clause is used when the schema creator does not own the objects within the schema
Can obviously create a default character set different from the database default character set
Objects creatable as part of a schema:
Tables
Views
Domains*
Assertions*
Character sets*
Collations*
Translations*
Triggers
Transforms
Schema routines
Sequences
User-defined objects
*Note: though part of SQL:2003 standards, these operations are not in any of the Big Three databases (at least as of 2008, per Kriegel & Trukhnov: 2nd Ed.)
Sequences
General
A database object similar to an identity (column)
Difference:
An identity is tied to a table column
A sequence, however, is an independent entity
Multiple users can generate numeric values from a single sequence & use those values for different purposes
Typical use patterns:
To generate primary key values for one or more tables
Tend to be most useful within context of procedural programs
Absent the use of procedural programs one typically creates a one-to-one relationship between a sequence and a table (again, specifically for the table's primary key)
Sequence generators
Internal sequence generators:
Anonymous & created behind the scenes as part of other database objects
Identity columns are associated with internal sequence generators
Note: The scope-related options apply only to reference tables
Essentially, this code allows you to:
Add or delete a table column
Add or delete a default value for a column
Drop a table constraint
Syntax for altering identity columns (the same as for creating one): [
[START WITH <integer>]
[INCREMENT BY <integer>]
[MINVALUE <integer>]
[MAXVALUE <integer>]
[CYCLE ¦ NO CYCLE]
]
DROP TABLE statement
General
Statement removes both table & related indexes on columns from physical storage
The following (where related) are removed from the database data dictionary:
Table definitions
Index definitions
Integrity constraints
Triggers
Views (though, apparently, this is not adhered to by the Big Three)
Any other objects that reference the table
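Note: whether a DROP TABLE statement can be undone depends on the vendor
In Oracle, DDL statements are committed implicitly and cannot be rolled back
In SQL Server, a DROP TABLE issued inside an explicit transaction can be rolled back until that transaction is committed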
Syntax: DROP TABLE <tbl_name> [{CASCADE ¦ RESTRICT}]
When specifying the CASCADE option:
All dependent objects (views, constraints, triggers, etc.) on the table will be removed
Any rows in other tables referencing the dropped table will be deleted!
Indexes
General
Generally one doesn't change indexes because they are invisible to users
If and when you do modify an index you often simply drop and recreate it
ALTER INDEX statement (not part of SQL:2003 standards)
DROP INDEX statement
Removes the index from the database information schema
Cannot drop indexes used to implement PRIMARY KEY or UNIQUE constraints
To modify a PRIMARY KEY or UNIQUE index must use ALTER TABLE…DROP CONSTRAINT statement
Views
ALTER VIEW statement - syntax is implementation-specific
DROP VIEW statement (dependent objects become invalid)
Aliases & synonyms (these are not part of SQL:2003 standards)
Schemas
ALTER SCHEMA statement - syntax is implementation-specific
Not part of SQL:2003 standards
Might be part of SQL:2008 standards
DROP SCHEMA statement
Syntax: DROP SCHEMA <schema_name> [CASCADE ¦ RESTRICT]
RESTRICT keyword means the schema has to be empty
CASCADE keyword means all schema objects will be dropped
Sequences
ALTER SEQUENCE syntax: ALTER SEQUENCE <sequence_name>
[START WITH <start_value>]
[INCREMENT BY <increment_value>]
[MAXVALUE <max_value> ¦ NOMAXVALUE]
[MINVALUE <min_value> ¦ NOMINVALUE]
[CYCLE ¦ NOCYCLE]
DROP SEQUENCE statement
Syntax: DROP SEQUENCE <seq_name> [CASCADE ¦ RESTRICT]
RESTRICT option means the sequence will not be dropped if referenced within the database by any
Triggers
Routines
Other implementation-specific objects
Domains
ALTER DOMAIN syntax: ALTER DOMAIN <domain_name>
[SET <default clause> ¦ DROP DEFAULT ]
[ADD <domain_constraint> ¦
DROP CONSTRAINT <constraint_name>,… ]
DROP DOMAIN syntax: DROP DOMAIN <domain_name> [CASCADE ¦ RESTRICT ]
Character sets
Character sets cannot be modified
DROP CHARACTER SET syntax: DROP CHARACTER SET <char_set_name>
Collations
Collations cannot be modified
DROP COLLATION syntax: DROP COLLATION <coll_name> [CASCADE ¦ RESTRICT]
Syntax (a variation that will work with all vendors): INSERT INTO <table_or_view_name>
[(<column_name1>,…)]
{ {VALUES (<literal> ¦
<expression> ¦
NULL ¦
DEFAULT),… } ¦
{<select_statement>} }
Common INSERT statement clauses
General
Records typically added to a table one at a time with the INSERT statement
Column names
Can be omitted completely if you are inserting to all columns in a table
Including all column names allows you to specify order of the values
Omitted column names will be either:
Populated with a NULL value, or
Populated with a default value
Must include all NOT NULL columns, unless:
The column is populated by default (e.g., AutoNumber columns)
The column has a default value
Note: Failure to adhere to this will of course generate an RDBMS error
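The rules for omitted columns can be sketched as follows, assuming SQLite via Python's sqlite3 module and a hypothetical, trimmed Product table. An omitted column receives its default if it has one and NULL otherwise, and omitting a NOT NULL column that has no default raises an error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Product (
        productid INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        price     REAL,                            -- nullable, no default
        status    TEXT NOT NULL DEFAULT 'ACTIVE'   -- NOT NULL, but has a default
    )
""")

# price and status are omitted: price becomes NULL, status takes its default.
conn.execute("INSERT INTO Product (productid, name) VALUES (123, 'Foo bar')")
row = conn.execute(
    "SELECT productid, name, price, status FROM Product"
).fetchone()

# name is NOT NULL with no default, so omitting it is an error.
try:
    conn.execute("INSERT INTO Product (productid, price) VALUES (124, 9.99)")
    not_null_enforced = False
except sqlite3.IntegrityError:
    not_null_enforced = True
```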
Syntax
Syntax/example (inserting values for specified columns): INSERT INTO Product
(
productid,
name,
description,
manufacturerid,
price,
status
)
VALUES
(
123,
'Foo bar',
'Widget',
'456z',
NULL,
DEFAULT
)
Note: As shown, you can also explicitly specify NULL & DEFAULT values
When inserting values for all columns
You can omit column names in this case
However, must make sure the values in your VALUES clause are in the same order as the table's columns
Syntax/example (inserting values from another table): -- Typically done to archive data
-- Tables have identical structures
-- Here archiving shipments older than 180 days
INSERT INTO ShipmentArchive
SELECT *
FROM shipment
WHERE DATEDIFF(day, shipmentdate, GETDATE()) > 180; -- T-SQL; SYSDATE is the Oracle counterpart of GETDATE()
INSERT statement & integrity constraints
Some vendors will perform some implicit conversions
Beyond that, though, the INSERT command will fail
In large systems you may try to insert thousands of rows at a time, and a single bad entry will cause the entire batch to fail
Oracle provides a DML error logging feature
With other vendors use procedural SQL to deal with data errors
In SET clause you can include multiple column/value pairs (separating by a comma)
However, can update only one table at a time
Examples
Syntax/example (updating a single column of a single row): UPDATE Product
SET price = 99.99
WHERE productid = 1234
Syntax/example (updating multiple columns): UPDATE Product
SET price = 99.99,
productname = 'Foo Bar II'
WHERE productid = 1234
Syntax/example (updating single column in all rows): -- Simply omit WHERE clause
UPDATE Product
SET price = price*1.05
Updating columns using a single-row subquery
General
Within your SET clause include a SELECT statement to set the value of a column/value pair
That SELECT statement must return no more than 1 row
Syntax/example (deriving the assignment value from another value): UPDATE OrderHeader
SET paytermsid =
(SELECT paytermsid
FROM PaymentTerm
WHERE paytermscode = 'N12345')
WHERE orderheaderid = 998765
A correlated subquery is where a subquery refers to a value from the outer query
Syntax/example: UPDATE OrderHeader
SET paytermsid =
(SELECT pt.paytermsid
FROM PaymentTerm as pt
JOIN Customer as c
ON (pt.paytermsid = c.paytermsid)
WHERE OrderHeader.customerid = c.customerID)
Essentially, you do this by using a JOIN to specify your column value
You JOIN the table you are updating with the tables assigned in the assignment subquery
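Note: the subquery is evaluated once for each row being updated, so many rows can receive values even though each evaluation must still return no more than 1 row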
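A runnable sketch of an UPDATE driven by a correlated subquery, assuming SQLite via Python's sqlite3 module. The schema is a simplified, hypothetical version of the tables above: payment terms are stored directly on Customer, so the join to PaymentTerm is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customerid INTEGER PRIMARY KEY, paytermsid INTEGER);
    CREATE TABLE OrderHeader (
        orderheaderid INTEGER PRIMARY KEY,
        customerid    INTEGER,
        paytermsid    INTEGER
    );
    INSERT INTO Customer VALUES (1, 30), (2, 60);
    INSERT INTO OrderHeader VALUES (100, 1, NULL), (101, 2, NULL), (102, 1, NULL);

    -- The subquery refers to OrderHeader.customerid from the outer UPDATE,
    -- so it is re-evaluated for every row being updated.
    UPDATE OrderHeader
    SET paytermsid = (SELECT c.paytermsid
                      FROM Customer AS c
                      WHERE c.customerid = OrderHeader.customerid);
""")

updated = conn.execute(
    "SELECT orderheaderid, paytermsid FROM OrderHeader ORDER BY orderheaderid"
).fetchall()
```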
UPDATE statement & integrity constraints
Integrity constraints function similarly to using the INSERT statement
Difference arises from the effect on child tables from ON UPDATE CASCADE or ON UPDATE SET NULL constraints
DELETE: Removing data from tables
General
Can delete directly from a table or from an updatable view
ON DELETE referential integrity constraints of course kick in
Syntax: DELETE FROM <table_or_view_name>
[WHERE <predicate>]
Common DELETE statement clauses
General
DELETE statements are very simple because there is no SELECT clause
Obviously, it is critical to take care constructing the WHERE clause
DELETE statement & integrity constraints
Integrity constraints are not as much of a factor in DELETE statements as they are in INSERT or UPDATE statements
For example, PRIMARY KEY, UNIQUE, and NOT NULL constraints are not a concern with DELETE statements
Referential integrity constraints, however, are very much a consideration
Beware of the ON DELETE CASCADE option, as this constraint causes child rows to be deleted when a parent row is deleted!
The ON DELETE SET NULL option is also very dangerous (this sets the value in a child's foreign key column to NULL)
Syntax/example (using subqueries in a DELETE statement): DELETE FROM OrderHeader
WHERE customerid =
(SELECT c.customerid
FROM Customer as c
WHERE c.name = 'RCP Consulting')
MERGE: combining INSERT, UPDATE, & DELETE in one statement
Syntax (generic): MERGE INTO [<qualifier>.]<target_table>
USING [<qualifier>.]<source_table> ON (<condition>)
WHEN MATCHED THEN
UPDATE SET {<column> = {<expression> ¦ DEFAULT},…}
WHEN NOT MATCHED THEN
INSERT [(<column>,…)] VALUES (<expression> ¦ DEFAULT),…;
TRUNCATE statement
Not part of the SQL:2003 standard (TRUNCATE TABLE was added in SQL:2008)
Supported by Oracle & Microsoft SQL Server, among others
Sessions, Transactions & Locks
Sessions
Session = environment in which (among other things) transactions & locks take place
Session begins with connection to a server - i.e., opening a session
Changes to the settings apply only to the existing session
If you open multiple instances of, say, SQLCMD, on a single machine each instance represents its own session
Connection & session management statements:
Statements for connecting & security
CONNECT TO & DISCONNECT
SET CONNECTION (to select from multiple available connections)
SET SESSION AUTHORIZATION (to set session user identifier)
SET ROLE
Statements relating to <preparable statement>s
Conditions:
Prepared in current SQL session by either:
An <execute immediate statement>
A <prepare statement>
And in <direct SQL statement>s that are invoked directly
Statements:
SET CATALOG (for unqualified <schema name>s)
SET SCHEMA (for unqualified <schema qualified name>s)
SET NAMES (for setting default character set name for <character string literal>s)
SET PATH (for setting the path to determine the subject routine of <routine invocation>s with unqualified <routine name>s)
Other
SET SESSION CHARACTERISTICS AS (to set one or more characteristics of the current session)
SET TRANSFORM GROUP (used for mapping values of user-defined types to predetermined data types for a group of transform functions)
SET COLLATION (affects one or more character sets)
Orphaned sessions
Occurs when a client application terminates abruptly
Generally the vendor applications will notice this situation after some specified interval and execute the proper clean-up
Orphaned sessions can also be resolved manually by the DBA
Transactions
Ensures that multi-step operations are processed as a single unit (i.e., the entire transaction)
Transactions continue until either a COMMIT or ROLLBACK statement is issued
Relevant T-SQL statements
BEGIN DISTRIBUTED TRANSACTION (across multiple servers)
Use to work with multiple servers
Beneath the covers you are working with MS DTC
BEGIN TRANSACTION (can set transaction characteristics here)
COMMIT TRANSACTION
COMMIT WORK
ROLLBACK TRANSACTION
ROLLBACK WORK
SET TRANSACTION - set characteristics of next transaction
SAVE TRANSACTION
ACID test for transactions
Atomicity - either all of the changes are made or none are
Consistency - upon either completion or rollback of a transaction:
All of the data involved must be left in a consistent state
Database integrity cannot be compromised
Isolation
One transaction should not be aware of modifications made by a second transaction (unless and until, of course, that second transaction has been committed)
Note: you can modify this default behavior with differing isolation levels
Durability - once a transaction has been committed its results must remain in place
Implicit and explicit transactions
An implicit transaction is the default - transactions are automatically started with certain SQL statements
An explicit transaction is started with a BEGIN TRANSACTION statement
Transactions COMMIT & ROLLBACK
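A minimal sketch of COMMIT vs. ROLLBACK. Assumptions: Python's built-in sqlite3 module stands in for the db engine (these notes otherwise target T-SQL), and the account table and its values are hypothetical.

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode, so the
# transactions below are explicit: BEGIN/COMMIT/ROLLBACK are issued by hand.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])

def transfer(amount, finish):
    """Move `amount` from a to b, then end the transaction with `finish`."""
    conn.execute("BEGIN")
    conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'a'", (amount,))
    conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'b'", (amount,))
    conn.execute(finish)  # either "COMMIT" or "ROLLBACK"
    return dict(conn.execute("SELECT name, balance FROM account"))

after_rollback = transfer(40, "ROLLBACK")  # neither update survives
after_commit = transfer(40, "COMMIT")      # both updates survive
print(after_rollback, after_commit)
```

Both updates stand or fall together, which is the atomicity property from the ACID list.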
Savepoints
Adds granularity to transaction processing
Allows you to set a named point within the transaction (which you typically use to indicate that some important milestone in a query has been completed)
If there are ensuing errors you can then roll back to your savepoint (i.e., you do not have to roll back all of your processing)
Savepoints are released once an entire transaction has been committed
Savepoint names must be unique within a given transaction (if a duplicate name is used the initial savepoint is destroyed)
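A minimal sketch of rolling back to a savepoint. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine and uses the standard SAVEPOINT / ROLLBACK TO SAVEPOINT syntax, whereas T-SQL uses SAVE TRANSACTION; the step_log table and savepoint name are hypothetical.

```python
import sqlite3

# Autocommit mode, so transaction boundaries are controlled explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE step_log (step TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO step_log VALUES ('milestone reached')")
conn.execute("SAVEPOINT after_milestone")        # named point inside the transaction
conn.execute("INSERT INTO step_log VALUES ('faulty step')")
conn.execute("ROLLBACK TO SAVEPOINT after_milestone")  # undo only the work after the savepoint
conn.execute("COMMIT")                           # work done before the savepoint is kept

steps = [row[0] for row in conn.execute("SELECT step FROM step_log")]
print(steps)
```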
Distributed transactions
Defined: transactions that involve more than one database
These are very complex and generally involve two-phase COMMIT syntax
Transaction isolation levels
Isolation levels refer to the ability of a transaction to see outside its own scope (i.e., data modified by another transaction)
Levels:
READ UNCOMMITTED
Lowest isolation level
Permits dirty reads - i.e., the ability to see uncommitted data
Neither issues nor honors locks
READ COMMITTED
Specifies that shared locks will be held while data are read
Dirty reads are not permitted
Does allow phantom reads - i.e., when the number of rows returned changes between reads
REPEATABLE READ
No changes are allowed for the data selected by a query
However, phantom rows may appear
SNAPSHOT ISOLATION
More flexible than SERIALIZABLE isolation
All database reads within transaction see a snapshot of db from moment when transaction began
In other words, snapshot = latest committed values
Before committing transaction, verifies that no other pending updates exist that would conflict
SERIALIZABLE
Highest isolation level
Puts a lock on the whole dataset
No modifications from the outside are allowed until the end of the transaction
Locks
General
Deals with concurrency
Technically, locking is not part of the ANSI/ISO SQL standard
Concurrency modes
Optimistic
Assumes that more than one transaction working on the same set of data, while possible, is unlikely
Checks for potential conflicts when committing changes
Resolves conflicts by resubmitting changes
Pessimistic
Expects conflicts from the very beginning
Therefore locks all resources that the transaction intends to use
Obviously, pessimistic locking can significantly slow down a database
Locks are used to implement pessimistic transactions
General locking modes
Shared
Exclusive
Exact variations and implementation vary by vendor
Can also lock specific objects - e.g., rows versus tables
Deadlocks
Occurs when:
Two or more sessions are waiting to acquire a lock on a shared resource
Neither session can proceed because a second session has a lock on some other resource required by the first session
RDBMSs usually handle these situations with specific algorithms
As a last resort the DBA can manually resolve deadlocks
Although SELECT clause appears 1st in a query, it is processed after all other clauses except for ORDER BY
Delimiting an object name
When it must be done
Contains embedded spaces or special characters
Starts with a digit
Is a reserved keyword
Delimit with (double) quotation marks
ANSI SQL: (double) quotation marks
T-SQL: either quotation marks or square brackets (typically preferred)
No harm in delimiting all object names, though it clutters the code
Syntax/example: dbo.Employee."Last Name" (could also do "dbo"."Employee"."Last Name", etc.)
Note: Clauses below are described in the logical order in which the db engine processes them
The FROM clause
Selecting from tables & views
In some sense the only thing you can select from is a table
A view, after all, is simply a query of a table
Schemas
Best Practice: always schema-qualify object names - in this case tables - in code
When you don't
Run risk of db engine picking wrong object
Schema has to be resolved one way or another, & if you do not specify schema you incur (minor) cost of forcing db engine to resolve the schema
Using aliases in a FROM clause
When established all other clauses in the overall SELECT query should utilize the alias
Notes:
The aliases used in a FROM clause differ from those established with a CREATE ALIAS statement
An alias need not be used
Table aliases can be utilized throughout the SELECT statement - i.e., in WHERE, GROUP BY, etc.
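Syntax/example (and, yes, you can mix and match qualified & unqualified column names as follows; once the alias is set, though, the original table name can no longer be used as a qualifier): SELECT firstname,
s.lastname,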
s.studentid
FROM FaberCollege.Student [AS] s
Typically, though, you only use table aliases in multi-table queries
Using subqueries in a FROM clause (aka, inline views)
The object of your FROM clause can itself be another entire SELECT statement
Syntax/example: SELECT s.firstname,
s.lastname,
s.studentid
FROM (SELECT fname as firstname,
lname as lastname,
ssn as studentid
FROM FaberCollege.Student
WHERE HomeState = 'NJ') AS s
Note: Inline views, as opposed to VIEWS, exist only for the duration of the query
If you alias a column name within an inline view then an outer query which references that column can only do so by using that alias
The WHERE clause
General
You are narrowing the number of rows you look at
Often referred to as setting horizontal limits
Relies on predicate logic
That is, the criteria must all evaluate to True, False, or Unknown
Only records evaluating to True are accepted
Operators used
Comparison operators - i.e., =, >, <,…
Compound operators - i.e., AND & OR
BETWEEN
Syntax: BETWEEN x AND y
BETWEEN clause is inclusive
IN (i.e., set membership test)
Essentially, you use this to replace numerous OR operators
Must include the set inside parentheses
Can use a subquery to define your IN set
Note: because the contents of an IN are being compared to a single column all members of the IN set must be of the same data type
The NOT operator
Indexes
Db engines typically look to use indexes to access/filter the data
Generally this gives much faster results than doing a full table scan
The IS NULL operator
Remember that comparison operators tend to choke on NULL values
To check for NULL or non-NULL values the syntax is:
WHERE xyz IS NULL
WHERE xyz IS NOT NULL
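Common mistakes with NULL
WHERE xyz = NULL (no error is thrown, but under standard NULL handling the comparison evaluates to Unknown, so no rows are returned)
WHERE xyz NOT IS NULL (syntax error)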
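A minimal sketch comparing = NULL with IS NULL / IS NOT NULL. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine; the table t and its two rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (xyz INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,)])  # one value, one NULL

def count_where(predicate):
    """Count the rows of t for which the predicate evaluates to True."""
    return conn.execute(f"SELECT COUNT(*) FROM t WHERE {predicate}").fetchone()[0]

eq_null = count_where("xyz = NULL")         # comparison yields Unknown for every row
is_null = count_where("xyz IS NULL")        # matches the NULL row
is_not_null = count_where("xyz IS NOT NULL")  # matches the non-NULL row
print(eq_null, is_null, is_not_null)
```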
The COALESCE vs. ISNULL functions [T-SQL]
General
COALESCE
Returns the first non-null expression among its arguments
Accepts more than 2 inputs
Is a standard ANSI SQL function
ISNULL
Replaces NULL with a value you specify
Accepts exactly 2 inputs (the expression to check & its replacement value)
T-SQL-specific
Typed vs. untyped NULL
The NULL literal is where you have keyed in "NULL" as an input
Example #1: (using the untyped NULL literal) INSERT INTO Employee
(
firstname,
lastname,
suffix
)
VALUES
(
'William',
'White',
NULL
)
Example #2: (using the untyped NULL literal) --NOTE: this will generate an error.
SELECT COALESCE(NULL, NULL)
The typed NULL literal is where you have specified the data type for your "NULL" input
Example #1: (using a typed NULL literal) -- Snippet
DECLARE
@x AS INT = NULL,
@y AS INT = 1,
@z AS INT = 2
Example #2: (using a typed NULL literal) -- Will return NULL as an integer.
SELECT COALESCE(CAST(NULL AS INT), NULL);
Data type precedence
When an operator works with different data types it first converts data type of lower precedence into that of the parameter with the higher precedence
An error is returned if this conversion cannot be done implicitly
Once the conversion is made the result of the operation will be of the same data type
List - highest to lowest
user-defined data types
sql_variant
xml
datetimeoffset
datetime2
datetime
smalldatetime
date
time
float
real
decimal
money
smallmoney
bigint
int
smallint
tinyint
bit
ntext
text
image
timestamp
uniqueidentifier
nvarchar (including nvarchar(max))
nchar
varchar (including varchar(max))
char
varbinary (including varbinary(max))
binary
Data type of returned expression
COALESCE
Corresponds to input with highest data type precedence
If all inputs are untyped NULLs you get an error
ISNULL
Data type of the first input
If the first input is an untyped NULL the data type of the second input is returned
If both inputs are an untyped NULL an INT NULL is returned
Examples (data types of the same family): DECLARE
@x as VARCHAR(3) = NULL,
@y as VARCHAR(5) = '12345';
SELECT
COALESCE(@x, @y) AS COALESCExy,
COALESCE(@y, @x) AS COALESCEyx,
ISNULL(@x, @y) AS ISNULLxy,
ISNULL(@y, @x) AS ISNULLyx;
-- NOTE: VARCHAR(5) has a higher precedence than VARCHAR(3)
-- Results
-- Both COALESCE expressions return the varchar '12345'
-- ISNULLxy returns varchar '123' - i.e., truncated to VARCHAR(3)
-- ISNULLyx returns varchar '12345'
Results when data types are from different families
Example #1: -- INT has a higher precedence than a character string
-- This will fail, because char string cannot be
-- implicitly converted to an INT
SELECT COALESCE('abc', 1);
Example #2: -- This will return 'abc'
-- ISNULL uses data type of the first input
SELECT ISNULL('abc',1)
Example #3: -- Will generate an error
-- At least 1 argument must be other than untyped NULL
SELECT COALESCE(NULL, NULL);
Example #4: -- Will generate NULL cast as an INT
SELECT ISNULL(NULL, NULL);
Nullability of expression
This is a consideration when executing a SELECT INTO statement
SELECT INTO creates a new table
Determining the nullability of an input
A numeric or string argument is non-nullable
A column reference (name) is considered nullable
Determining whether columns created with SELECT INTO are nullable
With COALESCE
If all expressions are non-nullable then the column will be defined as NOT NULL
Otherwise, the created column is nullable
With ISNULL
If one of the two arguments is non-nullable then the resulting column will be non-nullable
Otherwise, the created column is nullable
Use with subqueries
Wherever possible go with ISNULL vs. COALESCE
ISNULL is more efficient in a subquery
You can in fact get some funky results using COALESCE
Under the hood of COALESCE
Per standard SQL, COALESCE is translated into a CASE expression
Specifically: COALESCE(arg1, arg2)
-- Becomes:
CASE
WHEN arg1 IS NOT NULL THEN arg1
ELSE arg2
END
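A minimal sketch checking that COALESCE(arg1, arg2) and the equivalent CASE expression agree for every NULL/non-NULL combination. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine; the pair table is hypothetical. This shows only the logical equivalence, not the T-SQL execution-plan behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pair (arg1 TEXT, arg2 TEXT)")
conn.executemany(
    "INSERT INTO pair VALUES (?, ?)",
    [("x", "y"), (None, "y"), ("x", None), (None, None)],
)

# Column 1 uses COALESCE; column 2 spells out the CASE expression.
results = conn.execute(
    """
    SELECT COALESCE(arg1, arg2),
           CASE WHEN arg1 IS NOT NULL THEN arg1 ELSE arg2 END
    FROM pair
    """
).fetchall()

same = all(coalesced == cased for coalesced, cased in results)
print(results, same)
```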
Clustered index scan
Will occur twice with COALESCE
1st read: to see if the result is NULL
2nd read: to obtain the non-null result
With ISNULL will only occur once
Note: for each index scan there are then a number of logical reads
Atomicity vs. Isolation
You can get strange results with COALESCE
Seems to be a special case of a table being dropped by another user whilst you run INSERT INTO
Behavior seems to violate atomicity of SELECT statement
This general approach is an alternative to using an inner join
Subqueries execute before the calling queries
Types of subqueries
Scalar - your subquery returns a single value
Syntax/example (adapted from Kriegel & Trukhnov: 2nd Ed., p. 281): SELECT ordernumber,
orderdate
FROM Sales."Order"
WHERE customerid =
(SELECT customerid
FROM Sales.Customer
WHERE customername = 'RCP Consulting, LLC')
Note: a query like this fails if the subquery returns multiple values
Non-scalar subqueries
Syntax/example (note use of IN keyword): SELECT phonenumber,
phonetype
FROM HR.Phone
WHERE salesmanid IN
(SELECT salesmanid
FROM Sales.Salesman
WHERE salesmancode BETWEEN '07' AND '10')
ANY & ALL keywords
Can use when the outer query is performing a comparison to multiple values returned by a subquery
Assume a view which contains order totals from a single company
Syntax/example (adapted from Kriegel & Trukhnov: 2nd Ed., pp. 282‑3): -- "> ANY" effectively "> MIN"
SELECT v.customername,
v.totalprice
FROM vwCustTotals v
WHERE v.totalprice >
ANY (SELECT totalprice
FROM vwCustTotalsAcme)
ORDER BY totalprice ASC
Syntax/example (adapted from Kriegel & Trukhnov: 2nd Ed., pp. 283‑4): -- "> ALL" effectively "> MAX"
SELECT v.customername,
v.totalprice
FROM vwCustTotals v
WHERE v.totalprice >
ALL (SELECT totalprice
FROM vwCustTotalsAcme)
ORDER BY totalprice ASC
Syntax: Insert ANY or ALL after the comparison operator but before the subquery
Another way to skin this cat is to omit ANY and/or ALL and instead use MAX & MIN functions within the subquery
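A minimal sketch of the MIN/MAX alternative. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine and does not support ANY/ALL, so only the MIN/MAX rewrites are run; the tables cust_totals & acme_totals are hypothetical stand-ins for the views above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust_totals (customername TEXT, totalprice INTEGER)")
conn.execute("CREATE TABLE acme_totals (totalprice INTEGER)")
conn.executemany("INSERT INTO cust_totals VALUES (?, ?)", [("A", 50), ("B", 150), ("C", 300)])
conn.executemany("INSERT INTO acme_totals VALUES (?)", [(100,), (200,)])

def customers_above(aggregate):
    """Customers whose total exceeds MIN or MAX of the Acme totals."""
    sql = f"""
        SELECT customername
        FROM cust_totals
        WHERE totalprice > (SELECT {aggregate}(totalprice) FROM acme_totals)
        ORDER BY totalprice ASC
    """
    return [row[0] for row in conn.execute(sql)]

like_gt_any = customers_above("MIN")  # same rows "> ANY" would return
like_gt_all = customers_above("MAX")  # same rows "> ALL" would return
print(like_gt_any, like_gt_all)
```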
Nested subqueries
Quite do-able
However, these tend to be resource-intensive, so use judiciously
Indexes
Utilized by the WHERE clause
Reminder: with 3-value predicate logic only logical expressions returning True are returned
The GROUP BY clause
GROUP BY is typically used in conjunction with aggregate functions (e.g., SUM, AVG, etc.)
When specifying > 1 field in the GROUP BY clause you ultimately produce 1 row for each unique combination of those fields
Example:
You have employees from 5 states working at your company
Your employees are born across 15 separate years
The following query will return up to 75 rows (5 X 15): SELECT
state,
YEAR(bdate) AS "Birth Year",
COUNT(*) AS "Nmbr Employees"
FROM HR.Employee
GROUP BY state, YEAR(bdate)
Subsequent clauses - e.g., HAVING, SELECT, & ORDER BY
These will/must operate on the grouped results, not on individual rows within the queried table
This in turn means: any expression used within those subsequent clauses must return only a scalar - i.e., single value - per group
Implications
Fields & expressions used in the GROUP BY clause can automatically be used in subsequent SQL phrases
Rationale: the GROUP BY clause has already guaranteed uniqueness on those fields and/or expressions
Note: this is why we could use state & YEAR(bdate) in SELECT phrase of above query
Handling fields not included in the GROUP BY clause
Such fields can only appear as inputs to aggregate functions such as COUNT, SUM, AVG, MIN, etc.
This works because such aggregate functions return a single value per group
Aggregate functions
You can use DISTINCT inside all aggregate functions
NULL values
Ignored by all aggregate functions except COUNT(*)
COUNT(*) includes NULL values because it is giving you a count of the rows
However, a command such as COUNT(phonenumber) ignores NULLs in the phonenumber column
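A minimal sketch of COUNT(*) vs. COUNT(column). Assumptions: SQLite (via Python's sqlite3) stands in for the db engine; the phone table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone (empid INTEGER, phonenumber TEXT)")
conn.executemany(
    "INSERT INTO phone VALUES (?, ?)",
    [(1, "555-0100"), (2, None), (3, "555-0101")],  # one NULL phonenumber
)

# COUNT(*) counts rows; COUNT(phonenumber) skips rows where phonenumber is NULL.
count_rows, count_numbers = conn.execute(
    "SELECT COUNT(*), COUNT(phonenumber) FROM phone"
).fetchone()
print(count_rows, count_numbers)
```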
The HAVING clause
HAVING clause is only used in conjunction with GROUP BY clause
Works on results after they have been grouped
As with WHERE clause relies on predicate logic
Statements must evaluate to True, False, or Unknown
Only groups evaluating to True are returned
Because you are working on groups you can employ aggregate functions
Syntax/example (adapted from Ben-Gan, 3rd Ed., p. 36): SELECT empid, YEAR(orderdate) as orderyear
FROM Sales.Orders
WHERE custid = 123
GROUP BY empid, YEAR(orderdate)
HAVING COUNT(*) > 1
Note: the column used in the HAVING clause does not have to appear in the SELECT clause
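A minimal sketch of WHERE filtering rows before grouping and HAVING filtering groups afterwards. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine, so YEAR(orderdate) is replaced by strftime('%Y', ...); the orders table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderid INTEGER, empid INTEGER, custid INTEGER, orderdate TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, 1, 123, "2015-01-10"),
        (2, 1, 123, "2015-06-02"),  # second 2015 order for empid 1, custid 123
        (3, 1, 123, "2016-03-15"),
        (4, 2, 123, "2015-04-20"),
        (5, 2, 999, "2015-05-01"),  # removed by WHERE before grouping
        (6, 2, 999, "2015-07-09"),
    ],
)

groups = conn.execute(
    """
    SELECT empid, CAST(strftime('%Y', orderdate) AS INTEGER) AS orderyear
    FROM orders
    WHERE custid = 123
    GROUP BY empid, CAST(strftime('%Y', orderdate) AS INTEGER)
    HAVING COUNT(*) > 1
    """
).fetchall()
print(groups)
```

COUNT(*) appears only in HAVING, not in the SELECT list, as the note above allows.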
The SELECT clause
Multi-column SELECT statements
Can cherry-pick columns to SELECT
This query returns a set of sets; each column is an individual set of values
You can, of course, include a column more than once in a query
Selecting all columns
Syntax: SELECT *
Columns are returned in the order in which they reside in the underlying table
Syntax to select all columns, & then add/repeat a column: SELECT *, columntorepeat
Selecting distinct values
Syntax/example: SELECT DISTINCT YEAR(bdate),
residencestate
FROM Employee
Note: DISTINCT refers to entire row, not just to a single column
You can include "constant" values that appear in every row of your output
You create/define these values as literals, results of functions, etc.
These are simply included as though they were columns in regular SELECT statements
Dummy tables
Oracle
Oracle requires use of the FROM clause in SELECT statements
Dummy tables used when a value does not exist until the moment you call it (i.e., because the value is the result of a function)
When you use the FROM clause SQL requires that you provide something as the object of that clause
One uses a dummy table for this
Note: dummy tables are not a part of the SQL standard
SQL Server
MSSS does not require the use of the FROM clause in SELECT statements
Syntax/example: -- This is perfectly legit in SQL Server
SELECT 2 + 2 AS mysimplesum
Ergo, dummy tables are neither needed nor provided in SQL Server
Column aliases
Use whenever you want to:
'Rename' a column - e.g., SELECT empid AS EmployeeID
Provide a heading for a constant value or calculated column - e.g., SELECT YEAR(bdate) AS "Year Born"
Technically, you do not have to provide column aliases for these columns - results will still appear
However, if you do not:
Data obviously a little harder to read/understand
The unnamed columns then cannot be referred to in any outer SELECT statement
Your results can no longer be considered a relation
The AS keyword
Use is optional
Best Practice, however, is always to use it
Makes query clearer
Risk of not habitually using AS keyword: if you miss a comma between field names in SELECT phrase COL2 gets interpreted as alias for COL1, & you will find that hard to spot
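A minimal sketch of the missing-comma risk. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine; the employee table and its row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empid INTEGER, lastname TEXT)")
conn.execute("INSERT INTO employee VALUES (7, 'White')")

# Intended: SELECT empid, lastname ... (two columns).
# With the comma missing, lastname is parsed as an alias for empid.
cursor = conn.execute("SELECT empid lastname FROM employee")
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
print(column_names, rows)
```

The query runs without error and returns a single column of empid values headed "lastname", which is why the slip is hard to spot.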
Note: because of logical processing order remember that column aliases are not available in the following clauses:
FROM
WHERE
GROUP BY
HAVING
Using subqueries in a SELECT clause
Can be done
Syntax/example (adapted from Kriegel & Trukhnov: 2nd Ed., pp. 271‑2): SELECT productnumber,
price,
(SELECT taxrate
FROM SalesTax
WHERE state = 'NJ') AS njtaxrate,
price *
(SELECT taxrate
FROM SalesTax
WHERE state = 'NJ')/100 as totalsalestax
FROM Product
The ORDER BY clause (& some order-related filtering)
General
ASC (the default) & DESC keywords come after the appropriate column name
NULL value treatment varies by vendor
Can use column numbers rather than column names when using ORDER BY
However, the columns to be ordered must then be included in the SELECT clause
For obvious reasons, this is considered quite a code smell
ORDER BY clause is often used in conjunction with GROUP BY clause
Tables, cursors, & relations
In math a set has no order
SQL tables are designed as relations - i.e., mathematical sets - ergo unordered
Using a SELECT statement without an ORDER BY clause
db engine will give you your results in any way it chooses
i.e., the resulting order of rows will be non-deterministic
When you do use an ORDER BY clause
The result is then not a set, ergo not a relation, ergo not a table
With SQL, the result is called a cursor
Significance of working with a table vs a cursor
Some SQL language elements & operations assume they are working on a table
Examples:
Table expressions
Set operators
The SELECT clause & ORDER BY clause
Omitting column(s) in the ORDER BY clause from the SELECT clause
Perfectly legal SQL code
However, you then cannot proof-read your results
Including DISTINCT in your SELECT clause
ORDER BY clause can then only use columns included in the SELECT clause
Any column not in the SELECT clause could well have a range of values for each displayed row
ORDER BY can of course not sort rows if a column on which it is to sort could hold multiple values for a given row in the resulting cursor
TOP & OFFSET-FETCH filters
TOP [T-SQL proprietary]
Used within the SELECT clause
Depends on the ORDER BY
However, ORDER BY clause is processed after the SELECT
With the TOP filter T-SQL is then getting a dual use out of ORDER BY
Can specify either the number or a percentage of rows to return
Note: TOP rounds up the number of rows returned if PERCENT yields other than a whole number
When your SELECT clause uses DISTINCT keyword TOP applies to the distinct rows
Ties in ranking
Can make your results non-deterministic (i.e., more than one result would be considered correct)
In fact, it is permissible to use the TOP filter sans an ORDER BY clause; the rows returned are then arbitrary (though not truly random)
The WITH TIES option
Results will be deterministic
You may then well get more than the number of rows requested
Table or cursor?
That ORDER BY now does 'double duty'
Used to control the order by which data are presented
Also used to control how TOP filtering works
The use of ORDER BY typically indicates that the result will be a cursor
However, it is possible to employ TOP yet still generate (ultimately) a table
It is also possible to order/rank data on one column for the TOP filtering, while displaying data by applying ORDER BY to a different column
Either of these scenarios requires the use of table expressions (discussed later)
Syntax
Example (from Ben-Gan: 3rd Ed., pp. 44‑5): SELECT TOP (5) [PERCENT] orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC
Example (from Ben-Gan: 3rd Ed., pp. 44‑5): -- Including WITH TIES
SELECT TOP (5) WITH TIES orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC
Example (from Ben-Gan: 3rd Ed., pp. 44‑5): -- To ensure deterministic order include a column with unique values in ORDER BY
SELECT TOP (5) orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate DESC, orderid DESC;
OFFSET-FETCH
Part of standard SQL
Considered an extension of the ORDER BY clause
Can only use OFFSET-FETCH when you have an ORDER BY clause
Standard SQL allows FETCH without an OFFSET clause, which is equivalent to OFFSET 0 ROWS
T-SQL requires an OFFSET clause whenever FETCH is employed
Skipping the FETCH clause gives you all rows after the offset
Keyword flexibility
The ROW & ROWS keywords are interchangeable
FIRST & NEXT are also interchangeable
Example (from Ben-Gan: 3rd Ed., p. 47): SELECT orderid, orderdate, custid, empid
FROM Sales.Orders
ORDER BY orderdate, orderid
OFFSET 50 ROWS FETCH NEXT 25 ROWS ONLY;
Window functions (briefly)
Defined
A function that, for each row, operates on some subset (window) of the rows returned by the underlying query
Returns a scalar result
SQL standard (T-SQL supports a subset of window functions)
Placed within the SELECT clause
Syntax
Begin with a window function, e.g., ROW_NUMBER()
Follow with OVER()
Inside the OVER() clause
This is what defines the subset of rows on which the window function operates
Specifically, PARTITION BY clause identifies the column(s) by which OVER() divides the rows into partitions
Can then include an ORDER BY clause
After the OVER() clause a thoughtful data specialist supplies a column alias
Example (from Ben-Gan: 3rd Ed., p. 48): SELECT orderid, custid, val,
ROW_NUMBER() OVER(PARTITION BY custid
ORDER BY val) AS rownum
FROM Sales.OrderValue
ORDER BY custid, val;
This statement will:
Return all orders, with their orderid & value
Sort the rows first by custid, & then within each custid by val
Assign a unique row number, which resets to 1 for each new custid (courtesy of the PARTITION BY clause)
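A minimal sketch of the ROW_NUMBER() query above. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine and must be version 3.25 or later for window functions; the order_value table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_value (orderid INTEGER, custid INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO order_value VALUES (?, ?, ?)",
    [(1, 1, 30.0), (2, 1, 10.0), (3, 2, 5.0), (4, 1, 20.0), (5, 2, 7.5)],
)

numbered = conn.execute(
    """
    SELECT orderid, custid, val,
           ROW_NUMBER() OVER (PARTITION BY custid ORDER BY val) AS rownum
    FROM order_value
    ORDER BY custid, val
    """
).fetchall()

# Numbering restarts at 1 whenever custid changes.
row_numbers = [row[3] for row in numbered]
print(numbered)
```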
Beyond the SELECT clause
Predicates & operators
Non-standard operators
T-SQL recognizes the following non-standard comparison operators: !=, !>, !<
Ben-Gan (3rd Ed., p. 50) recommends avoiding them
Precedence
Just as multiplication has precedence over addition in arithmetic (e.g., 2 * 5 + 7 = 17), SQL operators have an order of precedence
From highest to lowest:
()
*, /, %
+, - (including + for concatenation)
=, >, <, etc. (i.e., the comparison operators)
NOT
AND
BETWEEN, IN, LIKE, OR
= (when used as assignment)
Data types of scalar expressions
If data types of operands differ the result will be of the data type with the higher precedence
If operands are of the same data type the result will be of the same type
Examples:
5 / 2 will yield 2
5 / 2.0 will yield 2.5
Casting example: -- Assume col1 and col2 are INT columns
-- Assume you want to ensure quotient is NUMERIC
CAST(col1 AS NUMERIC(12,2)) / CAST(col2 AS NUMERIC(12,2))
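A minimal sketch of integer vs. non-integer division. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine; the cast uses REAL rather than NUMERIC(12,2) because SQLite's NUMERIC affinity would leave a whole number as an integer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two integer operands give an integer result; one non-integer operand
# (a literal or a cast) promotes the whole expression.
int_div, mixed_div, cast_div = conn.execute(
    "SELECT 5 / 2, 5 / 2.0, CAST(5 AS REAL) / 2"
).fetchone()
print(int_div, mixed_div, cast_div)
```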
CASE expressions
Part of ANSI SQL
An expression, not a statement - i.e., it doesn't take an action
Can be used wherever scalar expressions are allowed
Note: if you omit an ELSE clause SQL imputes ELSE NULL
Simple form
Compares a single value to a list of possible values
Syntax/example: SELECT breakfastid, breakfastname, categoryid,
CASE categoryid
WHEN 1 THEN 'Spam & Ham'
WHEN 2 THEN 'Spam & Egg'
WHEN 3 THEN 'Spam, Ham, & Egg'
WHEN 4 THEN 'Spam, Spam, Spam, & Ham'
WHEN 5 THEN 'Spam, Spam, Spam, & Spam'
ELSE 'Plain Spam'
END AS categoryname
FROM Menu.Breakfast
Searched form
You use predicates (i.e., comparisons) rather than equality in the WHEN clauses
Place the value you are testing inside the WHEN clause
Syntax/example: SELECT breakfastid, breakfastname, price,
CASE
WHEN price < 5.00 THEN 'Ugh. Small Time Customer'
WHEN price BETWEEN 5.00 AND 15.00 THEN 'Ok, pay attention'
WHEN price > 15.0 THEN 'Offer the VIP Lounge'
ELSE 'Missing price'
END AS waitressinstruction
FROM Menu.Breakfast;
Note: You can always convert a simple CASE expression to a searched one (though not vice versa)
'Equivalent' functions
Certain functions are effectively abbreviated CASE expressions
COALESCE() - standard
Non-standard
ISNULL() - T-SQL-specific
IIF() - added to facilitate migration from MS Access dbs
CHOOSE() - added to facilitate migration from MS Access dbs
NULLs
SQL supports 3-value predicate logic:
TRUE
FALSE
UNKNOWN
Logical expressions
When there are no NULLs involved will always give you either TRUE or FALSE
However, whenever there is a missing value - i.e., a NULL - the expression will evaluate to UNKNOWN
UNIQUE column constraint
NULLs in the column are all considered "equal"
Ergo, the column can contain one NULL value, but that is all
Treatment of an UNKNOWN result
WHERE & HAVING clauses
These only accept expressions which evaluate to TRUE
Ergo, expressions evaluating to UNKNOWN are filtered out
CHECK (i.e., column & table) constraints
These equate to "reject FALSE"
Ergo, expressions evaluating to UNKNOWN are accepted
Negating UNKNOWN
The value remains UNKNOWN
Example (assume an HR record contains NULL in a "salary" field):
The record will be omitted for a WHERE salary > 0 clause
Perhaps counter-intuitively, the record will also be omitted for a WHERE NOT (salary > 0) clause
Note: To get what you probably want with 2nd clause re-write it as: WHERE salary <= 0 OR salary IS NULL
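A minimal sketch of the salary example. Assumptions: SQLite (via Python's sqlite3) stands in for the db engine; the hr table and its three rows are hypothetical, with empid 3 holding the NULL salary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hr (empid INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO hr VALUES (?, ?)", [(1, 5000), (2, 0), (3, None)])

def empids_where(predicate):
    """Return the empids for which the predicate evaluates to True."""
    sql = f"SELECT empid FROM hr WHERE {predicate} ORDER BY empid"
    return [row[0] for row in conn.execute(sql)]

positive = empids_where("salary > 0")            # NULL row -> Unknown, filtered out
negated = empids_where("NOT (salary > 0)")       # NOT Unknown is still Unknown
with_null_test = empids_where("salary <= 0 OR salary IS NULL")  # NULL row kept explicitly
print(positive, negated, with_null_test)
```

empid 3 is missing from both of the first two results and appears only once the predicate tests IS NULL.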
GROUP BY & ORDER BY
NULLs will all appear together
NULLs will appear before present values
All-at-once operations
"…all expressions that appear in the same logical query processing phase are evaluated logically at the same point in time." (Ben-Gan: 3rd Ed., p. 58)
Implication for SELECT clauses
All elements of clause are evaluated simultaneously
Ergo, you cannot create an alias for one column & then refer to that alias elsewhere within the clause
Example: -- This is invalid…
SELECT orderid, YEAR(orderdate) AS orderyear, orderyear + 1 AS nextyear
FROM Orders;
Implication for other clauses
Suppose you are dividing one column by another, but you want to guard against division by zero
Example: -- This is problematic…
SELECT col1, col2
FROM mytable
-- No guarantee that 1st part of WHERE clause evaluated b4 2nd
WHERE col2 <> 0 AND col1/col2 > 10;
Solution (because the WHEN clauses of a CASE expression ARE evaluated sequentially): -- omitting SELECT & FROM
WHERE
CASE
WHEN col2 = 0 THEN 'no'
WHEN col1 / col2 > 10 THEN 'yes'
ELSE 'no'
END = 'yes';