Design, manage and edit stored procedures! Required reading for professional SQL Server, Visual Basic, Visual InterDev and other enterpise developers. CLICK TO BUY!

Required reading for professional  SQL Server  developers!

© 2000 - Trigon Blue Inc. 
All rights reserved.



Chapter 12

Next Page >

XML Support in SQL Server 2000

Microsoft SQL Server has become a giant among the select group of enterprise-ready Relational Database Management Systems, but as with those other RDBMSs, its roots are in pre-Internet solutions.

The Internet revolution has highlighted a set of old tactical and strategic challenges for the Microsoft SQL Server development team. These challenges include:

  • Storing the large amounts of textual information that Web-based, user-friendly database applications require
  • Delivering that textual (and other) stored information to the Web
  • Sharing information with other departments and organizations that do not use the same RDBMS system

In earlier editions of SQL Server, Microsoft has addressed these issues with such features as Full Text Search, the Web Publishing Wizard, DTS, ADO, and OLE DB. SQL Server 2000 introduces XML compatibility-the new holy grail of the computing industry and the latest attempt to tackle the same old problems.

XML (R)evolution

To communicate with customers in today's rich-content world, you need to provide them with information. Until very recently, such information was inevitably encapsulated in proprietary, document-based formats that are not shared easily. For example, word processor documents are optimized for delivery on paper, and relational databases are often structured and normalized in formats unsuitable to end users.

The first step in the right direction was Standard Generalized Markup Language (SGML). Although it was designed by Charles Goldfarb in the late 1960s, it became the international standard for defining markup languages in 1986 after the creation of the ISO standard. In the late 1980s, companies and government agencies started to adopt this tag-based language. It allowed them to create and manage paper documentation in a way that was easy to share with others.

Then in the 1990s, the Web appeared on the scene and our collective focus shifted from isolated islands of personal computers and local networks to a global network of shared information. SGML's tagged structure would seem to make it a perfect candidate to lead the Internet revolution, but the complexity of SGML makes it difficult to work with and unsuited to Web application design.

Instead of SGML, the developers of the Internet adopted the Hypertext Markup Language (HTML), a simple markup language used to create hypertext documents that are portable from one platform to another. HTML is a simplified subset of SGML. It was defined in 1991 by Tim Berners-Lee as a way to organize, view, and transfer scientific documents across different platforms. It uses HTTP (the Hypertext Transfer Protocol) to transfer information over the Internet. This new markup language was an exciting development and soon found nonscientific applications. Eventually, companies and users started to use it as a platform for e-commerce-the processing of business transactions without the exchange of paper-based business documents.

Unfortunately, HTML has some disadvantages. One of the biggest is a result of its main purpose. HTML is designed to describe only how information should appear-that is, its format. It was not designed to define the syntax (logical structure) or semantics (meaning) of a document. It could make a document readable to a user, but it required that user to interact with the document and interpret it. The computer itself could not parse the document, because the necessary "meta-information" (literally, information about the information) was not included with the document.

Another problem with HTML is that it is not extensible. It is not possible to create new tags. HTML is also a "standard" that exists in multiple versions-and multiple proprietary implementations. Web developers know that they have to test even their static HTML pages in all of the most popular browsers (and often in several versions of each), because each browser (and each version of each browser) implements this "standard" somewhat differently. Different development toolsets support different versions of this standard (and often different features within a single standard).

I want "eXtensible" not "Extensible". It was not a typo. -- Djn
In 1996, a group working under the auspices of the World Wide Web Consortium (W3C) created a new standard tagged language called XML (Extensible Markup Language). It was designed to address some of the problems of HTML and SGML. XML is a standardized document formatting language, a subset of SGML, that enables a publisher to create a single document source that can be viewed, displayed, or printed in a variety of ways. As is the case with HTML, XML is primarily designed for use on the Internet. HTML, however, is designed primarily to address document formatting issues, while XML addresses issues relating to data and object structure. XML provides a standard mechanism for any document builder to define new XML tags within any XML document. Its features lower the barriers for creation of integrated, multiplatform, application-to-application protocols.

(1)Introduction to XML

In today's world, words such as "tag," "markup," "element," "attributes," and "schema" are buzzwords that you can hear anywhere (well, at least in the IT industry), but what do these terms mean in the context of markup languages?

(2)Introduction to Markup Languages

In a broader sense, a markup is anything that you place within a document that provides additional meaning or additional information. For example, in this book we use italic font to emphasize each new phrase or concept that we define or introduce. I have a habit of using a highlighter when I am reading books. Each time I use my highlighter, I change the format of the text as a means of helping me find important segments later.

Markups usually define:

  • Formatting
  • Structure
  • Meaning

A reader has to have an implicit set of rules for placing markups in a document-otherwise those markups are meaningless to him. A markup language is a set of rules that defines:

  • What constitutes a markup

  • What a markup means

(2)Building Blocks of Markup Languages

The syntax of markup languages such as SGML, HTML, and XML is based on tags, elements, and attributes. A tag is a markup language building block that consists of delimiters (angled brackets) and the text between them:

|<TITLE>

An element is a markup language building block that consists of a pair of tags and the content between them:

|<TITLE>SQL Server 2000 Stored Procedure Programming</TITLE>

Each element has an opening tag and a closing tag. The text between these tags is called the content of the element.

An attribute is a building block in the form of a name/value pair that delimits a tag:

|<font size="2">

Okay, let's say that you have created a document and that you have marked up some parts of it. Now what? You can share it with others. They will use something called a user agent to review the document. In a broader context, a user agent could be a travel agent that helps a customer buy tickets for a trip. However, in the IT industry, a user agent is a program that understands the markup language and presents information to an end user. An example of such a program is a Web browser designed to presen
t documents created using HTML.

(2)XML

Let's take a look at a simple example of an XML document:

<Inventory>
  <Asset Inventoryid="5">
   <Equipment>Toshiba Portege 7020CT</Equipment>
   <EquipmentType>Notebook</EquipmentType>
   <LocationId>2</LocationId>
   <StatusId>1</StatusId>
   <LeaseId>1234</LeaseId>
   <LeaseScheduleId>1414</LeaseScheduleId>
   <OwnerId>83749271</OwnerId>
   <Cost>6295.00</Cost>
   <AcquisitionType>Lease</AcquisitionType>
 </Asset>
</Inventory>

(3)Elements

An XML document must contain one or more elements. One of them is not part of any other element and therefore it is called the document's root element. It must be uniquely named. In the preceding example, the root element is named Inventory.

Each element can contain one or more other elements. In the preceding example, the Inventory element contains one Asset element. The Asset element also contains other elements. The Equipment element contains just its content-the text string "Toshiba Portege 7020CT".

Unlike HTML, XML is case sensitive. Therefore, <Asset>, <asset>, and <ASSET> are different tag names.

It is possible to define an empty element. Such elements can be displayed using standard opening and closing tags:

|<Inventory></Inventory>

or using special notation:

|<Inventory/>

If an element contains attributes but no content, an empty element is an efficient way to write it.

| <Asset Inventoryid="5"/>

An element can have more than one attribute. The following example shows an empty element that contains nine attributes:

<asset Inventoryid="12" EquipmentId="1" LocationId="2" StatusId="1" LeaseId="1" LeaseScheduleId="1" OwnerId="1" Lease="100.0000" AcquisitionTypeID="2"/>

You are not allowed to repeat an attribute in the same tag. The following example shows a syntactically incorrect element:

|<Inventory Inventoryid="12" Inventoryid="13"/>


(3)Processing Instructions

An XML document often starts with a tag that is called a processing instruction. For example, the following processing instruction notifies the reader that the document it belongs to is written in XML that complies with version 1.0.

|<?xml version="1.0"?>

A processing instruction has the following format:

|<?name data?>

The name portion identifies the processing instruction to the application that is processing the XML document. Names must start with XML. The data portion that follows is optional. It could be used by the application.

Tip: It is not required but is recommended that you start an XML document with a processing instruction that explicitly identifies that document as an XML document defined using a specified version of the standard.


(3)Document Type Definition 
and Document Type Declaration


We mentioned earlier that markups are meaningless if it is not possible to define rules for:

  • What constitutes a markup
  • What a markup means

A document type definition (DTD) is a type of a document that is often used to define such rules for XML documents. The DTD contains descriptions and constraints (naturally, not Transact-SQL constraints) for each element (such as the order of element attributes and membership). User agents can use theDTD file to verify that an XML document complies with its rules.

The DTD can be an external file that is referenced by an XML document:

|<!DOCTYPE Inventory SYSTEM "Inventory.dtd">

or it can be part of the XML document itself:

<?xml version="1.0"?>
<!DOCTYPE Inventory [
  <!ELEMENT Inventory (Asset+)>
  <!ELEMENT Asset (EquipmentId, LocationId, StatusId, LeaseId,
  LeaseScheduleId, OwnerId, Cost, AcquisitionTypeID)>
  <!ATTLIST Asset Inventoryid CDATA #IMPLIED>
  <!ELEMENT EquipmentId (#PCDATA)>
  <!ELEMENT LocationId (#PCDATA)>
  <!ELEMENT StatusId (#PCDATA)>
  <!ELEMENT LeaseId (#PCDATA)>
  <!ELEMENT LeaseScheduleId (#PCDATA)>
  <!ELEMENT OwnerId (#PCDATA)>
  <!ELEMENT Cost (#PCDATA)>
  <!ELEMENT AcquisitionTypeID (#PCDATA)>
]>
<Inventory>
 <Asset Inventoryid="5">
  <EquipmentId>1</EquipmentId>
  <LocationId>2</LocationId>
  <StatusId>1</StatusId>
  <LeaseId>1</LeaseId>
  <LeaseScheduleId>1</LeaseScheduleId>
  <OwnerId>1</OwnerId>
  <Cost>1295.00</Cost>
  <AcquisitionTypeID>1</AcquisitionTypeID>
 </Asset>
</Inventory>

The DTD document does not have to be stored locally. A reference can include a URL or URI that provides access to the document:

<!DOCTYPE Inventory SYSTEM "http://www.trigonblue.com/dtds/Inventory.dtd">

A universal resource identifier (URI) identifies a persistent resource on the Internet. It is a number or name that is globally unique. A special type of URI is a universal resource locator (URL) that defines a location of a resource on the Internet. A URI is more general because it should find the closest copy of a resource or because it would eliminate problems in finding a resource that was moved from one server to another.

(3)XML Comments and CDATA sections

It is possible to write comments within an XML document. The basic syntax of the comment is:

|<!--commented text-->

Commented text can be any character string that does not contain two consecutive hyphens (--) and that does not end with a hyphen (-). Comments can stretch over more than one line:

<!-- This is a comment. -->
<!--
This is another comment.
-->

Comments cannot be part of any other tag:

<Order <!-- This is an illegal comment. --> OrderId = "123">
...
</Order>


You can use CDATA sections in XML documents to insulate blocks of text from XML parsers. For example, if you are writing an article about XML and you want also to store it in the form of an XML document, you can use CDATA sections to force XML parsers to ignore markups with sample XML code.

The basic syntax of a CDATA sections is:

|<![CDATA[string]]>

The string can be any character string that does not contain "]]>" in sequence. CDATA sections can occur anywhere in an XML document where character data is allowed.

<Example>
 <Text>
   <![CDATA[<Inventory Inventoryid="12"/>]]>
  </Text>
</Example>

(3)Character and Entity References

Like HTML and SGML, XML also includes a simple way to reference characters that do not belong to the ASCII character set. The syntax of a character reference is:

&#NNNNN;
&#xXXXX;

The decimal (NNNNN) or hexadecimal (XXXX) code of the character must be preceded by "&#" or "&#x", respectively, and followed by a semicolon (;).

Entity references are used in XML to insert characters that would cause problems for the XML parser if they were inserted directly into the document. This type of reference is basically a mnemonic alternative to a character reference. There are five basic entity references:

Entity

Meaning

&amp;

&

&apos;

'

&lt;

<

&gt;

>

&quot;

"

Entity references are often used to represent characters with special meaning in XML. In the following example, entity references are used to prevent the XML parser from parsing the content of the <Text> tag:
<Example>
  <Text>
    &lt;Inventory Inventoryid="12"/&gt;
  </Text>
</Example>

 

Structure of XML Documents

XML Documents consists of the three parts that you can see in the following illustration:

Illustration 1

The first part of the document is called the prolog or document type declaration (not document type definition). It is not required. It can contain processing instructions, a DTD, and comments. The body of the document contains the document's elements. The data in these elements is organized into a hierarchy of elements, their attributes, and their content. Sometimes an XML document contains an epilog, an optional part that can hold final comments, or processing instructions, or just white space.

(2)XML Document Quality

There are two levels of document quality in XML:
  • Well-formed documents
  • Valid documents

An XML document is said to be a well-formed document when:

  • There is one and only one root element.

  • All elements that are not empty are marked with start and end tags.

  • The order of the elements is hierarchical: that is, an element A that starts within an element B also ends within element B.

  • Attributes do not occur twice in one element.

  • All entities used have been declared.

    An XML document is said to be a valid document when:

  • The XML document is well-formed.

  • The XML document complies with a specified DTD document.

The concept of a valid document has been imported to XML from SGML. In SGML all documents must be valid. XML is not so strict. It is possible to use an XML document even without a DTD document. If the user agent knows how to use the XML document without the DTD, then the DTD need not even be sent over the Net. It just increases traffic and ties up bandwidth.

(3)XML Schema

DTD is not the only type of document that can store rules for an XML document. Several companies (including Microsoft) have submitted a proposal to W3C for an alternative type of metadata document called an XML schema or XML data. These are the major differences between a DTD and an XML schema:

  • XML schemas support datatypes and range constraints.

  • The language in which XML schemas are written is XML. Developers do not have to read an additional language as they do with DTDs.

  • XML schemas support namespaces (XML entities for defining context).

 

(3)XML-Data Reduced (XDR)

At the time of this writing, the W3C had not yet adopted the XML schema as a standard. Microsoft has implemented a variation of XML schema syntax called XML-Data Reduced (XDR) in the MSXML parser that is delivered as a part of Internet Explorer 5.

Microsoft has promised complete support for the XML schema when the W3C awards it Recommended status, but for the time being, more and more tools and organizations are using XML-Data Reduced. It is also important to note that Microsoft uses XDR in BizTalk, one of the most important initiatives in the Web application market. It is an initiative intended to create e-commerce vocabularies for different vertical markets. SQL Server 2000 also uses XDR for its XML schema.

Let's review an example of an XML schema document:

<Schema name="Schema"
xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ElementType name="Inventory" content="empty" model="closed">
<AttributeType name="Inventoryid" dt:type="i4"/>
<AttributeType name="EquipmentId" dt:type="i4"/>
<AttributeType name="LocationId" dt:type="i4"/>
<AttributeType name="StatusId" dt:type="ui1"/>
<AttributeType name="LeaseId" dt:type="i4"/>
<AttributeType name="LeaseScheduleId" dt:type="i4"/>
<AttributeType name="OwnerId" dt:type="i4"/>
<AttributeType name="Rent" dt:type="fixed.14.4"/>
<AttributeType name="Lease" dt:type="fixed.14.4"/>
<AttributeType name="Cost" dt:type="fixed.14.4"/>
<AttributeType name="AcquisitionTypeID" dt:type="ui1"/>
<attribute type="Inventoryid"/>
<attribute type="EquipmentId"/>
<attribute type="LocationId"/>
<attribute type="StatusId"/>
<attribute type="LeaseId"/>
<attribute type="LeaseScheduleId"/>
<attribute type="OwnerId"/>
<attribute type="Rent"/>
<attribute type="Lease"/>
<attribute type="Cost"/>
<attribute type="AcquisitionTypeID"/>
</ElementType>
</Schema>

This schema describes the structure of an XML document that contains Inventory information. Only one element in an XML document will be based on this schema. It is defined in the <ElementType> tag. The definition also specifies its name ("Inventory"), content (the tag is "empty" because all information will be carried in attributes), and content model ("closed"-it is not possible to add elements that are not specified in the schema).

The element contains multiple attributes. Each attribute is first defined in an <AttributeType> tag and then instantiated in an <attribute> tag:

  <AttributeType name="Cost" dt:type="fixed.14.4"/>
...
  <attribute type="Cost"/>


For each attribute, the schema defines a name and a datatype. You can see a list of acceptable datatypes later in this chapter.

The following listing shows an XML document that complies with the previous schema:

<Inventory xmlns="x-schema:Schema.xml"
  Inventoryid="5"
  EquipmentId="1"
  LocationId="2"
  StatusId="1"
  LeaseId="1"
  LeaseScheduleId="1"
  OwnerId="1"
  Cost="1295.0000"
  AcquisitionTypeID="1"/>

(3)Schema Constraints

Let's review schema attributes that can be used to declare elements and attributes. These can be classified as:
  • Element constraints
  • Attribute constraints
  • XML datatypes
  • Group constraints

(4)Element Constraints

Elements in a schema can be constrained using attributes of the <ElementType> tag:

  • name

  • content

  • model

  • order

  • group

  • minOccurs

  • maxOccurs
The name attribute defines the name of the subelement.

Possible values for the content attribute are listed in the Table 12-1.

Content

Meaning

"textOnly"

Only text is allowed as content

"eltOnly"

Only other elements are allowed as content

"empty"

No content

"mixed"

Both text and elements are allowed

Table 1: Content Attribute Values

An important innovation in XML schemas (that was not available in DTDs) is the capability to add nondeclared elements and attributes to an XML document. By default, every element of every XML document has its model attribute set to "open". To prevent the addition of nondeclared elements and attributes, the model attribute has to be "closed".

It is also possible to define how many times a subelement can appear in its parent element using the maxOccurs and minOccurs attributes. Positive integer values and "*" (unlimited number) are allowed in the maxOccurs attribute, and "0" and positive integer values are allowed in the minOccurs attribute. The default value for minOccurs is "0". The default value for maxOccurs is "1", except that when the content attribute is "mixed", maxOccurs must be "*".

An order attribute specifies the order and quantity of subelements (see Table 12-2).

Order

Meaning

"seq"

Subelements must appear in the order listed in the schema.

"one"

Only one of subelements listed in the schema can appear in the XML document.

"many"

Any number of subelements can appear in any order.

Table 2: Order Attribute Values

The default value for order is "seq" when the content attribute is set to "eltOnly" and "many" when the content attribute is set to "mixed".

(4)Attribute Constraints

By their nature, attributes are more constrained than elements. For example, attributes do not have subelements (or subattributes), and it is not possible to have more than one instance of an attribute within the element.

The required attribute (constraint) in a schema specifies that the attribute is mandatory in XML documents that follow the schema. The default attribute (constraint) in a schema specifies the default value of the attribute in an XML document (the parser will use that value if an attribute is not present).

The schema can be set so that an attribute value is constrained to a set of predefined values:

<AttributeType name="status"
          dt:type="enumeration"
          dt:values="open in-process completed" />

(4)XML Datatypes

The schema can also enforce the datatype of the attribute or element. Table 2 in Appendix A lists datatypes and their meanings.

Table 3 in Appendix A maps XML datatypes to SQL Server datatypes.

(4)Group Constraints

The group element allows an author to apply certain constraints to a group of subelements. In the following example, only one price (rent, lease, or cost) can be specified for the Inventory element:

<Schema name="Schema" xmlns="urn:schemas-microsoft-com:xml-data"
xmlns:dt="urn:schemas-microsoft-com:datatypes">
<ElementType name="Inventory" content="eltOnly"
model="closed" order="many">
<element type="Inventoryid"/>
    <element type="EquipmentId"/>
    <element type="LocationId"/>
    <element type="StatusId"/>
    <element type="LeaseId"/>
    <element type="LeaseScheduleId"/>
    <element type="OwnerId"/>
      <group order = "one">
       <element type="Rent"/>
       <element type="Lease"/>
       <element type="Cost"/>
      </group>

      <element type="AcquisitionTypeID"/>
</ElementType>
</Schema>

The group constraint accepts order, minOccurs, and maxOccurs attributes.

(3)XML Namespaces

Some entities from different areas of a document can have the same name. For example, you could receive a purchase order document that contains a <name> tag for the customer and a <name> tag for the reseller company. People reading this documents would be able to distinguish them by their context. However, an application would need additional information to correctly interpret the data.

An answer to this problem is to create XML namespaces to provide the XML document with a vocabulary (that is, a context). After that, customer and company names can be referenced using a context prefix:

  <contact:name>Tom Jones</contact:name>
  <Company:name>Trigon Blue</Company:name>

Naturally, before these prefixes can be used, they have to be defined. The root element of the following document contains three attributes. Each of them specifies a namespace and a prefix used to reference it:

<PurcheseOrders
xmlns:contact="http://www.trigonblue.com/schemas/Contact.xml" xmlns:Company="http://www.trigonblue.com/schemas/Company.xml"
xmlns:dsig="http://dsig.org">
  <PurcheseOrder>
    <Customer>
      <contact:name>Tom Jones</contact:name>
    </Customer>
    <PurcheseDate>2000-09-11</PurcheseDate>
    <SalesOrganization>
      <Company:name>Trigon Blue</Company:name>
      <Company:DUNS>817282919</Company:DUNS>
      <Company:ID>1212</Company:ID>
    </SalesOrganization>
    <dsig:digital-signature>78901314</dsig:digital-signature>
  </PurcheseOrder>
</PurcheseOrders>

Next Page >

Home | Products | Services | Book of Knowledge | e-Business News | SQL Server News
About Us | Contact Us | Join Us | Links | Search