© 2000 - Trigon Blue Inc. 
All rights reserved.




(2)XML Parsers and DOM

Applications (or user agents) that use XML documents can use proprietary procedures to access the data in them. As a rule, such applications use special components called XML parsers. An XML parser is a program or component that loads the XML document into an internal hierarchical structure of nodes (see Figure 12-1) and provides access to the information stored in these nodes to other components or programs.

Figure 12-1: A possible graphical interpretation of a node tree

The XML Document Object Model (DOM) is a set of standard objects, methods, events, and properties used to access elements of an XML document. DOM is a standard that has received Recommendation status from W3C. Different software vendors have created their own implementations of DOM so that you can use it from (almost) any programming language on (almost) any platform.
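
To make the idea of a node tree concrete, here is a minimal sketch that uses the DOM implementation from Python's standard library (xml.dom.minidom) rather than MSXML. The inventory document and its element and attribute names are hypothetical and serve only as an illustration.

```python
from xml.dom import minidom

# Hypothetical inventory document; element and attribute names are illustrative only.
XML = """<Inventory>
  <Equipment EquipmentId="1">
    <Make>Toshiba</Make>
    <Model>Portege 7020CT</Model>
  </Equipment>
</Inventory>"""

def describe(node, depth=0):
    """Return one indented line per element or non-blank text node."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        attrs = " ".join(f'{k}="{v}"' for k, v in node.attributes.items())
        lines.append("  " * depth + node.tagName + (f" [{attrs}]" if attrs else ""))
    elif node.nodeType == node.TEXT_NODE and node.data.strip():
        lines.append("  " * depth + f'"{node.data.strip()}"')
    for child in node.childNodes:
        lines.extend(describe(child, depth + 1))
    return lines

# The parser loads the document into a hierarchy of nodes ...
document = minidom.parseString(XML)
# ... and other code reaches the data by walking those nodes.
tree_lines = describe(document.documentElement)
print("\n".join(tree_lines))
```

The indentation of the printed lines mirrors the hierarchy of nodes: elements contain other elements, and the text of an element is itself a child node.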

Microsoft has implemented DOM as a COM component called Microsoft.XMLDOM in msxml.dll. It is delivered, for example, with Internet Explorer 5, or you can download it separately from Microsoft's Web site. Developers can use it from any programming language that can access COM components or ActiveX objects, such as Visual Basic, VBScript, Visual J++, JScript, and Visual C++.

Nevertheless, it is unlikely that you will use it from Transact-SQL. Microsoft has built special tools for development in Transact-SQL. We will review them later in this chapter.

(2)Linking and Querying in XML

XML today represents more than a simple language for encoding documents. W3C is working on a whole set of additional specifications for using information in XML documents. Specifications such as XLink, XPath, XPointer, and XQL allow querying, linking, and access to specific parts of an XML document.

This is a vast topic, and we will briefly review only XPointer and XPath, since they are used in SQL Server 2000.

(3)XPointer

An XPointer reference works in a fashion very similar to an HTML hyperlink. You can point to a segment of an XML document by appending an XML fragment identifier to the URI of the XML document. A fragment identifier is often enclosed in xpointer(). For example, the following pointer directs the parser to an element with the ID attribute set to "Toshiba" in the document at a specified location:

|http://www.trigonblue.com/xml/Equipment.xml#xpointer(Toshiba)

The character "#" is a fragment specifier. It serves as a delimiter between the URI and the fragment identifier, and it specifies the way that the XML parser will render the target. In the preceding case, the parser renders the whole document to access only a specified fragment. To force the parser to parse only the specified fragment, you should use "|" as a fragment specifier:

|http://www.trigonblue.com/xml/Equipment.xml|xpointer(Toshiba)

Use of the "|" fragment specifier is recommended, as it leads to reduced memory usage.

xpointer() is not always required. If a document has a schema that specifies the ID attribute of an element, you can omit the xpointer() and point to a fragment of the document using only the ID attribute value:

|http://www.trigonblue.com/xml/Equipment.xml#Toshiba

Child sequence fragment identifiers use numbers to specify a fragment:

|http://www.trigonblue.com/xml/Equipment.xml#/2/1/3

The preceding example should be interpreted as follows: "/" means start from the top element of the document; "2" means go to the second child element of the top element; "1" means go to the first subelement of that element; "3" means go to the third subelement of that element.

Child sequence fragment identifiers do not have to start from the top element:

|http://www.trigonblue.com/xml/Equipment.xml#Toshiba/1/3

In the preceding example, fragment identification starts from the element with its ID set to "Toshiba". The parser then finds its first subelement and points to its third subelement.
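
The fragment identifier forms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the resolve() helper, the sample document, and the use of an attribute literally named "id" as the schema-declared ID are all hypothetical, and the "|" fragment specifier is not handled.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

# Hypothetical document; the "id" attribute stands in for a schema-declared ID.
XML = """<Inventory>
  <Equipment id="Compaq"><Make>Compaq</Make></Equipment>
  <Equipment id="Toshiba">
    <Model><Name>Portege</Name><Year>1999</Year><Cpu>PIII</Cpu></Model>
  </Equipment>
</Inventory>"""

def resolve(root, fragment):
    """Resolve a bare ID, xpointer(ID), or child sequence against root."""
    if fragment.startswith("xpointer(") and fragment.endswith(")"):
        fragment = fragment[len("xpointer("):-1]
    start, *steps = fragment.split("/")
    if start:
        # The sequence is anchored at the element whose id matches.
        node = next(e for e in root.iter() if e.get("id") == start)
    else:
        node = root  # a leading "/" means: start from the top element
    for step in steps:
        node = list(node)[int(step) - 1]  # child numbers are 1-based
    return node

# "#" separates the URI of the document from the fragment identifier.
url, fragment = urldefrag("http://www.trigonblue.com/xml/Equipment.xml#Toshiba/1/3")
root = ET.fromstring(XML)
print(url, "->", resolve(root, fragment).tag)
```

With this sample document, the fragments /2/1/3 and Toshiba/1/3 both lead to the same element, because the element with the ID "Toshiba" happens to be the second child of the top element.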

(3)XPath

The full XPointer syntax is built on the W3C XPath recommendation. XPath was originally built to be used by XPointer and XSLT (a language for transforming XML documents into other XML documents), but it has found application in other standards and technologies. We will see later how it is used by OPENXML in SQL Server 2000, but first let's examine its syntax.

Location steps are constructs used to select nodes in an XML document. They have the following syntax:

|axis::node_test[predicate]

The location step points to the location of other nodes from the position of the current node. If a current node is not specified in any way, the location step is based on the root element.

Axes break up the XML document in relation to the current node. You can think of them as a first filter that you apply to an XML document to point to target nodes. Possible axes are listed in Table 12-5.

Axis                  Description
parent                The parent of the current node
ancestor              All ancestors (parent, grandparent, etc. to the root) of the current node
child                 All children of the current node (first generation)
descendant            All descendants (children, grandchildren, and so forth) of the current node
self                  The current node only
descendant-or-self    All descendant nodes and the current node
ancestor-or-self      All ancestor nodes and the current node
attribute             All attributes of the current node
namespace             All namespace nodes of the current node
following             All nodes after the current node in the XML document. The set does not include attribute nodes, namespace nodes, or descendants of the current node.
preceding             All nodes before the current node in the XML document. The set does not include attribute nodes, namespace nodes, or ancestors of the current node.
following-sibling     All siblings (children of the same parent) after the current node in the XML document
preceding-sibling     All siblings (children of the same parent) before the current node in the XML document

Table 12-5: Axes in XPath

The node test is a second filter that you can apply to the nodes specified by axes. Table 12-6 lists all node tests that can be applied.

Node test                  Description
element name               Selects just the node(s) with the specified name in the set specified by axes
node()                     All nodes of any type in the set specified by axes
comment()                  All comment nodes in the set specified by axes
text()                     All text nodes in the set specified by axes
processing-instruction()   All processing instruction nodes in the set specified by axes (if a name is given in the parentheses, the parser will match only processing instructions with that name)

Table 12-6: Node Tests in XPath

A predicate is a filter in the form of a Boolean expression that is evaluated for each node in the set obtained after applying the axis and node test filters. Developers have a rich set of functions (string, node set, Boolean, and number), comparison operators (=, !=, <=, >=, <, >), Boolean operators (and, or), and arithmetic operators (+, -, *, div, mod). The list is very long (especially the list of functions), and we will not go into detail here. We will just mention the most common function, position(). It returns the position of the node within the set being evaluated.

Let's now review how all segments of the location step function together.

|child::Equipment[position()<=10]

This location step first points to the child nodes of the current node (root if none is selected). Of all child nodes, only elements named Equipment are left in the set. Finally, each of those nodes is evaluated by position and only the first 10 are selected.
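
The three filters of a location step can be modeled directly in code. The following Python sketch is an illustration only, not an XPath engine: the location_step() helper and the AXES table are hypothetical, only three axes are implemented, and the inventory tree is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory with 12 Equipment elements and one unrelated child.
root = ET.Element("Inventory")
ET.SubElement(root, "Location").text = "Toronto"
for i in range(1, 13):
    ET.SubElement(root, "Equipment", EquipmentId=str(i))

# Each axis maps the current node to a list of candidate nodes.
AXES = {
    "child": lambda node: list(node),
    "descendant": lambda node: list(node.iter())[1:],  # iter() yields node first
    "self": lambda node: [node],
}

def location_step(context, axis, node_test, predicate=lambda pos, node: True):
    """Apply the axis, then the node test, then the predicate (1-based position)."""
    candidates = AXES[axis](context)                        # first filter: axis
    named = [n for n in candidates if n.tag == node_test]   # second filter: node test
    return [n for pos, n in enumerate(named, 1) if predicate(pos, n)]  # third filter

# child::Equipment[position()<=10]
selected = location_step(root, "child", "Equipment", lambda pos, node: pos <= 10)
print(len(selected), selected[-1].get("EquipmentId"))
```

Note that the position is counted after the node test has been applied, so the Location element does not use up one of the ten positions.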

Very often, you will need to navigate from node to node through the XML document. You can chain location steps using the forward slash (/). The same character is often used at the beginning of the expression to establish the root as the current node.

In the following example, the parser is pointed to the Inventory.xml file, then to its root element, and then to the first child called Equipment, and finally to the first Model node among its children:

|Inventory.xml#/child::Equipment[position() = 1]/child::Model[position() = 1]

It all works in a very similar fashion to the notation of files and folders, and naturally you can write them all together:

|http://www.trigonblue.com/xml/Inventory.xml#/child::Equipment[position() = 1]/child::Model[position() = 1]

XPath constructs are very flexible, but also very complex and laborious to write. To reduce the effort, a number of abbreviations are defined. position() = X can be replaced by X (it is enough to type just the number). Thus, an earlier example can be written as:

|Inventory.xml#/child::Equipment[1]/child::Model[1]

If an axis is not defined, the parser assumes that the child axis was specified. Thus the preceding example could be written as:

|Inventory.xml#/Equipment[1]/Model[1]

The attribute:: axis can be abbreviated as "@". Therefore, the following two expressions are equivalent:

|Inventory.xml#/child::Equipment[1]/attribute::EquipmentId
|Inventory.xml#/child::Equipment[1]/@EquipmentId

The current node can be specified using either self::node() or a period (.). The following two expressions are equivalent:

|Order.xml#/self::node()/OrderDate
|Order.xml#/./OrderDate

A parent node can be specified either by parent::node() or "..". The following two expressions are equivalent:

|parent::node()/Order
|../Order

/descendant-or-self::node()/ selects the current node and all of its descendants. It can be abbreviated as "//". The following two expressions select all EquipmentId attributes in the document:

|Inventory.xml#/descendant-or-self::node()/@EquipmentId
|Inventory.xml#//@EquipmentId
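
Several of these abbreviations can be tried out with the ElementTree module in Python's standard library, which understands a limited subset of the abbreviated XPath syntax. The sketch below assumes a hypothetical Inventory.xml whose content is embedded as a string. ElementTree paths are relative to the element on which they are evaluated, so the leading "/" is omitted, and attributes cannot be returned as nodes, so their values are read from the selected elements instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical Inventory.xml content.
root = ET.fromstring("""<Inventory>
  <Equipment EquipmentId="1"><Model>Portege</Model><Model>Tecra</Model></Equipment>
  <Equipment EquipmentId="2"><Model>Armada</Model></Equipment>
</Inventory>""")

# Equipment[1]/Model[1]: child axis implied, [1] stands for position() = 1
first_model = root.find("Equipment[1]/Model[1]")

# ./Equipment[1]: "." is the current node
same_equipment = root.find("./Equipment[1]")

# Equipment/Model/..: ".." steps back up to the parent of each Model
parents = root.findall("Equipment/Model/..")

# //@EquipmentId: ElementTree cannot return attribute nodes, so select the
# elements that carry the attribute and read its value from each one.
ids = [e.get("EquipmentId") for e in root.findall(".//*[@EquipmentId]")]

print(first_model.text, same_equipment.get("EquipmentId"), len(parents), ids)
```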

(2)Transforming XML

In many cases in business, information that is already in the form of an XML document needs to be converted to another XML structure. For example, a client of mine is participating in RosettaNet, an e-commerce consortium of IT supply chain organizations that defines standard messages to be sent between partners. Although messages are standardized, each pair of partners can agree to modify their messages slightly to better serve their needs. Such changes are mostly structural: new nodes (fields) can be defined, standard ones can be dropped, a node can change its type from element to attribute, and so on. Instead of generating completely different messages each time (and developing two separate procedures for performing similar tasks), it is preferable to create a simple procedure that will transform a standard XML message into another form.
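
The kind of structural change described above can be sketched as a small procedure. The example below is written in plain Python with ElementTree as a stand-in for a real transformation language; it is not XSLT, and the Order message, its element names, and the transform() helper are hypothetical rather than taken from any RosettaNet specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical "standard" message; names are illustrative, not RosettaNet's.
STANDARD = """<Order>
  <OrderId>1001</OrderId>
  <OrderDate>2000-05-01</OrderDate>
  <Comment>rush</Comment>
</Order>"""

def transform(source):
    """Build the partner-specific variant of a standard Order element."""
    target = ET.Element("Order")
    # A node changes type: the OrderId element becomes an attribute.
    target.set("OrderId", source.findtext("OrderId"))
    # A standard node is kept as is.
    ET.SubElement(target, "OrderDate").text = source.findtext("OrderDate")
    # A new node is defined; Comment is dropped by simply not copying it.
    ET.SubElement(target, "Priority").text = "normal"
    return target

partner_message = ET.tostring(transform(ET.fromstring(STANDARD)), encoding="unicode")
print(partner_message)
```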

Another typical situation occurs when an application uses a browser to display an XML document. Although modern browsers such as the latest versions of Internet Explorer are able to display the content of an XML document in the form of a hierarchical tree, this format is not user-friendly. More often, the XML document is transformed into an HTML document and information is organized visually into tables and frames. Such HTML applications usually allow the end user to modify the displayed information interactively (for example, to sort the content of the tables, to display different information in linked tables, or to present data in different formats). Each of these tasks could be performed by modifying the original XML document.

A typical problem with HTML browsers from different vendors is that they are not compatible. Naturally (well, actually, it seems quite unnatural to us), even different versions of the same browser behave differently. Each of them uses a different variation of the HTML standard. However, these differences are not major, and instead of generating a separate XML document for each of them, a Web developer can create a procedure to transform the XML document so that it fits the requirements of the browser currently in use.

You can think of HTML as just one type of rendering language. Some systems use other types of rendering languages and appropriate browsers. For example, more and more PDAs and wireless devices such as cellular phones are offering Internet access. They often use a special protocol (Wireless Application Protocol, or WAP) that has its own markup language (Wireless Markup Language, or WML) based on XML. A Web server offering information should be able to transform the XML document to fulfill the needs of different viewers.

(3)XSL

The eXtensible Stylesheet Language (XSL) addresses the need to transform XML documents from one XML form to another and to transform XML documents to other formats such as HTML and WML. It is based on CSS (Cascading Style Sheets), a language for styling HTML documents. Over time, XSL has been split into three languages:

  • XSLT for transforming XML documents
  • XSL-FO (XSL Formatting Objects) for rendering
  • XPath for accessing a specific part of an XML document

(3)XSLT

XSLT is a (new) language for transforming XML documents. W3C gave it Recommendation status in November 1999. XSLT style sheet files are also well-formed XML documents. These files are processed by XSLT processors. Such a processor can be a separate tool or part of an XML parser (as in the case of MSXML).

At this point we will not go into detail about XSL and XSLT syntax. Such topics are really beyond the scope of this book. You will have to refer to http://www.w3.org/Style/XSL/ and http://www.w3.org/TR/xslt for more information on this topic. However, we will cover the use of XSLT in SQL Server 2000 later in the chapter.
