reading huge xml in ruby

Dealing with huge xml? say of size 400 MB or more than a GB. No worries, Nokogiri::XML::Reader is there for your help. But let me tell you it takes a while before you understand it and start using it. Needs to understand types of elements, and when to read value and and when to read attributes. However I have went over it and tried defining my own module which now simplifies it.

Your large xml doc must be having thousands of records, and you may or may not know the structure of record. This module will help you easily read your records, its elements, something like this below …

hey_my_module
  take_this_xml
  and look_for_these_elements
  let_me_know_if_you_find_any_of_them

…hence

HugeXML.read xml, elements_lookup do |element|
  # => element{ :name, :value, :attributes}
end

so simple right? wanna check it out… here it is  => github.com/amolpujari/reading-huge-xml

 

DB2 Native XML

XML Native Storage

DB2 v9 stores XML data in its Native form which means, if you imagine the XML document is loaded into memory(RAM), which is hierarchical presentation of the XML data, which is nothing but a tree structure or also called as DOM(Document Object Model), the same tree is stored as it is on the disk, which is like parsed XML data. This is called storing XML data in its native form, and so DB2 v9 is called Native XML Database(NXD), it is also been called as pureXML database.

DB2 is not an XML-enabled database where XML data is stored in relational form using LOB/VARCHAR, where XML document is stored as it is in text form or using shredding/decomposition techniques. Well lets not focus on this XML-enabled stuff here.

In DB2 the XML data type name is “XML”.

XML data type is available only with database with codeset ‘Unicode’, and with single partition-ed database only.

CREATE DB test USING CODESET utf-8 TERRITORY US

You can create a table with XML columns as well as with relational data types columns. Tables can contain any combination of XML columns and relational columns. A column of type XML can hold one well-formed XML document for every row of the table. Relational and XML data are stored in different formats that match their respective data models. The relational columns/data are stored in traditional row structures while the XML data is stored in hierarchical structures. The two are closely linked for efficient cross-access

This XML data type has no length associated with it, it can store an XML doc of mix size – 2GB.

create table test( rc1 int, rc2 char(4), info xml)
DB20000I  The SQL command completed successfully.
describe table test
                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls
------------------------------- --------- ------------------- ---------- ----- ------
RC1                             SYSIBM    INTEGER                      4     0 Yes  
RC2                             SYSIBM    CHARACTER                    4     0 Yes  
INFO                            SYSIBM    XML                          0     0 Yes
  3 record(s) selected.

Physical Storage

Though its not needed to be concerned about DB2’s physical storage management for XML, however to help you understand what DB2 is doing behind the scene, lets briefly discuss it.

In above example DB2 will preserve the internal structure of the XML documents, converting its tag names and other information into integer values, doing so helps conserve disk space and also improves the performance of queries that use navigational expressions. DB2 stores this mapping in a table named, SYSIBM.SYSXMLSTRINGS. There are some other meta tables present in DB2 where it stores some meta information.

DB2 automatically splits portion of an XML document if the document size is more than a page size, it splits node of the document tree across multiple database pages as needed. DB2 can split a document (or a sub-tree) of nodes at any level of the document hierarchy as needed. In such cases DB2 automatically generates and maintains a “regions” index to provide an efficient means of tracking the XML tree. How many regions or pages will a document split into? It depends on the size of the document and the page size. Given a fixed-size XML document inserted into an XML column, the larger the page size, the less number of regions and pages. For example, the same XML document takes more regions and pages on a 4K-page size than on an 8K-page size. Fewer regions and pages per document are better for performance. You should choose the page size depending on the size of your XML documents. If performance is your only consideration, the larger page size is better.

When you create a table with XML columns, you can place XML data and indexes in separate table spaces to use different page sizes and separate configuration parameters, such as prefetch size. For instance the non-long data types are in tablespace2. Indexes are in tablespace3. XML types are considered to be long data types. XML will go into tablespace4.

CREATE TABLE test_xml_table(c1 char(5), c2 int,c3 char(7), c4 XML)
 IN tablespace2
 INDEX IN tablespace3
 LONG IN tablespace4

When to store data in XML form and not in relational form

The data is better described in hierarchical form – complexity of the hierarchical data might require more number of relational tables to store it and it can be difficult to map into relational form

The xml schema is constantly changing and evolving – business rules can change the schema, for example if you are considering 10 properties of an User object in your design and tomorrow if you want to add some more properties where you will need to add some more relational tables affecting the relational between existing properties probably

Many properties/attributes of the data are unknown or empty (NULL) – mapping all data into relational form, then having more null values in the database reasons being most of the attributes are not entered into the database, plus the data is complicated and huge, you might require a number of relational tables where most of the tables having null values. To process this data most of the queries include JOINs which slows down the performance if more nulls are there. With well-designed XML document stored, such case would not occure

Little data with highly complex structure – business rules and terminologies can cause to have little data managed using highly complex structures which requires overheads of number of tables to managed, complicated queries, JOINs etc. XML should be used to describe such data

DB2 XML INSERT

XML data can be easily inserted into database either using INSERT statement or by invoking IMPORT command, the same as for relational data.

INSERT INTO test VALUES (1,'D','<?xml version="1.0"?>
<lib>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="CHILDREN">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="WEB">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>VaidyanathanNagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="WEB">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</lib>')

Note if you want to have ‘ in XML document then you need to use the ESCAPE char as ‘ preceding itself

INSERT INTO test VALUES (2,'R','<?xml version="1.0"?>
<lib>
  <book category=''WEB''>
    <title lang="en">Rails Solutions: ''Ruby on Rails Made Easy </title>
    <author>Justin Williams</author>
    <year>2007</year>
    <price>34.99</price>
  </book>
</lib>')

DB2 XML IMPORT

If you already have your big size XML documents/files on your file system, then you don’t need to write INSERT statements like above, IMPORT is there to help you out, you may import one or more XML files at a time using IMPORT. You just need to prepare one ASCII file to do this, this would be a plain text file usually named with extension as .del. Here is an example below, see the contents of c:\data.del

5,,"<XDS FIL='5.xml' />"
6,"EX",
7,"EX","<XDS FIL='7.xml' />"

Note that you must use ” and not ‘ to specify char/varchar or XML data in .del file.
here is the IMPORT command

IMPORT FROM c:\data.del of del xml from c:\ INSERT INTO test
SQL3109N  The utility is beginning to load data from file "c:\data.del".
SQL3110N  The utility has completed processing.  "3" rows were read from the
input file.
SQL3221W  ...Begin COMMIT WORK. Input Record Count = "3".
SQL3222W  ...COMMIT of any database changes was successful.
SQL3149N  "3" rows were processed from the input file.  "3" rows were
successfully inserted into the table.  "0" rows were rejected.
Number of rows read         = 3
Number of rows skipped      = 0
Number of rows inserted     = 3
Number of rows updated      = 0
Number of rows rejected     = 0
Number of rows committed    = 3

If you remember the test table defined/described below, the same has been used to import the XML files, I have also illustrated the examples of inserting null char or null XML data. In above example

XML Data Specifier(XDS) is used in the IMPORT command to describe the XML data. XDS has four attributes

  • FIL – name of the XML file, either absolute or only file name if XML file resides in the same location that of .del file
  • OFF – is the byte offset of the XML data in the file specified in the FIL attribute, the offset starts from zero
  • LEN – is the length of the XML data in the file specified in the FIL attribute
  • SCH – is the fully qualified XML schema name that is used for validating the XML documents

So last option here SCH says that DB2 can validate the XML against a given schema before it inserts the XML data, I havn’t discussed about this feature so far and I would not prefer here on this page reason being XML validation significantly slows down the insert/update operations. This feature should be used when an application accepts XML from end user and stores it in DB2.

DB2 XML Indexes

  • You can define one or more XML-indexes on a single XML column
  • You can index element, attributes, mixed contents
  • XML-indexes are created based on path expressions or XML pattern

Types of XML-indexes

  1. XML regions index – I just talked about this in physical storage of XML above, these are database created indexes, user has no control over them, these are created, maintained and used when required by DB2
  1. XML column path index – The XML column path index is DB2-generated for each XML column created or added to the table. For example, if a table with two XML columns is created, there is one XML regions index, but two XML column path indexes generated by DB2. The XML column path index maps paths to path IDs for each XML column. It is a subset of the paths stored in the global catalog path table and is used to improve index access performance for queries. When an XML document is inserted, every unique path in the XML document is extracted and stored in the XML column path index with a unique path ID.
  1. Index on an XML column – An index on an XML column is an index created over an XML column. This index allows users to enhance the performance of XQuery and SQL/XML. Like other relational indexes, the XML index is created as a B-tree structure and stored in the same place as the relational indexes are stored. You can also define multiple XML indexes in one XML column. Like indexes on relational data, using indexes on an XML column to improve the query performance may have some cost. The performance for INSERT, UPDATE, and DELETE can decrease as the number of indexes defined on XML column increases.

Lets see now how to create XML-indexes

CREATE INDEX index_name
 ON table_name(xml_column_name)
 GENERATE KEY USING XMLPATTERN '/element/@attribute'
 AS SQL VARCHAR(32)

Data type that you need to specify in the CREATE INDEX at the end is required to store the node values so that DB2 can convert XML node values specified in the xmlpattern clause to the SQL data type. The values then are then stored in a B-tree index. There are five SQL data types you can use: DOUBLE, VARCHAR(n), VARCHAR HASHED, DATE, and TIMESTAMP.

Consider the below query

xquery
declare default element namespace 'http://posample.org';
db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/name
@

In above case, to improve the performance, we could create an XML Index like

CREATE INDEX IndexCustname
 ON CUSTOMER(INFO)
 GENERATE KEY USING XMLPATTERN '/customerinfo/name'
 AS SQL VARCHAR(32)

So now if you observe, you can specify any xmlpattern which is similar to XPath used in the XQueries to gain the performance.

See one more example here below query

xquery
let $a := for $b in db2-fn:xmlcolumn('TEST.INFO')/lib/book
  where $b/@category = 'WEB'
  return data($b/author)
  return distinct-values($a)

Index

CREATE INDEX IndexTest1
 ON TEST(INFO)
 GENERATE KEY USING XMLPATTERN '/lib/book/@category'
 AS SQL VARCHAR(32)

This would help queries to directly get access to all ‘/lib/book/@category’

I have covered basics of DB2 XML Indexes over here. Using index for performance tuning is another topic that will be discussed using some examples later[TBD].

Reading XML data

DB2 XML data can be read using different ways, by using one of the following

  1. XQuery/XPath as a stand-alone language
  2. SQL embedded in XQuery
  3. XQuery/XPath embedded in SQL/XML
  4. Plain SQL for full-document retrieval

XQuery/XPath as a stand-alone language

XPath

The better tutorial to quickly understand the XPath is here http://www.w3schools.com/XPath/xpath_intro.asp

XQuery uses XPath, so its important to know it before we continue on XQuery

XQuery

Again here to quickly learn what XQuery is all about, please visit http://www.w3schools.com/xquery/xquery_intro.asp

To describe XQuery in short…

  • It is a language for running queries against XML-tagged documents in files and “databases”
  • Provides XPath compatibility
  • Supports conditional expressions, element constructors
  • FLOWR expressions the syntax for retrieving, filtering, and transforming operators, functions, path
  • Result of an XQuery is an instance of XML Query Data Model
  • Uses XML Schema types, offers static typing at compile time and dynamic typing at run time, supports primitive and derived types
  • could evaluate to simple node values (such as elements and attributes) or atomic values (such as strings and numbers). XQueries can also evaluate to sequences of both nodes and simple values

XML data stored in DB2 can be queried using XPath/XQuery, DB2 also provides some functions to be used in these XPath/XQuery expressions. You can read more about this XPath/Xquery Data Model(XDM) on the internet, lets concentrate here on querying XML with more examples. To write various queries, we will consider the CUSTOMER table in SAMPLE database which DB2 creates after it is installed.

The 1st command to get the INFO column data out in XML form

xquery
db2-fn:xmlcolumn('CUSTOMER.INFO')
  • You have to start xquery expression with word ‘xquery’
  • xmlcolumn is a function which takes in a case sensitive char/string of [schema_name]table_name.column_name, please google on this for more info
  • xmlcolumn belongs to db2-fn function set or in other words it is a function provided by DB2 under namespace db2-fn

xquery
declare default element namespace 'http://posample.org';
db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/name
@
1
----------------------------------------------------------
<name xmlns="http://posample.org">
  Kathy Smith
</name>
<name xmlns="http://posample.org">
  Kathy Smith
</name>
<name xmlns="http://posample.org">
  Jim Noodle
</name>
<name xmlns="http://posample.org">
  Robert Shoemaker
</name>
<name xmlns="http://posample.org">
  Matt Foreman
</name>
<name xmlns="http://posample.org">
  Larry Menard
</name>
6 record(s) selected.
  • This XML document/data contains namespace info, so in case of namespaces your expression should mention it, there could be multiple namespaces used in one or more XML documents inserted in a column. Please read on w3cschool to know xml namespaces
  • “/customerinfo/name” is XPath pattern that we are querying for, so we are listing all the ‘/customerinfo/name'(names) defined under a namespace, here ‘customerinfo’ is at the root level of XML document/data
  • One more thing to observe here that now I have used @ as command delimiter, reason being I have to use the default delimiter ; as required to end the ‘declare default element namespace’ syntax of xquery expression which is DB2 specific. You wont need to bother about it too much, it just to run this query in DB2 command center
  • It is not necessary to use namespace in your XML data, if not used then you don’t need to have ‘declare default element namespace’ statement, however if you define one or more namespaces in your XML document/data, then DB2 supports them and then you have to prepare your XQuery that way saying this is to operate on a namespace using ‘declare..’ statement.

FLOWR

For, Let, Order by, Where, R‘eturn are the clauses we can use in xquery.

xquery
declare default element namespace 'http://posample.org';
for $b in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/name
  return $b
@

Same as above one, same output as above. And this output is in XML form not in relational


Here is one try to get the same output in relational form…

xquery
declare default element namespace 'http://posample.org';
for $b in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/name
  return data($b)
@
1
--------------------
Kathy Smith
Kathy Smith
Jim Noodle
Robert Shoemaker
Matt Foreman
Larry Menard
6 record(s) selected.
  • data() is the xquery function, to represent the element data, please read more about xquery functions on internet/W3C
  • though this output looks like an relational one, however it is not. It is coming from XML stream in plane text. You have to use some DB2 functions which are there to form relational data out of XML, we will go through them in the later sections below. Again at application layer, it hardly matters whether the output is in XML/plain text or in relational form.

xquery
declare default element namespace 'http://posample.org';
<root>
  {
    for $b in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/name
      order by data($b)
      return
      <element1>{data($b)}</element1>
  }
</root>
@
1
---------------------------------------
<root xmlns="http://posample.org">
  <element1>
    Jim Noodle
  </element1>
  <element1>
    Kathy Smith
  </element1>
  <element1>
    Kathy Smith
  </element1>
  <element1>
    Larry Menard
  </element1>
  <element1>
    Matt Foreman
  </element1>
  <element1>
    Robert Shoemaker
  </element1>
</root>
1 record(s) selected.
  • Now same data is sorted out and records are added to one element that is ‘root’, XML returning 1 record out

xquery
declare default element namespace 'http://posample.org';
for $b in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo
  where $b/name = 'Matt Foreman'
  return
  <element1>{data($b/addr/@country)}</element1>
@
1
----------------------------------------
<element1 xmlns="http://posample.org">
  Canada
</element1>
1 record(s) selected.

xquery
declare default element namespace 'http://posample.org';
for $b in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo
  where $b[fn:contains(name,"Matt")]
  return
  <element1>{data($b/addr/@country)}</element1>
@
  • Equivalent to above one
  • here it contains a xquery string function contains(). The words ‘fn’, ‘db2-fn’, used preceding : are similar to namespace/scope where that function belongs to. fn: is the default scope, ‘fn:contains’ or ‘contains’ are the same.

Listing some more examples of XQueries below.


xquery
declare default element namespace 'http://posample.org';
let $a :=
(
  for $b in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/name
  return data($b)
)
return $a [position() < 4]
@
1
-------------------
Kathy Smith
Kathy Smith
Jim Noodle
3 record(s) selected.

To get first 3 records, You can use () or {} in XQuery.


xquery
declare default element namespace 'http://posample.org';
let $a := for $b in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/name return data($b)
let $c := count($a)
let $d := $c - 4
return $a [position() > $d]
@
1
------------------
Jim Noodle
Robert Shoemaker
Matt Foreman
Larry Menard
4 record(s) selected.
This one is to get last 4 records

xquery
declare default element namespace 'http://posample.org';
let $a := db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/addr/city
let $b := distinct-values($a)
for $e in ($b)
  let $d := count( for $c in db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo/addr/city
  where $c = $e
  return $c
)
return
<city count="{$d}" >{data($e)}</city>
@
1
------------------------------------------------
<city xmlns="http://posample.org" count="3">
  Toronto
</city>
<city xmlns="http://posample.org" count="2">
  Markham
</city>
<city xmlns="http://posample.org" count="1">
  Aurora
</city>
3 record(s) selected.

This one is to list cities and added an attribute count showing the count of customers from that city.

Now you can explore more on this yourself please :)

SQL embedded in XQuery

Here is one example where SQL is embedded in XQuery to achieve some results

xquery
declare default element namespace 'http://posample.org';
for $b in db2-fn:sqlquery('select info from customer where cid = 1000')
  return $b/customerinfo/name
@
1
-----------------------------------
<name xmlns="http://posample.org">
  Kathy Smith
</name>
1 record(s) selected.
  • Most of the times, combination of relational data and XML column is used, to deal with such structures ‘SQL embedded in XQuery’ and ‘XQuery embedded in SQL’ are useful techniques and are provided by DB2

XQuery/XPath embedded in SQL/XML

selectcid, xmlquery('declare default element namespace "http://posample.org";
for $b in $data/customerinfo
  return $b/addr/city'
  passing INFO as "data")
  from CUSTOMER

This will return a relational table having two columns, 1st listing cid and other listing XML documents, each XML document will hold <city> element.

Plain SQL for full-document retrieval

You can anytime fire plain SQL commands on tables containing XML columns to read the entire XML document.

DB2 XML Update

Imagine you have stored your XML document(s) in a database using a data type as TEXT or VARCHAR or clob in text form, or just imagine DB2 Native XML feature without having feature of XML update, where you need the following features at very low cost of processing and with high performace

  • Replace the value of a node
  • Replace a node with a new one
  • Insert a new node (even at a specific location, such as before or after a given node)
  • Delete a node
  • Rename a node
  • Modify multiple nodes in a document in a single UPDATE statement
  • Update multiple documents in a single UPDATE statement

For having even a small update to your XML document(s) stored in database, you need to perform the following

  1. The application submits an SQL or XQuery statement to read an XML document
  2. The document is retrieved from the database and serialized to text format
  3. The document is transmitted to the client
  4. The application parses the XML document, typically using the document object model (DOM)
  5. The application modifies the document using the DOM API
  6. The document is re-serialized in the client application
  7. The application submits an SQL UPDATE statement and transmits the updated document to the database server
  8. The database server parses the updated XML document and replaces the old document

a lots of processing ….?

The answer is not anymore, yes now DB2 v9 provides XML update features as mentioned above without following above 8 steps. When DB2 9.5 executes the xml UPDATE statement, it locates the relevant XML document(s) and modifies the specified elements or attributes. This happens within the DB2 storage layer, that is the document stays in DB2’s internal hierarchical XML format the entire time, without any parsing or serialization. Concurrency control and logging happens on the level of full documents. Overall, this new update process can often be 2x to 4x faster than the traditional process(8 steps mentioned above).


Now we will see different examples of XML update.

Modifying a node in database

xquery
declare default element namespace "http://posample.org";
db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo[@Cid=1000]/phone/text()
@
1
---------------
905-477-9012
1 record(s) selected.

update customer
set info = xmlquery(
'declare default element namespace "http://posample.org";
transform
copy $new := $INFO
modify do replace value of $new/customerinfo/phone with "905-477-9013"
return  $new')
wherecid = 1000
@
DB20000I  The SQL command completed successfully.

xquery
declare default element namespace "http://posample.org";
db2-fn:xmlcolumn('CUSTOMER.INFO')/customerinfo[@Cid=1000]/phone/text()
@
1
---------------
905-477-9013
1 record(s) selected.

The same UPDATE command has been modified to support XML data type. This example uses XMLQuery() function which returns the new XML portion.

A transform expression starts with the optional “transform” keyword followed by a COPY, MODIFY, and a RETURN clause. The intuitive idea of the transform expression is that the COPY clause assigns the input document from the XML column (info) to a variable (in our case: $new), then the MODIFY clause applies one or more update operations to that variable, and finally the RETURN clause produces the result of the transform expression.

The MODIFY clause defines the actual update operations that take place. Here, the value of the “phone” element is simply replaced with a new value (905-477-9013), the RETURN clause simply returns the variable that holds the modified document. However, the return could be a much more complex expression, including element construction and FLWOR expression. Similarly, the right-hand side of the COPY clause is often simply the variable that holds the input document, in our case $INFO, but it could be more complex than that. The right-hand side of the COPY clause must evaluate to a single node, meaning it cannot be an empty sequence or a sequence of more than one item. This single node can have descendants, which means it can be (and often is) the root of a full XML document.

Since the “transform” keyword is optional, it is omitted from here on.


Modifying a node on the fly in SELECT

selectxmlquery(
'declare default element namespace "http://posample.org";
copy $new := $INFO
modify do replace value of $new/customerinfo/phone with "905-xxx-xxxx"
return $new ')
from customer
wherecid = 1000;
@

Here it is the magic, here we are updating XML data not in database but on the fly within SELECT statement to restrict showing the complete phone number of customers. Here the update is used to blacken out all but the area code of the phone number.

For more examples and to know more solutions/pitfalls about XML update please visit … http://www.ibm.com/developerworks/db2/library/techarticle/dm-0710nicola/

SQL/XML

  • A set of functions provided by DB2
  • A bridge between SQL/relational and XML worlds
  • helps to represent relational data into XML tree
  • provides a set of functions to be applied on relational data to form XML tree
  • provides some functions to query/construct/validate/transform XML data
  • helps to access/extract fragments of XML data
  • helps to join relational and XML data
  • Achieves aggregation and grouping of XML data on the SQL level

Here is one small example to show what is SQL/XML as it has described above

Some functions XMLElement, XMLAgg, XMLAttributes are applied on relational columns to form one XML tree, as their names suggest XMLElement is to form the xml element, XMLAttributes is to prepare the xml attributes and add them to one xml element, XMLAgg/aggregation is to aggregate the XML elements or to specify the XML siblings, to know more about them please refer to DB2 SQL/XML reference online.


XMLQuery(), XMLTable() and XMLExists() are the commonly used functions of this SQL/XML suite, we will see below the examples of these. For rest of the SQL/XML functions and to know the details of each SQL/XML function, please go through the DB2 references on the internet.

XMLExists()

This function helps to check whether an XQuery/XPath expression returns a sequence of one or more elements, accordingly it returns TRUE or FALSE. Here is one example of it

selectcid from customer where
XMLEXISTS(
'declare default element namespace "http://posample.org";
$d/customerinfo/addr/city[text()="Toronto"]' passing CUSTOMER.INFO as "d")

This returns the cid for those records/XML column where city is Toronto


XMLQuery()

The XMLQuery function allows you to execute XQuery within an SQL Query. It returns a sequence/column of XML type. example:

selectcid, XMLQUERY(
'declare default element namespace "http://posample.org";
$b/customerinfo/name/text()' passing CUSTOMER.INFO as "b") AS "NAME"
from customer where
XMLEXISTS(
'declare default element namespace "http://posample.org";
$d/customerinfo/addr/city[text()="Toronto"]' passing CUSTOMER.INFO as "d")

This example generates the same output as of the previous one but adding one more column to it which is generated by XMLQUERY


One slight change to above example

selectcid, XMLCAST(
XMLQUERY(
'declare default element namespace "http://posample.org";
$b/customerinfo/name' passing CUSTOMER.INFO as "b")
AS VARCHAR(32)    ) AS "NAME"
from customer where
XMLEXISTS(
'declare default element namespace "http://posample.org";
$d/customerinfo/addr/city[text()="Toronto"]' passing CUSTOMER.INFO as "d")
Now we have prepared the 2nd column "Name" in varchar(32) form using one more SQL/XML function XMLCAST

XMLTable()

XMLTABLE() functions returns a table from an XQuery expression. example:-

select Cid, Name, Country, City, Zip
from customer, XMLTABLE(
XMLNAMESPACES (DEFAULT 'http://posample.org'),
'declare default element namespace "http://posample.org";
for $b in $data/customerinfo
return $b'
passing CUSTOMER.INFO as "data"
COLUMNS
Name    VARCHAR(16) path 'name/text()',
Country VARCHAR(8) path 'addr/@country',
City    VARCHAR(8) path 'addr/city/text()',
Zip     VARCHAR(8) path 'addr/pcode-zip/text()'
) AS info
@
CID                  NAME             COUNTRY  CITY     ZIP
-------------------- ---------------- -------- -------- --------
1000 Kathy Smith      Canada   Toronto  M6W 1E6
1001 Kathy Smith      Canada   Markham  N9C 3T6
1002 Jim Noodle       Canada   Markham  N9C 3T6
1003 Robert Shoemaker Canada   Aurora   N8X 7F8
1004 Matt Foreman     Canada   Toronto  M3Z 5H9
1005 Larry Menard     Canada   Toronto  M4C 5K8
6 record(s) selected.

Here we have projected 1st column CID from relational data while other columns are formed from XML data using XMLTABLE() function. Whenever you are dealing with XML data containing namespace, you have to specify the namespace in your XQuery or SQL/XML. While inputing XML data returned from internal XQuery, we are asking XMLTABLE function to operate on it and it belongs to a namespace ‘http://posample.org‘ by using XMLNAMESPACE() function of SQL/XML suite. And as you see, while writing internal XQuery there we have specified again the same namespace to work on, which is to be used by XQuery expression only.


Some more SQL/XML functions are

  • XMLAGG aggregate function – Returns an XML sequence containing an item for each non-null value in a set of XML values
  • XMLATTRIBUTES scalar function – Constructs XML attributes from the arguments. This function can only be used as an argument of the XMLELEMENT function
  • XMLCOMMENT scalar function – Returns an XML value with a single XQuery comment node with the input argument as the content
  • XMLCONCAT scalar function – Returns a sequence containing the concatenation of a variable number of XML input arguments
  • XMLDOCUMENT scalar function – Returns an XML value with a single XQuery document node with zero or more children nodes. This function creates a document node, which by definition, every XML document must have. A document node is not visible in the serialized representation of XML, however, every document that is to be stored in a DB2 table must contain a document node. Note that the XMLELEMENT function does not create a document node, only an element node. Thus when constructing XML documents that are to be inserted, it is not sufficient to create only an element node. The document must contain a document node
  • XMLELEMENT scalar function – Returns an XML value that is an XML element node
  • XMLFOREST scalar function – Returns an XML value that is a sequence of XML element nodes
  • XMLNAMESPACES declaration – Constructs namespace declarations from the arguments. This declaration can only be used as an argument of the XMLELEMENT, XMLFOREST, and XMLTABLE functions
  • XMLPI scalar function – Returns an XML value with a single XQuery processing instruction node
  • XMLTEXT scalar function – Returns an XML value with a single XQuery text node having the input argument as the content
  • XMLROOT – Creates the root node of an XML document

Here below I have showcased one more example of SQL/XML functions. A jsp script is been used here to generate a table, One html table(tag) is being prepared and populated with some data from one relational table Employee. Both approaches generate the same output. Please don’t get into concept of style sheets to make and decorate the html table in real applications, this is just to showcase the use of SQL/XML.

XML in DB2 9.5

Traditional way

db2 select empno, firstnme, lastname from employee
EMPNO  FIRSTNME     LASTNAME
------ ------------ ---------------
000010 CHRISTINE    HAAS
000020 MICHAEL      THOMPSON
.
.
000030 SALLY        KWAN
200340 ROY          ALONZO
42 record(s) selected.

// fetching relational data
//construct html table
<table ….
<!—setting table attributes ->
<%While(rs.next())// 42 fetches {%>
// construct table rows
<tr…>
<!—setting row attributes ->
//construct table columns
<!—setting column attributes ->
<td…><%=(rs.getString(“EMPNO”))%>
<td…><%=(rs.getString(“FIRSTNME”))%>
<td…><%=(rs.getString(“LASTNAME”))%>
</tr>
<%}%>
 

SQL/XML way…

SELECT XMLSerialize(
XMLELEMENT(NAME "TABLE",
--      XMLATTRIBUTES(’80%’ AS “width”)
XMLAGG( XMLELEMENT(NAME "TR",
XMLELEMENT(NAME "TD", empno),
XMLELEMENT(NAME "TD", firstnme),
XMLELEMENT(NAME "TD", lastname))))
AS varchar(4000)) FROM employee
// single fetch and html(xml) is ready
<% rs.next(); %>
<%=(rs.getString(1))%>
// job done

XML full-text search

An extension to DB2, mainly used to enable the faster text search, text data could come from an XML/text or varchar/char.

http://www.ibm.com/developerworks/db2/library/techarticle/dm-0606seubert/index.html

References

Whats new in DB2 9

Update XML in DB2 9.5

15 best practices for pureXML performance in DB2

Overview of new DB2 Version 9.5 pureXML enhancements