Data Henrik: TPoX


Monday, December 5, 2011

WMD: Weapon of mass destruction? No, Workload Multiuser Driver!

When you hear "WMD", many of you probably think of something other than a Workload Multiuser Driver first. But that is the name, and hence the acronym, that the team behind this add-on to the DB2 Technology Explorer, a SourceForge project, chose. The WMD is a RESTful web service that allows multiple users to concurrently run different workloads against DB2, and it can be controlled from the Technology Explorer.

This is of course interesting when you want to showcase certain features of DB2. However, because the WMD is a component of its own and can be downloaded as such, it is also an option for setting up your own performance or system tests. And this brings me to a question I was recently asked: What free workload drivers for DB2 do you know of?

In addition to the WMD, there is also a workload driver in the TPoX (Transaction Processing over XML data) benchmark, another SourceForge project for DB2. It cannot be downloaded separately, but it is documented and can be adapted to your own needs.
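To illustrate what a multi-user workload driver does at its core, here is a minimal sketch: several simulated users run concurrently, each repeatedly picking a transaction from a weighted mix, while the driver counts completed transactions and measures elapsed time. This is not how the WMD or the TPoX driver is implemented. The transaction names, the 70/20/10 mix, and the `sleep` placeholders are hypothetical stand-ins for real SQL or XQuery statements sent to DB2.

```python
import random
import threading
import time
from collections import Counter

# Hypothetical transactions: a real driver would execute statements against DB2 here.
def get_order():
    time.sleep(0.001)

def insert_order():
    time.sleep(0.001)

def update_account():
    time.sleep(0.001)

# Hypothetical workload mix: (transaction, relative weight).
WORKLOAD = [(get_order, 70), (insert_order, 20), (update_account, 10)]

def run_user(user_id, iterations, counts, lock):
    """One simulated user: run `iterations` transactions chosen by weight."""
    rng = random.Random(user_id)  # per-user seed keeps the transaction sequence reproducible
    txns = [txn for txn, _ in WORKLOAD]
    weights = [weight for _, weight in WORKLOAD]
    for _ in range(iterations):
        txn = rng.choices(txns, weights=weights)[0]
        txn()
        with lock:  # the counter is shared by all user threads
            counts[txn.__name__] += 1

def run_workload(users, iterations):
    """Start all users concurrently and wait for them to finish."""
    counts = Counter()
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_user, args=(u, iterations, counts, lock))
        for u in range(users)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts, time.perf_counter() - start

counts, elapsed = run_workload(users=4, iterations=50)
total = sum(counts.values())
print(f"{total} transactions in {elapsed:.2f}s ({total / elapsed:.0f} tx/s)", dict(counts))
```

Real drivers add what this sketch leaves out, such as connection handling, parameter generation for each statement, warm-up periods, and per-transaction response times.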

What other free workload drivers do you know of, and which ones do you prefer?

Tuesday, November 22, 2011

All the news: DB2 Express-C with PL/SQL, new TPoX version, DS 3.1, and hello to all PMs

I am in between business trips and there is not much time for looking deeper into any specific problems. But I wanted to touch base on a few new things:

DB2 Express-C, the free-to-download, free-to-use edition of DB2 LUW, now includes the Oracle compatibility features. That is, you can develop and use PL/SQL packages with DB2 for free, even in production. Although the DB2 Express-C download site still says "9.7.4", clicking through offers DB2 9.7.5 for download. I just tried it.
A new version of TPoX, the open source XML database benchmark, has been released. Most changes are to the workload driver. I know that some of you use the workload driver for more than just TPoX and DB2.
BTW: The DB2 Technology Explorer/Management Console includes a so-called Workload Multiuser Driver (WMD) that can be handy, too.
IBM Data Studio 3.1 has been out for a few weeks, and Data Studio will replace the DB2 Control Center in the near future. There is a so-called Administration Client (which is small) and a Full Client. The two differ in download size and function set. An overview of what is included and which database servers besides DB2 are supported is available in this Data Studio V3.1 features document.

Last but not least, I would like to say hello to all project managers (link to Dilbert comic). Dining out and working on large projects will never be the same again...

Wednesday, September 30, 2009

New TPoX release and performance numbers

[Seems like it is benchmark day today] Version 2.0 of the TPoX benchmark (Transaction Processing over XML) has been released. In an earlier post I explained what TPoX is and why it exists. The new release of the benchmark specification changes how the data is generated as well as some of the update statements in the workload. The workload driver has also been modified (its properties are now XML-based) to make it easier to adapt.

Also out since last month are TPoX performance results based on version 2.0. A 1 TB workload was tested against DB2 V9.7 on AIX 6.1 on an IBM BladeCenter JS43. The paper describing the benchmark details also compares these numbers against DB2 V9.5 FP4 running on the same setup.

Please note that due to the changes in the benchmark specification, benchmark results from version 1.x cannot and should not be compared to those from version 2.0.

Wednesday, July 8, 2009

Oracle + TPoX = no results?

Conor has a nice post about Oracle and their lack of published TPoX benchmark results. Oracle has published all kinds of benchmark results (like those from TPC), but not for the XML benchmark "TPoX".
A while ago I had already "commented" about Oracle and their XML claims and lack of results. There is also a post about TPoX to give you some more background information.

Tuesday, April 28, 2009

stringIDs in DB2 pureXML: What and Why

Earlier this month I had asked you about stringIDs in DB2 pureXML. Answer B) from the provided options is correct. DB2 replaces structural information such as element names, attribute names, namespace prefixes and URIs with stringIDs. But what are stringIDs and why are they used?

When we look at a simple XML document like

<department>
  <employee>
    <firstname>Henrik</firstname>
    <lastname>Loeser</lastname>
  </employee>
</department>

and an XPath expression like "/department/employee/lastname", then, first of all, we see a lot of strings. The strings vary in length, and the tags (the markup) make up most of the document. And we haven't even introduced namespaces yet.

How do you store such documents efficiently? XML can be very verbose compared to relational data. And how do you navigate within such documents as quickly as possible, i.e., compare the steps of your XPath expression to the levels of the XML document?

The key to compactness and speed is the use of stringIDs. In DB2 pureXML, every element name, attribute name, namespace URI, and namespace prefix is substituted by a 32-bit integer value when an XML document is parsed. Each string is mapped to a unique number, a so-called stringID.

In the above example, every occurrence of "department" could be replaced by 1, every "employee" by 2, etc. When the DB2 engine compiles a query and generates an executable package, it also uses the stringIDs. This way, when the XPath expression is evaluated on the data at runtime, only integer values need to be compared. First we need to match the root element, i.e., look for the element name with value 1 ("department"). If we find one, its child needs to be a 2 ("employee"). Comparing integer values is of course much faster than comparing strings of variable length.
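The idea can be sketched in a few lines of Python, assuming a single in-memory dictionary and simple child-axis paths only. This is an illustration of string interning, not DB2's actual data structures: the class and function names are hypothetical, Python integers are not limited to 32 bits, and attributes, namespace prefixes, and URIs (which DB2 also maps) are ignored here. Names get IDs while the document is parsed, the path is translated to IDs once at "compile time", and evaluation then compares integers only.

```python
import xml.etree.ElementTree as ET

class StringTable:
    """Illustrative name-to-integer dictionary (a stand-in for DB2's stringID mapping)."""

    def __init__(self):
        self._ids = {}    # name -> stringID
        self._names = []  # stringID - 1 -> name

    def intern(self, name):
        # Parse time: assign the next free ID the first time a name is seen.
        if name not in self._ids:
            self._names.append(name)
            self._ids[name] = len(self._names)  # IDs start at 1, as in the example above
        return self._ids[name]

    def lookup(self, name):
        # Compile time: return the ID of a known name, or None if it never occurred.
        return self._ids.get(name)

    def name(self, string_id):
        return self._names[string_id - 1]

def encode(elem, table):
    """Turn an element tree into (stringID, text, children) tuples; names are kept only as IDs."""
    string_id = table.intern(elem.tag)
    text = (elem.text or "").strip() or None
    return (string_id, text, [encode(child, table) for child in elem])

def compile_path(path, table):
    """Translate a simple /a/b/c path into stringIDs; [] means the path cannot match."""
    steps = []
    for step in path.strip("/").split("/"):
        string_id = table.lookup(step)
        if string_id is None:
            return []
        steps.append(string_id)
    return steps

def evaluate(node, steps):
    """Walk the encoded tree comparing integers only; return the text of matching nodes."""
    if not steps:
        return []
    string_id, text, children = node
    if string_id != steps[0]:
        return []
    if len(steps) == 1:
        return [text]
    matches = []
    for child in children:
        matches.extend(evaluate(child, steps[1:]))
    return matches

XML = """<department>
  <employee>
    <firstname>Henrik</firstname>
    <lastname>Loeser</lastname>
  </employee>
</department>"""

table = StringTable()
tree = encode(ET.fromstring(XML), table)
steps = compile_path("/department/employee/lastname", table)
print(steps)                  # [1, 2, 4]
print(evaluate(tree, steps))  # ['Loeser']
```

Note that a path containing a name that never occurred in the data can be rejected at compile time, before any document is touched.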

You can see how fast XQuery execution in DB2 pureXML is by looking at the TPoX benchmark results or by reading some of the customer success stories collected at the pureXML wiki.

Tuesday, February 10, 2009

XML Database Benchmarks, TPoX, and DB2 pureXML

I was recently asked why DB2 pureXML uses the TPoX benchmark and whether TPoX is an official benchmark similar to those from TPC or SPEC. I will try to answer that question today.

TPoX stands for "Transaction Processing over XML" and is an open source database benchmark which is available at SourceForge.net. It originated at IBM, but other contributors, most significantly Intel, have been contributing to the benchmark. TPoX is an application-level XML database benchmark based on a (real) financial application scenario. The goal of TPoX is to evaluate the performance of XML database systems, focusing on XQuery, SQL/XML, XML storage, XML indexing, XML Schema support, XML updates, logging, concurrency, and other database aspects.

Why is it important to mention that long list of features? Because several other XML database benchmarks (e.g., XMach-1, XMark, XPathMark, XOO7, XBench, MBench, Michigan Benchmark, and MemBeR) already existed before TPoX was born. All but one or two of these focus mostly on XQuery performance or on specific database aspects, not on the entire system. For a company that plans to buy an XML database system, it is not good enough to know that a system's XPath evaluation is outstanding when its insert processing or bufferpool management is not worth a penny. In other words, being good in one aspect of a database system is not enough to make a well-rounded, reliable, and performant (XML) database system, which is what database users are really looking for.

Because neither TPC nor SPEC was interested in developing an XML database benchmark, because no adequate XML database benchmark existed, and because there was little interest from other database vendors, IBM eventually proposed TPoX to the database and XML community (see the SIGMOD 2007 paper and the 2006 Dagstuhl seminar on XQuery Implementation Paradigms) and made it open source. Why open source? It allows open discussion of, contributions to, and usage of the benchmark and its code.

Since TPoX was made available, many companies, universities, business partners, other database vendors, and of course IBM have used TPoX to evaluate XML database performance. Some results have been posted at http://tpox.sourceforge.net/tpoxresults.htm, including results on a 1 TB database (the latter also has some nice overview slides). Note that many database vendors do not allow disclosure of benchmark results without their agreement.

Coming back to the original question of whether TPoX is an official TPC or SPEC benchmark: the answer is no, because no such XML database benchmarks exist. But TPoX is a well-adopted benchmark that makes it possible to compare XML database systems, taking a well-balanced approach that covers most aspects of what makes up a (commercial) database system.
