Data Henrik: benchmark

Showing posts with label benchmark. Show all posts

Tuesday, July 16, 2024

About BIRD, SQL, IBM granite models, and your business reporting

BIRD-SQL benchmark

Some years ago, when composing SQL queries, I was hoping that those queries would just "fly", performing flawlessly and quickly. Now, I stumbled over something SQL-related that seems to fly: BIRD-bench. It measures Large Language Models' (LLMs) capabilities to generate SQL queries from text input. It is at the core of SQL: You describe the result set you need.

Performance Tuning for XML-based OLTP Systems (Temenos T24)

From time to time my colleagues and I get contacted on how to set up and tune DB2 for use with Temenos T24. The latter is a core banking system and in use worldwide. And it uses XML as internal data storage format which makes it an ideal fit for DB2 pureXML (both on z/OS and on Linux, UNIX, and Windows). To make our answers easily findable, why not blog about it? Today, I will cover setting up tables for good performance.

A typical T24 table consists only of two columns, a RECID (as primary key) and an XMLRECORD. This is similar to other systems implemented based on XML data, both OLTP and more of OLAP/BI character.

Inlining of data

For these kind of tables it is a good idea to INLINE the XML column. Inlining is good for several reasons. Because XML data is regularly stored in a special XML Data Area (XDA), its access requires an indirection from the row data. The descriptor that is stored in the row is taken as input to look up the actual storage location via the XML Regions Index. Eliminating the descriptor means the XML data is directly fetched with the row as well as fewer entries in the XML Regions Index.

If there is test data, the function ADMIN_EST_INLINE_LENGTH can be used to find the typical inline length that should be set. The maximum inline length depends on the page size and hence the maximum row size. Both are documented with the just mentioned function. For T24 the recommendation is to use an inline length of 32000 bytes or even more in 32 kb pages.

Volatile Tables for Index Use

Another configuration option for tables is to declare them as VOLATILE. The background is explained on this page with "Relational index performance tips". Basically, the optimizer will prefer index-based access to the table even if the cardinality changes frequently.

Enable static compression

Many systems benefit from data compression. The same goes for T24 on DB2. However, the question is whether to use static (table) compression or even the newer adaptive compression (page-level compression on top). Based on different benchmarks and experience with production-oriented testing the recommendation is to use static compression. The additional CPU overhead for a higher degree of space and IO savings impacts system throughput. When in doubt test it on your own. Another effect: Indexes on compressed tables will also be compressed by default.

Consider APPEND ON or DB2MAXFSCRSEARCH=1

With APPEND ON as an option to CREATE/ALTER TABLE, DB2 will insert "at the end" of the table without searching for free space elsewhere. It results in slightly higher space consumption, but benefits insert performance. An alternative is to set DB2MAXFSCRSEARCH to a low value, e.g. to 1. It determines on a global level of how many free space control records (FSCR) to search for placing a new record. Thus, all tables would be impacted, not just the one where APPEND ON is specified.

Separate tablespaces for table, index, and long data

It is always a good idea to have separate tablespaces for the different page types. They can be specified during CREATE TABLE.

PCTFREE 99 for hotspot tables

Depending on the application and usage type of T24 there can be hotspot tables. For these it could make sense to specify a high value for PCTFREE during CREATE/ALTER TABLE. This determines how much space is left free (read: unused) in a page, i.e., how many or how few records are stored in a single page. The fewer records - the extreme would be a single record - in a page, the less likely that page becomes a hotspot.

Consider table/range partitioning for bigger tables

Last, but not least, it is a good idea to apply range partitioning on bigger tables. This benefits performance in several ways: Old data can be rolled out (DETACHed) quickly, queries perform faster due to range elimination and rebalancing of work, and maintenance can also be done in parallel on the ranges.
The partitioning can either be done on the RECID (the primary key) or a new hidden column could be introduced to evenly split the data into ranges, depending on needs.

Summary

Tuning basic properties for XML-based tables is not (that) different from other tables. The above gives an introduction of should be considered when setting up Temenos T24 or similar XML-based systems on DB2. I didn't provide examples with the full syntax. If you need them, leave a comment... ;-)

Special thanks to Juergen Goetz for sharing his experience for use with other customers and this blog entry.

BTW: There are also few links for Temenos T24 on DB2 on my resources page. General DB2 system setup for HA for Temenos is discussed in the Redpaper "High Availability and Disaster Recovery for Temenos T24 with DB2 and AIX"

Friday, November 16, 2012

Where to find information about granted privileges in DB2

Yesterday, I reveived a question about where to find specific metadata in DB2. The customer wanted to find out which privileges had been granted within a database and they were aware that db2look can produce this list. But where does this information come from?

Let's start with a glimpse at db2look. It is the "DB2 statistics and DDL extraction tool" and can be used to produce the DDL statements for the objects inside a database. There are also options specific to handling authorizations, i.e., the privileges (see the -x, -xdep, -xd, and -xddep options). All the statements that db2look produces are based on special data stored in the database, the so-called metadata. Those tables that hold the metadata are called System Catalog in DB2 and Data Dictionary in Oracle. The system catalog than can be queried using regular SELECT statements because the information is provided in tables (users can stay within the data mode, the relation model).

DB2 offers the metadata in two different sets of views. The views in the SYSCAT schema basically have all the metadata. The views in the SYSSTAT schema have a subset of the data and have (some) updatable columns, so that object statistics can be changed (for good or for worse...). The views are built on top of the internal catalog tables which are managed in the SYSIBM schema. It is advised to only use the SYSCAT and SYSSTAT views because structure of the internal tables can change without warning whereas the external catalog views are kept intact.

Now, where can the information about granted privileges be found? The DB2 Information Center has a so-called "Road map to catalog views" which is a listing of all the offered metadata found in the SYSCAT and SYSSTAT views. All the views that end in *AUTH carry authorization information, i.e., data about privileges. To give some examples, in SYSCAT.INDEXAUTH you can find out who has CONTROL privilege on an index, SYSCAT.ROLEAUTH is useful to see who has ADMIN authority for granting a specific role, and finally, as a complex example, SYSCAT.TABLEAUTH manages all the table privileges like insert, update, delete, select, alter, control, and even more privileges.

Administration tools for DB2 access that information, procedures and scripts can select data from these views, and of course applications and users. Which of these views are made available for the PUBLIC is up to the administrators. Remember, it's an honor, not a privilege...

Comments or questions?

Monday, December 5, 2011

WMD: Weapon of mass destruction? No, Workload Multiuser Driver!

When you hear of WMD, for many of you a term other than Workload Multiuser Driver may come up first. But it is the term and hence the acronym the team over at the Sourceforge project working on this add-on to the DB2 Technology Explorer chose. The WMD is a RESTful web service which allows multiple users to concurrently run different workloads against DB2 and WMD can be controlled from the Technology Explorer.

This is of course interesting when you want to showcase certain features of DB2. However, as the WMD is a component of its own and can be downloaded as such, it is also one of the options to set up your own performance or system tests. And this brings me to the question I was recently asked: What free workload drivers do you know of for DB2?

In addition to the WMD there is also a workload driver in the TPoX (Transaction Processing over XML data) benchmark, another Sourceforge project for DB2. It cannot be downloaded separately, but it is documented and can be adapted to your own needs.

What other free workload drivers do you know of, which ones do you prefer?

Wednesday, October 13, 2010

Boring news? Yet another benchmark record for DB2 on POWER

Yesterday, IBM announced another world record, this time for the Two-Tier SAP Sales and Distribution (SD) Standard Application Benchmark. Again, this was based on DB2 for Linux, UNIX, and Windows running on the IBM POWER platform.

Now combine this very competitive speed and throughput with very competitive pricing and you should have a winner. If you are on Oracle right now then, yes, DB2 understands PL/SQL as well.

Wednesday, August 18, 2010

Smaller, but way faster and cheaper: IBM sets new TPC-C record

I am traveling right now, but wanted to point you to a new TPC-C benchmark result. IBM set a new record running DB2 9.7 on a POWER7-based system. The IBM system is more than 35% faster than the so-far top Oracle result, providing 41% better price/performance and 35% better energy efficiency per transaction. This should set an end to the misleading and silly Oracle advertisements (I hope, but am not sure).

Tuesday, February 23, 2010

Servers For Truth or IBM vs. Oracle

Youtube has a new IBM commercial, officially by "Servers For Truth", dealing with the still ongoing discussion about Oracle's TPC benchmark-related claims and the facts behind them. If you ever experienced election campaigns in the US, this commercial comes close to some of those election-related spots.

Wednesday, September 30, 2009

New TPoX release and performance numbers

[Seems like it is benchmark day today] Version 2.0 of the TPoX benchmark (Transaction Processing over XML) has been released. In an earlier post I explained what TPoX is and why it exists. The new release of the benchmark specification has some changes in how the data is generated as well as in some update statements of the workload. The workload driver has also been modified (its properties are now XML-based) to adapt it easierly.

What is also out since last month are TPoX performance results based on version 2.0. A 1 TB workload was tested against DB2 V9.7 on AIX 6.1 on a IBM BladeCenter JS43. The numbers were also compared against DB2 V9.5FP4 run on the same setup in the paper showing the benchmark details.

Please note that due to the changes in the benchmark specification, benchmark results from version 1.x cannot/should not be compared to those from version 2.0.

TPC fines Oracle for recent benchmark claims

The Register has an article about Oracle being fined by the TPC because of recent ads related to Exadata2. Oracle has to pay $10,000 and was ordered to remove/withdraw ads, webpages, etc. which Oracle apparently already did.

Added: The issue is here at the TPC website.

Tuesday, February 10, 2009

XML Database Benchmarks, TPoX, and DB2 pureXML

I recently got asked why DB2 pureXML is using the TPoX benchmark and whether TPoX is an official benchmark similar to those from TPC or SPEC. I will try to answer that question today.

TPoX stands for "Transaction Processing over XML" and is an open source database benchmark which is available at SourceForge.net. It originated from IBM, but other sources, most significantly Intel, have been contributing to the benchmark. TPoX is an application-level XML database benchmark based on a (real) financial application scenario. The goal of TPoX is to evaluate the performance of XML database systems, focusing on XQuery, SQL/XML, XML storage, XML indexing, XML Schema support, XML updates, logging, concurrency and other database aspects.

Why is it important to mention that long list of features? This is because several other XML database benchmarks (e.g., XMach-1 , XMark, XPathMark, XOO7, XBench, MBench, Michigan Benchmark, and MemBeR) already existed before TPoX was born. All but one or two of these focus mostly on XQuery performance or on specific database aspects, not on the entire system. For a company that plans to buy an XML database system it is not good enough to know that the XPath evaluation of a system is outstanding when insert processing or bufferpool management are not worth a penny. In other words, being good in one aspect of what makes up a database system is not good enough to produce a well-rounded, reliable, and performant (XML) database system, a system database user are really looking for.

Because both TPC and SPEC were not interested in developing an XML database benchmark, because of the lack of an adequate database benchmark, and because of not much interest from other database vendors IBM eventually proposed TPoX to the database and XML community (see SIGMOD 2007 paper and 2006 Dagstuhl seminar on XQuery Implementation Paradigms) and made it open source. Why open source? It allows open discussions, contributions, and usage of the benchmark and its code.

Since TPoX has been made available, many companies, universities, business partners, other database vendors, and of course IBM have used TPoX to evaluate XML database performance. Some results have been posted at http://tpox.sourceforge.net/tpoxresults.htm, including results on a 1 TB database (the latter also has some nice overview slides). Note that many database vendor do not allow disclosure of benchmark results without their agreement.

Coming back to the original question whether TPoX is an official TPC or SPEC benchmark the answer is no, because there are no such XML database benchmarks. But TPoX is a well-adopted benchmark that allows to compare XML database systems by taking a well-balanced approach to cover most aspects of what makes up a (commercial) database system.

Data Henrik

Pages