[Image: MinIO console with Db2-generated files]
Henrik's thoughts on life in IT, data and information management, cloud computing, cognitive computing, covering IBM Db2, IBM Cloud, Watson, Amazon Web Services, Microsoft Azure and more.
Showing posts with label storage. Show all posts
Wednesday, January 29, 2025
Simple container-based Minio deployment for Db2
Wednesday, August 28, 2024
A look at local external tables in Db2
[Image: Surprising Db2 results or not?]
Labels:
administration,
data in action,
DB2,
developer,
IBM,
IT,
lakehouse,
sql,
storage,
version 11.5
Friday, August 9, 2024
Revisited: Db2 backup to S3 / Object Storage
[Image: Db2 backup in an S3 bucket]
Thursday, January 25, 2024
Tutorial on cloud end-to-end security - an overview
[Image: Cloud solution architecture]
Recently, my team updated our tutorial on applying end-to-end security to a cloud application. The tutorial features a typical app with an attached NoSQL database and object storage. Moreover, it leverages other services for increased security and to provide observability. Even better, all components can be automatically deployed, including a Tekton-based delivery pipeline. In this blog post, I am going to provide an overview and discuss some implementation details.
Tuesday, November 26, 2019
ETL in the cloud using SQL Query and Db2
[Image: From COS to Db2 using SQL]
Tuesday, May 7, 2019
Cloud-based FIPS 140-2 Level 4 crypto service
[Image: Locks, keys, and data security]
Tuesday, September 12, 2017
Db2 with External Tables: First Tests
[Image: External Tables - New in Db2]
Wednesday, August 16, 2017
Combining Db2 and Cloud Object Storage
[Image: Db2 and Cloud Storage happily combined]
Labels:
administration,
bluemix,
cloud,
dashdb,
DB2,
ibmcloud,
IT,
knowledge center,
storage,
version 11
Wednesday, July 10, 2013
Extended row size support in DB2: Fitting more data into a single row than the page size would typically allow (and sometimes it makes sense, but that is another story - stay tuned for user experience...)
One of the new features of the DB2 10.5 release is support for so-called extended row size. As you might have guessed from the title, you can now squeeze a lot of data into a single row. The feature can be used for compatibility with other vendors, to consolidate a table and its "side tables" into a single table, or to possibly improve performance by moving tables to smaller page sizes. Let's take a look at the details of extended row size.
Many of the DB2 size-related limits are documented in the overview of SQL and XML limits; however, all the information on column sizes, row sizes, and what fits into data pages of different sizes is part of the CREATE TABLE documentation. There, way down, we find that for 4k pages the row size is limited to 4005 bytes, for 8k to 8101 bytes, for 16k to 16293 bytes, and for 32k to 32677 bytes. So, how can we insert more data into a row?
A new database configuration parameter, extended_row_sz, has been added. For new databases the default is ENABLE; for migrated databases it is DISABLE (for backward compatibility). I created a new database, so extended row size support is enabled. Let's give it a try:
db2 => connect to hl
Database Connection Information
Database server = DB2/LINUX 10.5.0
SQL authorization ID = HLOESER
Local database alias = HL
db2 => create table ers(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB20000I The SQL command completed successfully.
db2 => update db cfg using extended_row_sz disable
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
db2 => create table ers2(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0286N A table space could not be found with a page size of at least "8192"
that authorization ID "HLOESER" is authorized to use. SQLSTATE=42727
db2 => update db cfg using extended_row_sz enable
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
db2 => create table ers2(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB20000I The SQL command completed successfully.
db2 => insert into ers values(1,'Henrik','Loeser')
DB20000I The SQL command completed successfully.
The database configuration parameter can be changed online, as seen above. Creating the table ERS2 fails while extended row size is DISABLED, and it works after enabling the feature again. Inserting some simple test data into the table succeeds. To test the new feature further, I created a data file with three rows (IDs 2 to 4). The reason for this type of test is the size limitation of the DB2 CLP and my wish to avoid a GUI. The first row has two long strings of about 3000 bytes each, the second row a short and a long string, the last row a long and a short string. Importing the data is no problem:
db2 => import from "data.txt" of del insert into ers
SQL3109N The utility is beginning to load data from file "data.txt".
SQL3110N The utility has completed processing. "3" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "3".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "3" rows were processed from the input file. "3" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 3
Number of rows skipped = 0
Number of rows inserted = 3
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 3
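The data file described above is easy to reproduce. Here is a rough shell sketch (not from the original post; the actual string contents and the file name "data.txt" are assumptions) that builds a three-row delimited file with IDs 2 to 4 and ~3000-byte strings:

```shell
# Hypothetical sketch: build a 3-row delimited file like the one described
# in the post (IDs 2 to 4; long strings of about 3000 bytes).
long=$(printf 'x%.0s' $(seq 1 3000))   # a 3000-character string

{
  printf '2,"%s","%s"\n' "$long" "$long"    # two long strings
  printf '3,"short","%s"\n' "$long"         # short string, long string
  printf '4,"%s","short"\n' "$long"         # long string, short string
} > data.txt

wc -c data.txt   # total size of the delimited file
```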
So how can we find out more about how this feature works? The documentation says that some data might be stored outside the row as a LOB if it does not fit into the row. I am going to use db2dart for that. First, I obtain the TABLE ID and TABLESPACE ID of our test table ERS:
db2 => select tableid,tbspaceid from syscat.tables where TABNAME='ERS'
TABLEID TBSPACEID
------- ---------
8 2
1 record(s) selected.
The next step is to invoke db2dart from the shell for the database "hl":
hloeser@rotach:~$ db2dart hl /DD
Please enter
Table ID or name, tablespace ID, first page, num of pages, and y/n for verbose:
(may suffix page number with 'p' for pool relative if working with a pool-relative tablespace)
8 2 0 80 y
The requested DB2DART processing has completed successfully!
Complete DB2DART report found in: HL.RPT
The "/DD" option directs db2dart to dump data pages. It prompts us for the IDs of the table and tablespace (which we know), the first page to dump (zero), the number of pages to dump (80), and whether verbose mode is preferred (yes). Based on our input, a report file "HL.RPT" is generated. After some general information, free space control records, and a table description record, the first actual data records can be seen:
Slot 4:
Offset Location = 2850 (xB22)
Record Length = 34 (x22)
Record Type = Table Data Record (FIXEDVAR)
Record Flags = 0
Fixed part length value = 14
Column 1:
Fixed offset: 0
Type is Long Integer
Value = 1
Column 2:
Fixed offset: 4
Type is Variable Length Character String
Length = 6 Offset = 14
48656E72 696B Henrik
Column 3:
Fixed offset: 9
Type is Variable Length Character String
Length = 6 Offset = 20
4C6F6573 6572 Loeser
Slots Summary: Total=5, In-use=5, Deleted=0.
This is our first row (ID 1) that we inserted; everything is stored in a single record, as expected and as usual. Reading on in the file, the other rows appear. I won't print them here for size reasons, but here is how they are stored:
- ID 2, two long strings: first string in the record, second string outside the record
- ID 3, short string, long string: both strings within a data page, single record because they fit
- ID 4, long string and short string: both strings within a data page, single record because they fit
Column 3:
Fixed offset: 9
Type is Variable Length Character String
Length = 24 Offset = 3020
Var Descriptor Length = 24
lfd_check = 89
lfd_version = 0
lfd_life_lsn = 0000000000064942
lfd_size = 3024
lfd_first = 2
lfd_dir = 0
Slots Summary: Total=1, In-use=1, Deleted=0.
What can also be seen is that only a single slot is in use on this data page because of the record size.
I hope you found these details interesting and that they encourage you to use db2dart to look into storage structures. When I visit universities, I always tell students to dig deeper into the implementation of database systems...
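As a less low-level complement to db2dart (not used in the post above), the split between regular data pages and the LOB side storage can also be estimated from the ADMINTABINFO administrative view. A hedged sketch; verify the column names against your DB2 version:

```sql
-- Hedged sketch: compare the physical size of the ERS data object
-- with its LOB object (where data too long for the row ends up).
SELECT tabname, data_object_p_size, lob_object_p_size
  FROM sysibmadm.admintabinfo
 WHERE tabname = 'ERS';
```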
Labels:
administration,
DB2,
Information Center,
insert,
IT,
Oracle,
storage,
test,
version 10.5
Tuesday, April 30, 2013
Videos on DB2 BLU Acceleration
I haven't written about DB2 with BLU Acceleration before (but will start shortly).
Susan Visser has compiled a long list of related reading material to get you started on how the integration of columnar storage technology deep into DB2 can benefit your enterprise. Let me share some DB2 BLU-related videos that were recently published.
Video 1 on DB2 BLU Acceleration
<iframe width="560" height="315" src="http://www.youtube.com/embed/SH1twB1TChg" frameborder="0" allowfullscreen></iframe>
Video 2 on DB2 BLU Acceleration
<iframe width="560" height="315" src="http://www.youtube.com/embed/n6D5Ns5J16U" frameborder="0" allowfullscreen></iframe>
Video 3: Client Praise on DB2 BLU
<iframe width="560" height="315" src="http://www.youtube.com/embed/5T6f74gYu1Y" frameborder="0" allowfullscreen></iframe>
Wednesday, August 1, 2012
Vacation: Time to REBALANCE (and to look at DB2 10)
It is vacation time - time to recharge and to rebalance life. (BTW: Some years back companies talked about work-life balance; now it is "work/life integration" - towards always on.) When I thought about "rebalance", some new DB2 features came to mind. You see, I am still in work mode...
When changing storage in DB2, e.g., adding containers to a tablespace or removing them, the tablespace is rebalanced to keep it evenly striped. DB2 10 introduced the concept of storage groups, which allows grouping tablespaces with similar properties or "tasks" together. Using ALTER TABLESPACE it is possible to change the associated storage group. To move the data from the former to the new storage group, data is rebalanced in the background, i.e., asynchronously. To give you greater control over those background tasks, DB2 10 adds SUSPEND and RESUME to the ALTER TABLESPACE ... REBALANCE statement. So you can decide when to take a break...
As you might know, there is another rebalance operation available in DB2, storage-related again. It is used in pureScale environments after changes to the shared file system (cluster file system) and rebalances storage utilization. The rebalance option is part of the db2cluster command which is used to interact with the reliable, scalable cluster infrastructure (RSCT) and the cluster file system GPFS.
As both rebalance operations are recommended for "periods of low system activity", this could mean vacation time - bringing me back to what I am preparing for...
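A minimal sketch of the statements discussed above (the tablespace and storage group names are made up for illustration):

```sql
-- Move a tablespace to a different storage group; DB2 10 rebalances
-- the data asynchronously in the background.
ALTER TABLESPACE hot_ts USING STOGROUP sg_ssd;

-- Pause the background rebalance during busy hours, resume it later.
ALTER TABLESPACE hot_ts REBALANCE SUSPEND;
ALTER TABLESPACE hot_ts REBALANCE RESUME;
```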
Wednesday, June 13, 2012
DB2 pureXML and bufferpools - revisited
Some years ago I wrote that XML data is buffered for performance reasons and looked at the XDA object. Now I noticed that I didn't answer the "buffered or not" question entirely: the question of which bufferpools are utilised, depending on inlined or regular XML storage, remained. So let me answer it.
CREATE TABLE myX (id INT, x1 XML INLINE LENGTH 500, x2 XML) IN myT1 LONG IN myT2;
Considering the above CREATE TABLE statement, data for table myX is stored in two tablespaces, myT1 and myT2, and uses the associated bufferpool(s). If the internal representation of the XML data for column x1 is up to 500 bytes, the row data - consisting of an integer value for id, the XML data for x1, and a descriptor pointing to the XML data of x2 - is stored in the DAT object and hence in tablespace myT1. When accessed, the data goes through the bufferpool associated with myT1. If the XML data for x1 is larger than 500 bytes, the row holds an integer and two descriptors (one each for x1 and x2).
The descriptors point to entries in data pages of the XDA object, which is stored in tablespace myT2. All XML data for x2, and any data too long to be inlined for x1, is stored in myT2. When accessed, the bufferpool associated with tablespace myT2 holds the pages.
In short again: XML data is buffered, and it goes through the bufferpool associated with the tablespace it is stored in. This is the same as with regular relational data or index pages.
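To make the tablespace/bufferpool association explicit, here is a hedged sketch of how myT1 and myT2 could be set up with their own bufferpools (the bufferpool names, sizes, and page sizes are my assumptions, not from the post):

```sql
-- Separate bufferpools for row data and for the XDA object.
CREATE BUFFERPOOL bp_dat SIZE 1000 PAGESIZE 4K;
CREATE BUFFERPOOL bp_xda SIZE 1000 PAGESIZE 32K;

-- Row data (DAT object, including inlined XML) flows through bp_dat,
-- non-inlined XML (XDA object) through bp_xda.
CREATE TABLESPACE myT1 PAGESIZE 4K BUFFERPOOL bp_dat;
CREATE TABLESPACE myT2 PAGESIZE 32K BUFFERPOOL bp_xda;

CREATE TABLE myX (id INT, x1 XML INLINE LENGTH 500, x2 XML)
  IN myT1 LONG IN myT2;
```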
Friday, May 7, 2010
Automate DB2 database maintenance
On developerWorks is a new article, "Automate DB2 9.7 database maintenance in an embedded database environment". In an environment where DB2 is an embedded database system, it is important to hide the existence of the DBS by making sure it just works. The article discusses how to set up the various automatic and autonomic features that DB2 provides. It's not only interesting for those dealing with embedded environments, but for any "lazy" DBA in general. There is so much other work to do that you don't want to deal with maintenance that can be automated... (and it is Friday and the weekend is up next...).
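As a rough sketch of what such a setup involves (the database name "mydb" is made up; the switches are the automatic maintenance parameters from the DB2 database configuration), the main knobs can be turned on from the CLP:

```
db2 => update db cfg for mydb using auto_maint on auto_tbl_maint on auto_runstats on auto_reorg on
```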
Thursday, April 22, 2010
Overview: DB2 tablespaces and bufferpools
An older article on developerWorks, "DB2 Basics: Table spaces and buffer pools" has been updated for DB2 9.7. If you want a quick introduction to SMS, DMS, large and regular tablespaces, row size and column count limits, extent and prefetch sizes, etc., this is a good starter.
Friday, March 6, 2009
Does your car need different engines? (XML Storage)
Today I plan to talk about cars and XML storage, two seemingly different topics.
Cars have an engine, and that engine moves the car at slow, medium, or high speeds, on flat roads, up and down hills or mountains, with light or heavy load. You don't need a different engine for different terrains. It's great because you can go where you need to go without too much pre-planning - flexibility you love. You don't need to select and put in a different engine for the next task ahead. There are more benefits: service technicians know how to deal with that single engine, car manufacturers can put their development budget into improving that single engine, and you know how the engine behaves in different situations. It's not always optimal, but you are prepared to take the roads ahead because you don't need to change engines. You are prepared for unexpected detours. Watch out if someone tries to sell you a car where you need to change engines!
BTW: With automatic transmissions you don't need to shift gears, and with newer automatic transmissions there isn't even any gear shifting. Cars have evolved to turn on lights automatically, even to watch out for and assist in critical or complex situations (ESP, ABS, etc.). It takes stress from you and lets you reach your destinations safely.
Coming to the topic of XML storage, I want to emphasize that XML is about flexibility. DB2 pureXML has only a single XML type and a single XML storage engine. You don't need to choose a different XML storage option if your requirements change. You don't need to decide (and know) upfront which road to take, and there is no issue dealing with an unexpected detour because your DB2 engine is suited to handle it. This is flexibility you love (or will learn to love)! Combine that with automatic storage, self-tuning memory management, and other autonomics, and life is less stressful.
What a perspective for the weekend. And remember to watch out...