[Image: MinIO console with Db2-generated files]
Henrik's thoughts on life in IT, data and information management, cloud computing, cognitive computing, covering IBM Db2, IBM Cloud, Watson, Amazon Web Services, Microsoft Azure and more.
Showing posts with label storage. Show all posts
Wednesday, January 29, 2025
Simple container-based Minio deployment for Db2
Wednesday, August 28, 2024
A look at local external tables in Db2
[Image: Surprising Db2 results or not?]
Labels:
administration,
data in action,
DB2,
developer,
IBM,
IT,
lakehouse,
sql,
storage,
version 11.5
Friday, August 9, 2024
Revisited: Db2 backup to S3 / Object Storage
[Image: Db2 backup in an S3 bucket]
Thursday, January 25, 2024
Tutorial on cloud end-to-end security - an overview
[Image: Cloud solution architecture]
Recently, my team updated our tutorial on applying end-to-end security to a cloud application. The tutorial features a typical app with an attached NoSQL database and object storage. Moreover, it leverages other services for increased security and to provide observability. Even better, all components can be automatically deployed, including a Tekton-based delivery pipeline. In this blog post, I am going to provide an overview and discuss some implementation details.
Tuesday, November 26, 2019
ETL in the cloud using SQL Query and Db2
[Image: From COS to Db2 using SQL]
Tuesday, May 7, 2019
Cloud-based FIPS 140-2 Level 4 crypto service
[Image: Locks, keys, and data security]
Tuesday, September 12, 2017
Db2 with External Tables: First Tests
[Image: External Tables - New in Db2]
Wednesday, August 16, 2017
Combining Db2 and Cloud Object Storage
[Image: Db2 and Cloud Storage happily combined]
Labels:
administration,
bluemix,
cloud,
dashdb,
DB2,
ibmcloud,
IT,
knowledge center,
storage,
version 11
Wednesday, July 10, 2013
Extended row size support in DB2: Fitting more data into a single row than the page size would typically allow (and sometimes it makes sense, but that is another story - stay tuned for user experience...)
One of the new features of the DB2 10.5 release is support for so-called extended row size. As you might have guessed from the title, you can now squeeze a lot of data into a single row. The feature can be used for compatibility with other vendors, to consolidate a table and its "side tables" into a single table, or to possibly improve performance by moving tables to smaller page sizes. Let's take a look at the details of extended row size.
Many of the DB2 size-related limits are documented in the overview of SQL and XML limits; however, all the information on column sizes, row sizes, and what fits into data pages of different sizes is part of the CREATE TABLE documentation. There, way down, we find that for 4k pages the row size is limited to 4005 bytes, for 8k to 8101 bytes, for 16k to 16293 bytes, and for 32k to 32677 bytes. So, how can we insert more data into a row?
A new database configuration parameter, extended_row_sz, has been added. For new databases the default is ENABLE; for migrated databases it is DISABLE (for backward compatibility). I created a new database, so extended row size support is enabled. Let's give it a try:
db2 => connect to hl
Database Connection Information
Database server = DB2/LINUX 10.5.0
SQL authorization ID = HLOESER
Local database alias = HL
db2 => create table ers(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB20000I The SQL command completed successfully.
db2 => update db cfg using extended_row_sz disable
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
db2 => create table ers2(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB21034E The command was processed as an SQL statement because it was not a
valid Command Line Processor command. During SQL processing it returned:
SQL0286N A table space could not be found with a page size of at least "8192"
that authorization ID "HLOESER" is authorized to use. SQLSTATE=42727
db2 => update db cfg using extended_row_sz enable
DB20000I The UPDATE DATABASE CONFIGURATION command completed successfully.
db2 => create table ers2(id int unique not null, firstname varchar(3000), lastname varchar(3000))
DB20000I The SQL command completed successfully.
db2 => insert into ers values(1,'Henrik','Loeser')
DB20000I The SQL command completed successfully.
The database configuration parameter can be changed online, as seen above. Creating the table ERS2 fails while extended row size is DISABLED, and it works after enabling the feature again. Inserting some simple test data into the table succeeds. To test the new feature further, I created a data file with three rows (IDs 2 to 4). The reason for this type of test is the size limitation of the DB2 CLP and my wish to avoid a GUI. The first row has two long strings of about 3000 bytes each, the second row a short and a long string, the last row a long and a short string. Importing the data is no problem:
db2 => import from "data.txt" of del insert into ers
SQL3109N The utility is beginning to load data from file "data.txt".
SQL3110N The utility has completed processing. "3" rows were read from the
input file.
SQL3221W ...Begin COMMIT WORK. Input Record Count = "3".
SQL3222W ...COMMIT of any database changes was successful.
SQL3149N "3" rows were processed from the input file. "3" rows were
successfully inserted into the table. "0" rows were rejected.
Number of rows read = 3
Number of rows skipped = 0
Number of rows inserted = 3
Number of rows updated = 0
Number of rows rejected = 0
Number of rows committed = 3
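The data file described above is easy to reproduce. Here is a rough shell sketch (not from the original post; the actual string contents and the file name "data.txt" are assumptions) that builds a three-row delimited file with IDs 2 to 4 and ~3000-byte strings:

```shell
# Hypothetical sketch: build a 3-row delimited file like the one described
# in the post (IDs 2 to 4; long strings of about 3000 bytes).
long=$(printf 'x%.0s' $(seq 1 3000))   # a 3000-character string

{
  printf '2,"%s","%s"\n' "$long" "$long"    # two long strings
  printf '3,"short","%s"\n' "$long"         # short string, long string
  printf '4,"%s","short"\n' "$long"         # long string, short string
} > data.txt

wc -c data.txt   # total size of the delimited file
```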
So how can we find out more about how this feature works? The documentation says that some data might be stored outside the row as a LOB if it does not fit into the row. I am going to use db2dart for that. First, I obtain the TABLE ID and TABLESPACE ID of our test table ERS:
db2 => select tableid,tbspaceid from syscat.tables where TABNAME='ERS'
TABLEID TBSPACEID
------- ---------
8 2
1 record(s) selected.
The next step is to invoke db2dart from the shell for the database "hl":
hloeser@rotach:~$ db2dart hl /DD
Please enter
Table ID or name, tablespace ID, first page, num of pages, and y/n for verbose:
(may suffix page number with 'p' for pool relative if working with a pool-relative tablespace)
8 2 0 80 y
The requested DB2DART processing has completed successfully!
Complete DB2DART report found in: HL.RPT
The "/DD" option directs db2dart to dump data pages. It prompts us for the IDs of the table and tablespace (which we know), the first page to dump (zero), the number of pages to dump (80), and whether verbose mode is preferred (yes). Based on our input, a report file "HL.RPT" is generated. After some general information, free space control records, and a table description record, the first actual data records can be seen:
Slot 4:
Offset Location = 2850 (xB22)
Record Length = 34 (x22)
Record Type = Table Data Record (FIXEDVAR)
Record Flags = 0
Fixed part length value = 14
Column 1:
Fixed offset: 0
Type is Long Integer
Value = 1
Column 2:
Fixed offset: 4
Type is Variable Length Character String
Length = 6 Offset = 14
48656E72 696B Henrik
Column 3:
Fixed offset: 9
Type is Variable Length Character String
Length = 6 Offset = 20
4C6F6573 6572 Loeser
Slots Summary: Total=5, In-use=5, Deleted=0.
This is our first row (ID 1) that we inserted; everything is stored in a single record, as expected and as usual. Reading on in the file, the other rows appear. I won't print them here for size reasons, but here is how they are stored:
- ID 2, two long strings: first string in the record, second string outside the record
- ID 3, short string, long string: both strings within a data page, single record because they fit
- ID 4, long string and short string: both strings within a data page, single record because they fit
Column 3:
Fixed offset: 9
Type is Variable Length Character String
Length = 24 Offset = 3020
Var Descriptor Length = 24
lfd_check = 89
lfd_version = 0
lfd_life_lsn = 0000000000064942
lfd_size = 3024
lfd_first = 2
lfd_dir = 0
Slots Summary: Total=1, In-use=1, Deleted=0.
What can also be seen is that only a single slot is in use on this data page because of the record size.
I hope you found these details interesting and that they encourage you to use db2dart to look into storage structures. When I visit universities, I always tell students to dig deeper into the implementation of database systems...
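As a less low-level complement to db2dart (not used in the post above), the split between regular data pages and the LOB side storage can also be estimated from the ADMINTABINFO administrative view. A hedged sketch; verify the column names against your DB2 version:

```sql
-- Hedged sketch: compare the physical size of the ERS data object
-- with its LOB object (where data too long for the row ends up).
SELECT tabname, data_object_p_size, lob_object_p_size
  FROM sysibmadm.admintabinfo
 WHERE tabname = 'ERS';
```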
Labels:
administration,
DB2,
Information Center,
insert,
IT,
Oracle,
storage,
test,
version 10.5
Tuesday, April 30, 2013
Videos on DB2 BLU Acceleration
I haven't written about DB2 with BLU Acceleration before (but will start shortly).
Susan Visser has compiled a long list of related reading material to get you started on how the integration of columnar storage technology deep into DB2 can benefit your enterprise. Let me share some DB2 BLU-related videos that were recently published.
Video 1 on DB2 BLU Acceleration
<iframe width="560" height="315" src="http://www.youtube.com/embed/SH1twB1TChg" frameborder="0" allowfullscreen></iframe>
Video 2 on DB2 BLU Acceleration
<iframe width="560" height="315" src="http://www.youtube.com/embed/n6D5Ns5J16U" frameborder="0" allowfullscreen></iframe>
Video 3: Client Praise on DB2 BLU
<iframe width="560" height="315" src="http://www.youtube.com/embed/5T6f74gYu1Y" frameborder="0" allowfullscreen></iframe>
Wednesday, August 1, 2012
Vacation: Time to REBALANCE (and to look at DB2 10)
It is vacation time - time to recharge and to rebalance life. (BTW: Some years back companies talked about work-life balance; now it is "work/life integration" - towards always on.) When I thought about "rebalance", some new DB2 features came to mind. You see, I am still in work mode...
When changing storage in DB2, e.g., adding containers to a tablespace or removing them, the tablespace is rebalanced to keep it evenly striped. DB2 10 introduced the concept of storage groups, which allows grouping tablespaces with similar properties or "tasks" together. Using ALTER TABLESPACE it is possible to change the associated storage group. To move the data from the former to the new storage group, data is rebalanced in the background, i.e., asynchronously. To give you greater control over those background tasks, DB2 10 adds SUSPEND and RESUME to the ALTER TABLESPACE ... REBALANCE statement. So you can decide when to take a break...
As you might know, there is another rebalance operation available in DB2, storage-related again. It is used in pureScale environments after changes to the shared file system (cluster file system) and rebalances storage utilization. The rebalance option is part of the db2cluster command which is used to interact with the reliable, scalable cluster infrastructure (RSCT) and the cluster file system GPFS.
As both rebalance operations are recommended for "periods of low system activity", this could mean vacation time - bringing me back to what I am preparing for...
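A minimal sketch of the statements discussed above (the tablespace and storage group names are made up for illustration):

```sql
-- Move a tablespace to a different storage group; DB2 10 rebalances
-- the data asynchronously in the background.
ALTER TABLESPACE hot_ts USING STOGROUP sg_ssd;

-- Pause the background rebalance during busy hours, resume it later.
ALTER TABLESPACE hot_ts REBALANCE SUSPEND;
ALTER TABLESPACE hot_ts REBALANCE RESUME;
```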
Wednesday, June 13, 2012
DB2 pureXML and bufferpools - revisited
Some years ago I wrote that XML data is buffered for performance reasons and looked at the XDA object. Now I noticed that I didn't answer the "buffered or not" question entirely: the question of which bufferpools are utilised, depending on inlined or regular XML storage, remained. So let me answer it.
CREATE TABLE myX (id INT, x1 XML INLINE LENGTH 500, x2 XML) IN myT1 LONG IN myT2;
Considering the above CREATE TABLE statement, data for table myX is stored in two tablespaces, myT1 and myT2, and uses the associated bufferpool(s). If the internal representation of the XML data for column x1 is up to 500 bytes, the row data - consisting of an integer value for id, the XML data for x1, and a descriptor pointing to the XML data of x2 - is stored in the DAT object and hence in tablespace myT1. When accessed, the data goes through the bufferpool associated with myT1. If the XML data for x1 is larger than 500 bytes, the row holds an integer and two descriptors (one each for x1 and x2).
The descriptors point to entries in data pages of the XDA object, which is stored in tablespace myT2. All XML data for x2, and any data too long to be inlined for x1, is stored in myT2. When accessed, the bufferpool associated with tablespace myT2 holds the pages.
In short again: XML data is buffered, and it goes through the bufferpool associated with the tablespace it is stored in. This is the same as with regular relational data or index pages.
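To make the tablespace/bufferpool association explicit, here is a hedged sketch of how myT1 and myT2 could be set up with their own bufferpools (the bufferpool names, sizes, and page sizes are my assumptions, not from the post):

```sql
-- Separate bufferpools for row data and for the XDA object.
CREATE BUFFERPOOL bp_dat SIZE 1000 PAGESIZE 4K;
CREATE BUFFERPOOL bp_xda SIZE 1000 PAGESIZE 32K;

-- Row data (DAT object, including inlined XML) flows through bp_dat,
-- non-inlined XML (XDA object) through bp_xda.
CREATE TABLESPACE myT1 PAGESIZE 4K BUFFERPOOL bp_dat;
CREATE TABLESPACE myT2 PAGESIZE 32K BUFFERPOOL bp_xda;

CREATE TABLE myX (id INT, x1 XML INLINE LENGTH 500, x2 XML)
  IN myT1 LONG IN myT2;
```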
Friday, May 7, 2010
Automate DB2 database maintenance
On developerWorks is a new article, "Automate DB2 9.7 database maintenance in an embedded database environment". In an environment where DB2 is an embedded database system, it is important to hide the existence of the DBS by making sure it just works. The article discusses how to set up the various automatic and autonomic features that DB2 provides. It's not only interesting for those dealing with embedded environments, but for any "lazy" DBA in general. There is so much other work to do that you don't want to deal with maintenance that can be automated... (and it is Friday and the weekend is up next...).
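As a rough sketch of what such a setup involves (the database name "mydb" is made up; the switches are the automatic maintenance parameters from the DB2 database configuration), the main knobs can be turned on from the CLP:

```
db2 => update db cfg for mydb using auto_maint on auto_tbl_maint on auto_runstats on auto_reorg on
```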
Thursday, April 22, 2010
Overview: DB2 tablespaces and bufferpools
An older article on developerWorks, "DB2 Basics: Table spaces and buffer pools" has been updated for DB2 9.7. If you want a quick introduction to SMS, DMS, large and regular tablespaces, row size and column count limits, extent and prefetch sizes, etc., this is a good starter.
Friday, March 6, 2009
Does your car need different engines? (XML Storage)
Today I plan to talk about cars and XML storage, two seemingly different topics.
Cars have an engine, and that engine moves the car at slow, medium, or high speeds, on flat roads, up and down hills or mountains, with light or heavy load. You don't need a different engine for different terrains. It's great because you can go where you need to go without too much pre-planning - flexibility you love. You don't need to select and put in a different engine for the next task ahead. There are more benefits: service technicians know how to deal with that single engine, car manufacturers can put their development budget into improving that single engine, and you know how the engine behaves in different situations. It's not always optimal, but you are prepared to take the roads ahead because you don't need to change engines. You are prepared for unexpected detours. Watch out if someone tries to sell you a car where you need to change engines!
BTW: With automatic transmissions you don't need to shift gears, and with newer automatic transmissions there isn't even any gear shifting. Cars have evolved to turn on lights automatically, even to watch out for and assist in critical or complex situations (ESP, ABS, etc.). It takes stress from you and lets you reach your destinations safely.
Coming to the topic of XML storage, I want to emphasize that XML is about flexibility. DB2 pureXML has only a single XML type and a single XML storage engine. You don't need to choose a different XML storage option if your requirements change. You don't need to decide (and know) upfront which road to take, and there is no issue dealing with an unexpected detour because your DB2 engine is suited to handle it. This is flexibility you love (or will learn to love)! Combine that with automatic storage, self-tuning memory management, and other autonomics, and life is less stressful.
What a perspective for the weekend. And remember to watch out...