ChannelDB2

More than a year ago I presented a short article showing how to insert data into DB2 quickly using PHP and a multi-row INSERT statement. Today I will show JDBC coders something similar but more flexible.

In JDBC you are free to use a multi-row INSERT with parameter markers too, but there is a disadvantage: you don't know in advance how many rows will be inserted. Most JDBC coders therefore prepare statements of several different sizes and switch between them at runtime.

Here is an example. You prepare three INSERT statements: one that inserts 100 rows, one that inserts 10 rows, and one that inserts a single row. You allocate a buffer large enough for 100 rows and read input data into it. When the buffer is full, you flush it to the database using the 100-row prepared statement and continue. When the end of the data is reached, you flush the remaining buffer using a combination of 10-row and 1-row inserts.
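The bookkeeping behind this approach can be sketched as follows. This is a minimal illustration, not production code: the class name `MultiSizeInsert` and the helpers `plan` and `insertSql` are my own names for the purposes of this sketch, and the table layout is the ACCOUNTS table used later in this article.

```java
public class MultiSizeInsert {

    // How many 100-row, 10-row, and 1-row statements are needed
    // to flush a buffer holding n rows (n <= 100 when flushing the tail).
    static int[] plan(int n) {
        return new int[] { n / 100, (n % 100) / 10, n % 10 };
    }

    // Build an INSERT with `rows` parameter-marker tuples
    // for the ACCOUNTS table used in this article.
    static String insertSql(int rows) {
        StringBuilder sb =
            new StringBuilder("INSERT INTO ACCOUNTS (BID,AID,ABALANCE) VALUES ");
        for (int r = 0; r < rows; r++) {
            if (r > 0) sb.append(",");
            sb.append("(?,?,0)");
        }
        return sb.toString();
    }
}
```

At startup you would call `insertSql(100)`, `insertSql(10)` and `insertSql(1)` once each and keep the three prepared statements open; `plan` then tells you how many times to execute each one when flushing a partial buffer (e.g. 37 leftover rows become three 10-row inserts plus seven 1-row inserts). As the article notes next, this switching logic is exactly the coding burden that batch updates remove.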

This works, but it requires quite a lot of coding. Let's look at something simpler. JDBC drivers of version 2 or higher can optionally offer a batch update feature. Most database drivers support it, either by mapping JDBC batches directly to the native database protocol's batch data transfer or by emulating it through statement repetition. The feature is safe to use and portable.

JDBC batch updates are described in the DB2 manual and are very easy to use.

Create a prepared statement with insert parameter markers and set the parameter values. This is standard prepared statement handling, except that instead of executing it with executeUpdate() you call addBatch(). Now you can set the parameters to different values and call addBatch() again, repeating as often as you need. To save memory, don't use huge batches: I found that a 1000-row batch is more than enough to get all the performance benefits, and it is only slightly faster than a 100-row batch. Once the batch is assembled, call executeBatch() to process it. Do not forget to call clearBatch() before building a new one.

PreparedStatement ps = c.prepareStatement(
        "INSERT INTO ACCOUNTS (BID,AID,ABALANCE) VALUES (?,?,0)");
// Rows are pushed in batches of 1000; the total row count here is a
// multiple of 1000, so the inner loop never overshoots the range.
for (int i = oldscalefactor * 100000 + 1; i <= newscalefactor * 100000;) {
    for (int j = 0; j < 1000; j++, i++) {
        ps.setInt(1, 1 + (i - 1) / 100000); // branch id
        ps.setInt(2, i);                    // account id
        ps.addBatch();
    }
    ps.executeBatch(); // send the 1000-row batch to the server
    c.commit();
    ps.clearBatch();   // start a fresh batch for the next iteration
}

Using this method I was able to insert 58.8k rows per second into a table with one index and one foreign key check, on cheap $1k hardware. No hand tuning was necessary: it was enough to enable the self-tuning memory manager and give it about 15 minutes to tune the database under this workload. STMM chose to use just 2600 8k buffers. The CPU was only about 50% busy, so if you need to insert faster, open a second connection to the database and push data in two threads.
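A two-thread variant might look like the sketch below, assuming each worker gets its own connection and its own key range. The class name `ParallelInsert`, the `split` helper, and the JDBC URL parameter are illustrative, not from the original article; the per-batch commit pattern matches the code above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ParallelInsert {

    // Split the inclusive key range [lo, hi] into `parts` contiguous
    // subranges, returned as {start, end} pairs.
    static int[][] split(int lo, int hi, int parts) {
        int total = hi - lo + 1;
        int[][] out = new int[parts][2];
        int start = lo;
        for (int p = 0; p < parts; p++) {
            int len = total / parts + (p < total % parts ? 1 : 0);
            out[p][0] = start;
            out[p][1] = start + len - 1;
            start += len;
        }
        return out;
    }

    // One worker per subrange: its own connection, 1000-row batches,
    // commit after each batch, just like the single-threaded version.
    static Thread worker(String jdbcUrl, int start, int end) {
        return new Thread(() -> {
            try (Connection c = DriverManager.getConnection(jdbcUrl)) {
                c.setAutoCommit(false);
                PreparedStatement ps = c.prepareStatement(
                        "INSERT INTO ACCOUNTS (BID,AID,ABALANCE) VALUES (?,?,0)");
                for (int i = start; i <= end; i++) {
                    ps.setInt(1, 1 + (i - 1) / 100000);
                    ps.setInt(2, i);
                    ps.addBatch();
                    if ((i - start + 1) % 1000 == 0 || i == end) {
                        ps.executeBatch();
                        c.commit();
                        ps.clearBatch();
                    }
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
    }
}
```

Because each thread inserts a disjoint key range over its own connection, the workers never contend on the same rows; whether two threads actually double throughput depends on where the index and foreign-key checks bottleneck.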

Note that we are using real, user-supplied data. When preparing benchmarks (this code fragment is from my TPC-B benchmark) we often do not need to insert user-defined data; simple random numbers or a sequence will do. If we move this code into a stored procedure and drop the index on the table to gain some performance, we can insert about 200k - 250k rows per second. If we still need to go faster, then the LOAD utility is the way. Using LOAD on POWER6 hardware you can insert millions of rows per second.

POWER6 System p machines are very powerful. An 8-core POWER6 can handle transaction processing for 2500 users and still have 20% spare capacity for dealing with workload spikes.
