ChannelDB2

I regularly hear of people who want to compare the contents of two tables. For example, they have one table that is supposed to be a copy of another. Maybe it should be an identical copy. Or, maybe it contains a transformed version of the other's data. Whatever the case, they want to know if differences exist and they need more than just a simple count of rows from "select count(*)".

You can search the web and find all sorts of SQL query techniques to do this. Therefore, I'm not going to cover those here. Instead, I'm going to focus on two commands built into DB2 and InfoSphere Warehouse. One command compares the contents of two tables (or two query results) and then saves a list of rows that are different between the two. The other command 'repairs' one of the two tables by modifying it based on the differences listed by the first command.
The commands are called asntdiff and asntrep. I'll explain each briefly and then show some simple examples that you can try with a DB2 Sample database. For those interested, yes, there are some considerations for very large tables and tables that are constantly changing. I hope to cover those in a future post. This one is just an introduction.

asntdiff - the Command to Compare Two Tables
asntdiff was originally written for people who replicate data. They use it for a couple of reasons. First, they can compare a source table to a target table to see how in sync the two tables are. Second, if they know someone has modified target data accidentially or incorrectly, they can repair the target based on the differences found.
asntdiff was later enhanced to compare the result sets of two queries. This introduced a lot of flexibility and let people use the command with many more tables that than just those that are replicated.
When asntdiff runs it records information about each difference in a table. By default, asntdiff creates this table and calls it asn.asntdiff. The table looks something like the following:

The first column, 'diff', shows an SQL action - 'I' for insert, 'U' for update, and 'D' for delete - followed by a number that is determined by the order the table is seen by asntdiff. In the case of a subscription, table '1' is the source and table '2' is the target. The remaining columns are for the values of key columns that are to be used with the action. For example, "D 2" says if you want the second table in the comparison to look like the first table, you need to delete the row identified by the given key values. "I 2" says take a copy of the row identified by the key values from the first table and insert it into the second table.

asntrep - the command to 'Repair' a Table Based on Differences Found

This command uses the differences found by asntdiff to change one table's data to match that of the other table. As shown in the previous paragraph, it handles an insert ('I') or update ('U') by retrieving the necessary data to issuing a full SQL insert or update against the table that needs to change. For deletes, it simply uses the keys from the diff table to delete rows from the given table.


Do It Yourself *

I have one example that uses asntdiff to compare the results of two queries and another that shows how to use both commands with data replication.

The Example with Two Queries

So, as I looked at the Sample database, I found that none of the existing tables help illustrate a real-world use with two queries. However, a few tables can still be used, as is, to demonstrates this approach to comparing tables. We'll choose the Employee and Staff tables.

Start by using cut and past to put the following lines (grey box) into a file. Let's call the file queries.asntdiff.

#
# Source_Server is the the database where your first query runs.
#
SOURCE_SERVER=sample
SOURCE_SELECT="select empno from employee
where empno < '100000'
order by 1"

#
# Target_Server is the the database where your second query runs.
#
TARGET_SERVER=sample
TARGET_SELECT="select char(lpad(to_char(id),6,'0')) as ID from staff
order by 1"
Notice three things in this file before we use it. First, even though the file's keywords have 'Source' and 'Target' in them, the tables do not need to be part of a replication configuration. Second, the data types of the two queries must match exactly. In fact, the second query uses an SQL expression to make the data type identical for the first column in the select list. Third, an Order By clause is required. It must use column numbers instead of column names. The columns must be for the primary key columns in the table, or at least cover a set of columns that guarantee uniqueness within the table.

Now, bring up a DB2 command window so we can use asntdiff with the -f option. From the command line, enter the following command:

asntdiff -f queries.asntdiff
The command writes several lines to stdout. The first few show parameter defaults. The next block echos the lines from our file. The last few say what asntidff found. The very last line is a message that says how many differences were found and where the list is stored. Your output should look very similar to mine shown here:
2010-03-28-18.39.00.578000 ASN4010I "AsnTDiff" : "ASN" : "Initial" :
Number of differences found between the source and target table: "3".
The details can be found in database "SAMPLE", difference table "ASN"."ASNTDIFF".

You can now query the asn.asndtiff table. Assuming the default contents of the Sample database, your results should be the following:
That's it. You're now ready to do your own queries with asntdiff. However, I recommend you read the documentation to understand the options available.


An Example with SQL Replication
This uses the very simple replication set up found in A Fast Way to Get Started with DB2's SQL Replication. If you don't have that set up yet, it won't take long ;--) Naturally, if you have your own SQL Replication or Q Replication configuration, you can follow along and modify the example using what you've defined.

Steps to follow:

1. Make sure SQL Capture and SQL Apply are running.
2. Stop making changes to the source table. (We will start with the tables in sync.)
3. If you just made changes, give SQL Apply a minute to make sure all changes are in the target.
4. Run the following command from a DB2 command window to see that no differences exist:
asntdiff db=sample where="set_name = 'SET00' and target_table = 'TRGEMPLOYEE'

The lastmessage of output should say there are '0' differences. However, before we move on, let's look at this command for just a second.

The command needs to know which database contains your Apply control tables. Use the 'db' parameter for this. You also need to identify a single subscription member (the object that shows which target table goes with which source table). Do this through the 'where' parameter. This parameter is just what it sounds like - a list of predicates.

You may be thinking that this doesn't look very special when compared to writing your own query to look for differences. However, a number of things can be done by asntdiff that make your life simplier. For example, if you are using the techniques discussed in Transforming Data as You Replicate, asntdiff automatically applies those transformations as data is compared. Also, the command has several useful options.

5. From your DB2 window, connect to the sample database and delete a row with these two commands:

db2 connect to sample
db2 delete from trgemployee where empno='200340'

6. Now, run the asntdiff command again from your DB2 window:

asntdiff db=sample where="set_name = 'SET00' and target_table = 'TRGEMPLOYEE'

You should see messages similar to the following written to stdout:

2010-03-30-16.27.10.345000 ASN4006I "AsnTDiff" : "ASN" : "Initial" :
Between the source table and the target table, there are "41" common rows, "1" rows that are unique to the source table, and "0" rows that are unique to the target table.

2010-03-30-16.27.10.360000 ASN4010I "AsnTDiff" : "ASN" : "Initial" :
Number of differences found between the source and target table: "1".
The details can be found in database "SAMPLE", difference table ""ASN"."ASNTDIFF"". ".

7. If you query the asn.asntdiff table, your results should show one row as follows:
".
That's it. That's all it takes to compare the rows of source and target tables. So, what about 'repairing' the target based on this result? Let's do that next :--)


Repairing the Target Table

This will be short. To have our target rows match our source rows, use the following asntrep command:

asntrep db=sample where="set_name = 'SET00' and target_table = 'TRGEMPLOYEE'

First, notice the command requires the same where clause that was used with asntdiff. The repair needs this so it knows which source and target tables are associated with the data in the asn.asntdiff table.

Next, look at the information written to stdout. asntrep shows all the SQL it issued. However, the last message of output is what you really need. It is a summary of repair actions. Your output should look similar to the following:

2010-03-31-08.54.39.484000 ASN4019I "AsnTRep" : "ASN" : "Initial" :
The program applied the rows from the difference table to the target table as follows:
"1" rows were inserted, "0" rows were updated, and "0" rows were deleted.

You can now query the target to verify the deleted row has been copied from the source to the target. Or, you could run asntdiff to see that no differences are found.
You're done :-)
--
* Update: The Do It Yourself section requires you be at 9.7 or higher since that's when the -f option was added to asntdiff.

Views: 2707

Tags: asntdiff, data replication, fast start, table compare utility

Comment

You need to be a member of ChannelDB2 to add comments!

Join ChannelDB2

Comment by david t on December 8, 2010 at 1:18pm
Hi, Dilip,

In my example, there are lines identifying the the database server (e.g., SOURCE_SERVER=sample). The name you specify is a DB2 database alias. This is the name you specify on DB2 commands such "db2 connect to database_alias".

If your database resides on the same system as asntdiff, you should be able to list it and all available DB2 aliases by bringing up a DB2 command window and issuing "db2 list database directory". If your database is on another system (i.e., not the same one as asntdiff), you would need to issue the DB2 commands to add it to the DB2 database directory. These commands are run from a DB2 command window and look like "db2 catalog tcpip node" and "db2 catalog database at node" (the DB2 Information Center has complete doc on these).

If the database is on another system, you will also need to create an encrypted password file with DB2's asnpwd command.

I mention all of this because, based on the messages you received, such as the one with SQLCODE -1013, it looks like you are either missing a server statement in the file or you have a missing or misspelled database alias.

Thanks, David.
Comment by Dilip on December 8, 2010 at 7:28am
Hi David
Thanks for the response.
I tried doing it. But I get the following error

2010-12-08-17.53.11.849000 ASN0600I "AsnTDiff" : "" : "Initial" : Program "asntdiff 9.5.0" is starting.
2010-12-08-17.53.12.005000 ASN0552E "AsnTDiff" : "" : "Initial" : The program encountered an SQL error. The server name is "". The SQL request is "CO
NNECT". The table name is "N/A". The SQLCODE is "-1013". The SQLSTATE is "42705". The SQLERRMC is "". The SQLERRP is "SQLEUCCM".

2010-12-08-17.53.12.005000 SQL1013N "AsnTDiff" : "" : "Initial" : The database alias name or database name "" could not be found.
SQLSTATE=42705

2010-12-08-17.53.12.036000 ASN0530E "AsnTDiff" : "" : "Initial" : The program could not connect to database "" with USERID "N/A" . The SQLCODE is "-
1013".
2010-12-08-17.53.12.036000 ASN8013D "AsnTDiff" : "" : "Initial" : Error "-5" from asnGetPwd for User "N/A" on DB2 "" using password file "asnpwd.aut
".
2010-12-08-17.53.12.052000 ASN0589I "AsnTDiff" : "" : "Initial" The program received an unexpected return code "907" from routine "dbConnectionnit::d
bConnectCtx".

I believe its looking for some database connection, but still I am connected to the db.
Comment by david t on December 7, 2010 at 7:56pm
Hi, Dilip,

Yes, you can use the -f option even if you are not using any form of data replication. The -f option doesn't look for a replication configuration. In other words, think of it as a DB2 command instead of a replication command.

The -f option takes two queries and compares their result sets. Therefore the result sets must have the same number of columns and the columns must have the same data types.
Comment by Dilip on November 26, 2010 at 1:15am
Hello David
That was a nice post.
I have a question.. Can I use this asntdiff with the -f option without a replication environment set??
Can I use it within a database of 2 tables having a similar definition??

Try BLU Acceleration on Cloud

© 2014   Created by channeldb2.

Badges  |  Report an Issue  |  Terms of Service