I regularly hear of people who want to compare the contents of two tables. For example, they have one table that is supposed to be a copy of another. Maybe it should be an identical copy. Or, maybe it contains a transformed version of the other's data. Whatever the case, they want to know if differences exist and they need more than just a simple count of rows from "select count(*)".
You can search the web and find all sorts of SQL query techniques to do this. Therefore, I'm not going to cover those here. Instead, I'm going to focus on two commands built into DB2 and InfoSphere Warehouse. One command compares the contents of two tables (or two query results) and then saves a list of rows that are different between the two. The other command 'repairs' one of the two tables by modifying it based on the differences listed by the first command.
The commands are called asntdiff
. I'll explain each briefly and then show some simple examples that you can try with a DB2 Sample database. For those interested, yes, there are some considerations for very large tables and tables that are constantly changing. I hope to cover those in a future post. This one is just an introduction.
asntdiff - the Command to Compare Two Tables
asntdiff was originally written for people who replicate data. They use it for a couple of reasons. First, they can compare a source table to a target table to see how in sync the two tables are. Second, if they know someone has modified target data accidentially or incorrectly, they can repair the target based on the differences found.
asntdiff was later enhanced to compare the result sets of two queries. This introduced a lot of flexibility and let people use the command with many more tables that than just those that are replicated.
When asntdiff runs it records information about each difference in a table. By default, asntdiff creates this table and calls it asn.asntdiff. The table looks something like the following:
The first column, 'diff', shows an SQL action - 'I' for insert, 'U' for update, and 'D' for delete - followed by a number that is determined by the order the table is seen by asntdiff. In the case of a subscription, table '1' is the source and table '2' is the target. The remaining columns are for the values of key columns that are to be used with the action. For example, "D 2" says if you want the second table in the comparison to look like the first table, you need to delete the row identified by the given key values. "I 2" says take a copy of the row identified by the key values from the first table and insert it into the second table.
asntrep - the command to 'Repair' a Table Based on Differences Found
This command uses the differences found by asntdiff to change one table's data to match that of the other table. As shown in the previous paragraph, it handles an insert ('I') or update ('U') by retrieving the necessary data to issuing a full SQL insert or update against the table that needs to change. For deletes, it simply uses the keys from the diff table to delete rows from the given table.
Do It Yourself *
I have one example that uses asntdiff to compare the results of two queries and another that shows how to use both commands with data replication.
The Example with Two Queries
So, as I looked at the Sample database, I found that none of the existing tables help illustrate a real-world use with two queries. However, a few tables can still be used, as is, to demonstrates this approach to comparing tables. We'll choose the Employee and Staff tables.
Start by using cut and past to put the following lines (grey box) into a file. Let's call the file queries.asntdiff.
# Source_Server is the the database where your first query runs.
SOURCE_SELECT="select empno from employee
where empno < '100000'
order by 1"
# Target_Server is the the database where your second query runs.
TARGET_SELECT="select char(lpad(to_char(id),6,'0')) as ID from staff
order by 1"
Notice three things in this file before we use it. First, even though the file's keywords have 'Source' and 'Target' in them, the tables do not need to be part of a replication configuration. Second, the data types of the two queries must match exactly. In fact, the second query uses an SQL expression to make the data type identical for the first column in the select list. Third, an Order By clause is required. It must use column numbers instead of column names. The columns must be for the primary key columns in the table, or at least cover a set of columns that guarantee uniqueness within the table.
asntdiff -f queries.asntdiff
The command writes several lines to stdout. The first few show parameter defaults. The next block echos the lines from our file. The last few say what asntidff found. The very last line is a message that says how many differences were found and where the list is stored. Your output should look very similar to mine shown here:
||2010-03-28-18.39.00.578000 ASN4010I "AsnTDiff" : "ASN" : "Initial" :
Number of differences found between the source and target table: "3".
The details can be found in database "SAMPLE", difference table "ASN"."ASNTDIFF".
You can now query the asn.asndtiff table. Assuming the default contents of the Sample database, your results should be the following:
That's it. You're now ready to do your own queries with asntdiff. However, I recommend you read the documentation to understand the options available.
An Example with SQL Replication
This uses the very simple replication set up found in A Fast Way to Get Started with DB2's SQL Replication
. If you don't have that set up yet, it won't take long ;--) Naturally, if you have your own SQL Replication or Q Replication configuration, you can follow along and modify the example using what you've defined.
Steps to follow:
1. Make sure SQL Capture and SQL Apply are running.
2. Stop making changes to the source table. (We will start with the tables in sync.)
3. If you just made changes, give SQL Apply a minute to make sure all changes are in the target.
4. Run the following command from a DB2 command window to see that no differences exist:
||asntdiff db=sample where="set_name = 'SET00' and target_table = 'TRGEMPLOYEE'
The lastmessage of output should say there are '0' differences. However, before we move on, let's look at this command for just a second.
The command needs to know which database contains your Apply control tables. Use the 'db' parameter for this. You also need to identify a single subscription member (the object that shows which target table goes with which source table). Do this through the 'where' parameter. This parameter is just what it sounds like - a list of predicates.
You may be thinking that this doesn't look very special when compared to writing your own query to look for differences. However, a number of things can be done by asntdiff that make your life simplier. For example, if you are using the techniques discussed in Transforming Data as You Replicate, asntdiff automatically applies those transformations as data is compared. Also, the command has several useful options.
5. From your DB2 window, connect to the sample database and delete a row with these two commands:
||db2 connect to sample
db2 delete from trgemployee where empno='200340'
6. Now, run the asntdiff command again from your DB2 window:
||asntdiff db=sample where="set_name = 'SET00' and target_table = 'TRGEMPLOYEE'
You should see messages similar to the following written to stdout:
||2010-03-30-126.96.36.1995000 ASN4006I "AsnTDiff" : "ASN" : "Initial" :
Between the source table and the target table, there are "41" common rows, "1" rows that are unique to the source table, and "0" rows that are unique to the target table.
2010-03-30-188.8.131.520000 ASN4010I "AsnTDiff" : "ASN" : "Initial" :
Number of differences found between the source and target table: "1".
The details can be found in database "SAMPLE", difference table ""ASN"."ASNTDIFF"". ".
7. If you query the asn.asntdiff table, your results should show one row as follows:
That's it. That's all it takes to compare the rows of source and target tables. So, what about 'repairing' the target based on this result? Let's do that next :--)
Repairing the Target Table
This will be short. To have our target rows match our source rows, use the following asntrep command:
||asntrep db=sample where="set_name = 'SET00' and target_table = 'TRGEMPLOYEE'
First, notice the command requires the same where clause that was used with asntdiff. The repair needs this so it knows which source and target tables are associated with the data in the asn.asntdiff table.
Next, look at the information written to stdout. asntrep shows all the SQL it issued. However, the last message of output is what you really need. It is a summary of repair actions. Your output should look similar to the following:
||2010-03-31-08.54.39.484000 ASN4019I "AsnTRep" : "ASN" : "Initial" :
The program applied the rows from the difference table to the target table as follows:
"1" rows were inserted, "0" rows were updated, and "0" rows were deleted.
You can now query the target to verify the deleted row has been copied from the source to the target. Or, you could run asntdiff to see that no differences are found.
You're done :-)