ChannelDB2

Distributing Data to a Large Number of DB2 Servers

Once you get past the simple case of copying changed data from one system to another, the next most common replication solution is data distribution. In this scenario, you distribute data from a central database to two or more databases, maybe even 100's, that are likely remote. An example is sending changes from a corporate headquarters to branch offices or retail locations. You may only need the data to flow one way - from a central database to others. Or, you may also need some data to flow back to the central site.

Whatever the case, I find you can learn general considerations by taking a look at a specific implementation. It may also help you think of some not-so-obvious issues with other implementations. For this post, I'm going to look at the IBM change data capture technology called SQL Replication. It's built into DB2 LUW (and any product that contains DB2 LUW such as InfoSphere Warehouse). It also has a simple scripting language that's great for examples. I've got a small one at the bottom of this post. It shows a fast way to set up an SQL Replication distribution.

Before We Begin

Let's take a quick, high-level look at how SQL Replication works...

Like any good change data capture technology, SQL Replication has a Capture program that gets changed data from the database transaction log. Those changes are saved in a "changed data" (CD) table in the source database. A separate Apply program retrieves the changes and uses SQL inserts, updates, and deletes to modify a target table. The Capture program can delete ('prune') CD table data that has been successfully replicated to the target.

Now, some people choke at this point :) They don't like the fact that their source system is going to experience the overhead of inserts and deletes on the CD table. While I agree that this impact can be too much for some scenarios, the CD table approach can have some distinct and useful advantages for data distribution. What's more, even if you have a source where CDs are a challenge, there's a distribution topology that includes SQL Replication and reduces the impact of data distribution on source systems. I'll discuss it in a future post.

Scaling a Distribution is Easy

SQL Replication takes advantage of three things databases do well - client connectivity, queries and multiuser support. The Apply program connects to the source as an application and uses SQL queries to retrieve data from CD tables. This means each new target you add is little more than a new query user. That's something most databases can handle many of easily.

Connectivity Set Up Is Simple

In a typical distribution, an Apply program runs on each target system. This means all you do for connectivity is make the target system a client of the source database. There's no need to establish connectivity between Capture and each Apply. Naturally, you can also take advantage of features provided by the database's client-server support such as client reroute and encryption between client and server.

You Can Filter, Subset, and Transform Data as It's Replicated

Because changed data is in tables, you have the full power of SQL to manipulate the data as it's replicated. I'm only going to discuss two ways here - subsetting rows, also known as row filtering, and column-level transformations.

A common requirement is to send a target only the subset of data it uses. For example, a sales team in Europe may not need data for customers in Asia and the Americas. This is handled by providing an SQL predicate to be used when Apply queries the CD table. The query results mean that Apply only replicates the rows that satisfy the predicate.

Another common requirement is to transform column data - substring, concatenate, cast to a different data type, whatever - to meet the needs of a target. SQL Replication can use SQL column expressions and scalar functions to do this. This is nontrivial. For example, by my count DB2 has well over 100 scalar functions. Another alternative is to subscribe to a view over the source table. Many people are unaware of this possibility since so few change data capture technologies offer view support. I'll give some examples in a future post.

You Only Need to Read the Log One Time

Once data is committed to CD tables, the Capture program never has to reread log records. For example, if target database is down for scheduled maintenance, changes are saved in CD tables until (1) the target is available or (2) a retention period is up (7 days by default). There's never a need to retrieve archived log files when the target returns. All you need to do is make sure the CD has space for changes captured during the maintenance window.

One Bad Apple Doesn't Spoil the Whole Bunch

In any distribution scenario, an extended outage of a single target is always possible. For example, one target is down several days due to a hardware problem. SQL Replication lets you bring the downed target back on line without impacting the other targets. They are not delayed as the downed target catches up. And, as already mentioned, there's never a need to retrieve older or archived log files so that the downed target can get caught up.

You Have Plenty of Statistics Available

I'm amazed how often people don't think to ask about how they can validate the long-term health and performance of a data replication product. By default, the SQL Replication programs maintain 7 days of historical information for the processing they've done. This is enormously helpful for determining peaks and valleys, the health of your replication configuration, and whether your meeting SLAs. This data is easily accessible via SQL since the data is all stored in tables. If you need help understanding any of it, try using a utility called the Analyzer.

You Can Easily Combine with HADR on the Source

Your source can be covered by HADR and your Apply programs will be switched between primary and backup if the appropriate client reroute information is defined. For more about using HADR and SQL Replication together, see the video on ChannelDB2.

--------------------------------------------------------------------

Sample Script

The rest of this post provides a sample script and instructions to set up an SQL Replication distribution with one source database and two targets.

Before you start, you need to understand two SQL Replication terms:

  • Subscription set. This a list of tables being replicated between the source database and one target database. You need one subscription set for each target database.
  • Subscription set member. This maps one source table to one target table. Subscription sets contain one member for each target table.

Follow these instructions to set up the sample distribution:

  1. Create the DB2 SAMPLE database if you haven't already.
  2. If you haven't enabled DB2 log archiving for the SAMPLE database, do so now from a DB2 command window:
    db2 update database configuration for sample using logarchmeth1 logretain
    db2 backup database sample
  3. In the same DB2 instance as the SAMPLE database, Create two additional databases called TARGET1 and TARGET2.
  4. Copy and paste the script from this post into a file named distribution.asnclp. The script is in the highlighted box at the end of this post.
  5. Edit the script and make the following changes:
    - Change all occurances of the 'DB2ADMIN' to the user you use to connect to your DB2 SAMPLE database.
    - Change all occurances of 'mypw' to your DB2 user's password.
    - If you are working from a UNIX system, change the table space directory from 'c:\' to a UNIX directory.
    - Save the file.
  6. Process the script by running the DB2 asnclp command in a DB2 command window.
    asnclp -f distribution.asnclp
    You may want to redirect the output to stdout.
  7. Open a DB2 command window and start the Capture program:
    asncap capture_server=sample
  8. Open another DB2 command window, and start the first Apply program:
    asnapply control_server=target1 apply_qual=apply1
  9. Open one more DB2 command window and start the second Apply program:
    asnapply control_server=target2 apply_qual=apply2

Your configuration is now running. If you want to verify data replication, do the following:

  1. Select from the target tables. They should now contain a copy of all data from the source table.
  2. Update a row in the Employee table.
    Any number of rows will do. So will an insert or delete.
  3. Query the target Employee tables to see your change.
    You may not see the change for up to a minute.
    By default, the Apply program sleeps one minute when it finds no data to replicate.

That's it, except for the script. Here it is:

# -----------------

# Initialize script
# -----------------
SET RUN SCRIPT NOW STOP ON SQL ERROR ON;

#
# Source database
#
# ------------------------------------------
# Initialize source database for replication
# ------------------------------------------
SET SERVER CAPTURE TO DB SAMPLE ID db2admin PASSWORD "mypw";
CREATE CONTROL TABLES FOR CAPTURE SERVER;

# ------------------------------------------------------
# Identify source table(s) and corresponding CD table(s)
# ------------------------------------------------------
CREATE REGISTRATION (db2admin.EMPLOYEE) DIFFERENTIAL REFRESH STAGE CDEMPLOYEE;

# ----------------------------------------
# Work with target databases one at a time
# ----------------------------------------
#
# Target database 1
#
# ------------------------------------------------
# Initialize first target database for replication
# ('Control' and 'Target' are usually the same)
# ------------------------------------------------
SET SERVER CONTROL TO DB TARGET1 ID db2admin PASSWORD "mypw";
SET SERVER TARGET TO DB TARGET1 ID db2admin PASSWORD "mypw";
CREATE CONTROL TABLES FOR APPLY CONTROL SERVER;

# ---------------------------------------------
# Group source and target mappings into a 'set'
# ---------------------------------------------
CREATE SUBSCRIPTION SET SETNAME SETA1 APPLYQUAL APPLY1 ACTIVATE YES
TIMING INTERVAL 1 START DATE "2008-01-01" TIME "01:00:00.000000";

# -----------------------------------------------------------------
# Identify table space characteristics when creating target tables.
# -----------------------------------------------------------------
SET PROFILE TARGET1TS FOR OBJECT TARGET TABLESPACE OPTIONS UW USING
FILE "c:\TARGET1.FILE" SIZE 700 PAGES;

# -------------------------------------------------
# A 'member' maps a source table to a target table.
# -------------------------------------------------
CREATE MEMBER IN SETNAME SETA1 APPLYQUAL APPLY1 ACTIVATE YES
SOURCE EMPLOYEE
TARGET NAME EMPLOYEE
DEFINITION IN EMPTS CREATE USING PROFILE TARGET1TS
TYPE USERCOPY COLS ALL REGISTERED;

#
# Target database 2
#
# -------------------------------------------------
# Initialize second target database for replication
# -------------------------------------------------
SET SERVER CONTROL TO DB TARGET2 ID db2admin PASSWORD "mypw";
SET SERVER TARGET TO DB TARGET2 ID db2admin PASSWORD "mypw";
CREATE CONTROL TABLES FOR APPLY CONTROL SERVER;

# ------------------------------------------------------------
# Identify set to contain members between source and target 2.
# ------------------------------------------------------------
CREATE SUBSCRIPTION SET SETNAME SETA2 APPLYQUAL APPLY2 ACTIVATE YES
TIMING INTERVAL 1 START DATE "2008-01-01" TIME "01:00:00.000000";

# -----------------------------------------------------------------
# Identify table space characteristics when creating target tables.
# -----------------------------------------------------------------
SET PROFILE TARGET2TS FOR OBJECT TARGET TABLESPACE OPTIONS UW USING
FILE "c:\TARGET2.FILE" SIZE 700 PAGES;

# -----------------------------------------------
# Map source table to a target table in Target 2.
# -----------------------------------------------
CREATE MEMBER IN SETNAME SETA2 APPLYQUAL APPLY2 ACTIVATE YES
SOURCE EMPLOYEE
TARGET NAME EMPLOYEE
DEFINITION IN EMPTS CREATE USING PROFILE TARGET2TS
TYPE USERCOPY COLS ALL REGISTERED;

# Now, run the file through the asnclp command
# and you are ready to replicate data.
QUIT;

Views: 559

Comment

You need to be a member of ChannelDB2 to add comments!

Join ChannelDB2

Featured Downloads

Try BLU Acceleration on Cloud

© 2017   Created by channeldb2.   Powered by

Badges  |  Report an Issue  |  Terms of Service