While preparing for what's likely to be coming my way in 2010, I thought about the topics I spent the most time on in 2009. I eventually wound up with a list of 10 that I felt were the most popular with the sales teams and customers I worked with. I thought I had a pretty good list :) so I emailed it to my colleagues to get their impressions. Naturally, everyone had at least a slightly different list and, when there was overlap, everyone wanted a slightly different order. Go figure.
So... what you see here is based solely on my own experience. No way I'm going to try to reconcile opinions and come up with a composite list :) I'm just going to start at the top of mine and work down:
1. High Availability
A popular replication topic for several years now. The focus in 2009 shifted even further towards a multiple active site configuration called "active-active."
Used to be, most people just wanted to shift query and reporting workloads from primary production systems to a secondary system. The secondary might be used by production apps in case of an outage of the primary. However, now, many more people want both sites to be 'primary' so they can balance all workloads across systems and have modified data kept in sync with minimal latency. They don't want to be restricted by distance, hardware, operating systems, etc.
The solution for DB2 z/OS, DB2 LUW, and InfoSphere Warehouse is InfoSphere Replication Server's Q Replication.
The highly parallel apply of data (more later), conflict detection and logging, monitoring (more later), and integration with DB2 are key reasons people picked it in 2009.
2. The Redesigned Q Replication Dashboard
The first release in 2005 was a huge hit. It combined live graphical monitoring of replication with reporting of replication performance, conflict data, and errors. It made Q Replication highly accessible, especially compared to the near black box experience you get from some replication and change data capture technologies.
However, that first Dashboard had two challenges - (1) it was a desktop application and (2) even more data was available than was being presented. The first issue was resolved by making the Dashboard a web app. Users no longer needed to install and maintain desktop software on any system they might need to work from. Unfortunately, the underlying technologies didn't make it as easy to extend as we'd hoped.
Enter the 9.7 Q Replication Dashboard. Not only does it look and feel like the latest web UIs, but the underlying technologies have made it easy to provide access to all replication health and statistics (monitor data).
3. Log-Capture for DB2 LUW Compressed Tables
The use of compressed tables has grown to the point that they're everywhere. As a result, DB2 LUW added support for compressed tables to its log-read API in 9.7. This was immediately supported by the SQL Capture built into DB2 LUW and InfoSphere Warehouse. The announcements of InfoSphere Replication Server 9.7 and the 9.7 IBM Homogeneous Replication Feature added the same support for the Q Capture built into DB2 LUW and InfoSphere Warehouse.
4. Performance
People came to us with requirements to replicate anywhere from a few tables to 1000s with subsecond latency, sometimes doing so over 1000s of kilometers. Naturally, the requirements often included the possibility of handling a high volume of changed data. Q Replication was the answer we gave.
Among other things, Q Replication's highly parallel apply process avoids a major bottleneck found in old-style replication technologies and gives Q Replication a significant advantage in delivering both high volume and low latency.
(Think about it. Most multi-user applications generate transactions that are committed in parallel. However, many data replication technologies serialize the applying of transactions into target databases. That can be a significant bottleneck.)
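The idea can be sketched in a few lines of Python. This is a toy illustration of dependency-aware parallel apply, not Q Apply's actual implementation: transactions that touch disjoint rows run concurrently, while a transaction that touches the same row as an earlier one is held back until that earlier work completes.

```python
from concurrent.futures import ThreadPoolExecutor

# Each "transaction" is (id, set of row keys it modifies). Transactions
# touching disjoint rows can be applied concurrently; a transaction that
# touches the same row as an earlier one must wait for it.
txns = [
    (1, {"A"}),
    (2, {"B"}),  # independent of txn 1, so it can run alongside it
    (3, {"A"}),  # conflicts with txn 1, so it must apply afterwards
]

def schedule(txns):
    """Group transactions into waves; each wave can apply in parallel."""
    waves, wave, touched = [], [], set()
    for txn in txns:
        txn_id, rows = txn
        if rows & touched:  # conflict with the current wave: close it out
            waves.append(wave)
            wave, touched = [], set()
        wave.append(txn)
        touched |= rows
    if wave:
        waves.append(wave)
    return waves

applied = []

def apply_txn(txn):
    applied.append(txn[0])  # stand-in for executing the txn's SQL at the target

with ThreadPoolExecutor(max_workers=4) as pool:
    for wave in schedule(txns):
        list(pool.map(apply_txn, wave))  # a wave finishes before the next starts
```

With this sample workload, transactions 1 and 2 form one wave and transaction 3 a second wave, so two of the three apply in parallel; a serializing applier would have done all three one at a time.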
5. The ability to run Q Capture remote from a DB2 LUW source
This was popular for two reasons. First, some customers were looking for more ways to reduce the CPU load on source systems. Second, some were looking for ways to reduce people costs by installing, maintaining, and running replication on a single system.
Q Replication can help by letting you run Q Capture remote from a DB2 LUW source. This doesn't require copying log files or any other non-standard process. That's because the DB2 log-read API is a client API. The only limitation is that no code page or Big-to-Little Endian translation is done by the API. That means Q Capture must be run on a system with the same 'Endianness' as the source and in the same code page.
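Since the byte-order constraint is easy to check up front, here's a minimal sketch (not an IBM tool, just standard Python) for recording a candidate host's endianness before deciding where to run a remote Q Capture:

```python
import struct
import sys

# A remote Q Capture host must match the source machine's byte order,
# because the log-read API performs no endianness (or code-page)
# translation. Record this host's byte order:
print(sys.byteorder)  # 'little' or 'big'

# Cross-check via struct: the native memory layout of the 32-bit integer 1.
# On a little-endian machine the low-order byte comes first.
native = struct.pack("=I", 1)
assert (native[0] == 1) == (sys.byteorder == "little")
```

Run it on both the source and the candidate Q Capture host; if the outputs differ, Q Capture can't run there against that source.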
I generally recommend people first try running Q Capture on the target system with Q Apply. Performance should be fine. You can also run on a third system with both Q Capture remote from the source and Q Apply remote from the target. However, I don't recommend a remote Q Apply for high-volume or low-latency scenarios since Inserts, Updates, and Deletes are applied to the target individually across the network.
6. The ability to integrate with DB2 LUW's HADR
HADR does a great job delivering value to DB2 customers. However, what do you use if you want to offload reporting or ETL from your HADR'ed system? Or, what if you have two sites that use HADR and you want to synchronize data between them?
For the first question, either SQL or Q Replication do the job. People often chose SQL Replication if cost was their primary concern (SQL Replication is included at no additional charge with the purchase of DB2 and InfoSphere Warehouse). It's also super easy to use with HADR since its metadata is all stored in DB2 tables. Q Replication was chosen by customers who wanted to minimize the impact on the HADR'ed systems or were looking to maximize throughput (volume) or minimize latency.
Q Replication was always the answer to the second question, for the reasons discussed under the High Availability and Performance headings.
7. SQL Replication
This is an IBM change data capture technology that's been around in one form or another since the early 1990s. It's built into DB2 LUW and InfoSphere Warehouse and is ideally suited for large-scale distribution of data (think hundreds of targets). Database people find it extremely easy to learn and use since its metadata is stored in database tables. Of course, the no-additional-cost to DB2 LUW and IS Warehouse customers doesn't hurt either :)
8. Statistical (monitor) data for replication
From my perspective, this is one of the most unsung heroes in the data replication world. Many people go through a product trial or proof of concept without asking what data is available to help them tune, monitor, and manage what they're evaluating.
For example, what kinds of data are available? How do I see when my peak changed-data volumes occur? How do I know if latency is consistently meeting the requirements defined in my organization's Service Level Agreements (SLAs)? How do I access this data or provide reports on it?
You could wind up being sorely disappointed once you go into production. However, with IBM's Q and SQL Replication, you have access to a wide variety of historical statistical data about (1) replication performance and (2) workloads processed. It's easily retrieved via queries since it's stored in relational tables (for example, the Q Apply monitor table). Tables also make it easy to integrate with in-house monitoring and reporting software.
By default, data is generated once a minute (once every five minutes in older releases) and stored for seven days. Of course, you can change these values to meet your needs.
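As a sketch of the kind of SLA check this enables, here's a small Python example. It uses SQLite to stand in for DB2, and the table name follows the Q Apply monitor table (IBMQREP_APPLYMON), but treat the columns as illustrative rather than the exact DB2 schema:

```python
import sqlite3

# Stand-in for the Q Apply monitor table; real deployments would query
# the monitor table in DB2. Column names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IBMQREP_APPLYMON (
        MONITOR_TIME    TEXT,
        END2END_LATENCY INTEGER,   -- milliseconds
        ROWS_APPLIED    INTEGER
    )
""")
conn.executemany(
    "INSERT INTO IBMQREP_APPLYMON VALUES (?, ?, ?)",
    [
        ("2009-12-01 10:00:00", 450, 12000),
        ("2009-12-01 10:01:00", 820, 25000),
        ("2009-12-01 10:02:00", 1300, 41000),  # peak interval
    ],
)

# SLA check: flag monitor intervals whose end-to-end latency
# exceeded a one-second target.
SLA_MS = 1000
breaches = conn.execute(
    "SELECT MONITOR_TIME, END2END_LATENCY FROM IBMQREP_APPLYMON "
    "WHERE END2END_LATENCY > ? ORDER BY MONITOR_TIME",
    (SLA_MS,),
).fetchall()
for ts, latency in breaches:
    print(f"{ts}: {latency} ms over SLA")
```

Because the history sits in plain relational tables, the same query works from any reporting tool or in-house monitoring script.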
9. Q Capture for Oracle databases
With DB2 LUW 9.7's Oracle compatibility enhancements comes an increased need to replicate data from Oracle into DB2. There's no better way to get started than to combine a Q Capture for Oracle with the Q Replication built into DB2 and InfoSphere Warehouse.
Additionally, existing Q Replication customers can now satisfy their need to bring Oracle data into their DB2s without having to install and learn a different technology or re-implement infrastructure such as in-house monitoring and reporting.
10. The replication command line processor
A lot of people use command-line processors (CLPs) with their database servers. Among other things, a CLP makes for easy scripting of DDL for either backup or replay on another system.
Well, IBM also has a command-line processor for Q and SQL Replication 'DDL,' such as creating subscriptions. It's called asnclp. It's gotten popular for some of the same reasons database CLPs are popular: you can set up an entire replication topology through scripts and then save the scripts for backup, or modify and replay them for another set of systems.
Also, as some of you know, I use asnclp to show how easy it is to replicate data