Data federation (or virtualization) is a data integration technology that lets you build what some people think of as a database cloud. It's found in DB2, InfoSphere Warehouse, and a few other IBM products. It can be extended or used stand-alone through IBM's InfoSphere Federation Server.
However, even with it available many different ways, I regularly meet people who know little about it and are under the impression that it's difficult to learn. It's not :) That got me thinking about the fastest way to get started with it. Personally, I think it's best to set up a simple scenario and go from there. That's what I'm going to show in this post. It should take about 15 minutes max. You can then extend your scenario using the documentation in DB2's Information Center
or other posts I have.
I have instructions and a single SQL script that shows you the basics. You need a DB2 instance before you start. The instructions have you create two databases under your instance and then run the script. The script creates federation objects and provides an example of what's called a "federated query" or "distributed join". I'll explain more about those later in the post.
After you finish, if you want to spend a few more minutes growing your new-found skills :) see one of the following:
Your fast start is the seven steps in the next section. Note that you can do all of this with DB2 Express-C.
Do It Yourself
1. Enable federation in your DB2 instance by issuing the following command from a DB2 command window:
db2 update database manager configuration using federated yes
2. If your DB2 instance was running you must stop it and then start it again.
3. Create the DB2 Sample database if you haven't already. From a DB2 command window, issue:
4. Create an empty database (one with no user tables). From a DB2 command window, issue:
db2 create database clouddb
5. Copy and paste the script at the bottom of this post into a file named, say, mycloud.sql
6. Edit the file and make the changes described in the script's opening comments.
7. Run your SQL script by issuing the following command from a DB2 command window:
db2 -vtf mycloud.sql > mycloud.txt
That's it. You've created a federated database (clouddb) and federated it with two tables from another database (sample). You've also joined those tables in a single query. You can now try some other joins, federate more tables from sample, or federate and join tables from yet another database.
However, if you're very new to federation, you may want to understand a two things from the script - what a nickname is and what a distributed join is.
What is a Nickname?
You may have noticed the script contains two "Create Nickname" statements. A nickname is a 'virtual' table in one database that points to a physical table in another database. The nickname does not contain data itself. It just has information about the physical table. You can use a nickname in an SQL statement anywhere a table name can be used. Under the covers, the federated database server issues what ever SQL is necessary to work with the physical table.
For example, in the following graphic, a select is issued to the federated database. A nickname is specified in the from-clause. This is the same thing shown in the script's first query. The federated database server retrieves just enough data from the physical table to satisy the query. The result set is returned just as though the data were stored in the federated database.
You can also use nicknames in inserts, updates, deletes, predicates, and more. However, one of the most interesting uses is in a distributed join.
What Is a Distributed Join?
A distributed join is one that references tables from multiple databases. More specifically, the query uses at least one nickname. For example, in the following graphic, a join is issued to the federated database. One of the referenced tables resides locally in the federated databse. The other table resides in another database and is pointed to by a nickname in the federated database. This is the same thing shown in the script's second query.
When the federated database server processes the join, its optimizer determines how to retrieve the least amount of data possible to perform the join.
There's more you can learn about federation, but I think that would take us well past a fast start :)
-- Before using this script, make the following changes:
-- 1. Replace all occurances of 'tolleson' with a user name
-- you can use to connect to your DB2 databases.
-- 2. Replace all occurances of 'mypassw0rd' with the
-- password that goes the the user name you provided.
connect to clouddb;
-- First, let's create one table in the federated database.
drop table greeting;
create table greeting (deptno char(3), dept_greeting char(6));
insert into greeting values('E21', 'Howdy');
-- Now, let's create our federated objects
drop wrapper drda;
create wrapper drda;
-- The add dbname option is a DB2 alias from the database
-- directory where the federated server runs.
create server mysample
add dbname 'SAMPLE');
-- The 'remote' authid and password are the user and password
-- the federated server uses to connect to the server
-- 'mysample' when processing SQL for the user 'tolleson'.
create user mapping for tolleson
-- These nicknames point to tables in the Sample database.
create nickname myemp for mysample.tolleson.employee;
create nickname mydept for mysample.tolleson.department;
-- Query 1: Select from just one nickname.
select ee.empno, ee.firstnme, ee.workdept
from myemp ee
order by ee.workdept;
-- Query 2: Join our local table with a nickname
select gg.dept_greeting, ee.firstnme
from greeting gg, myemp ee
order by ee.lastname;
-- Query 3: Join two nicknames
select ee.empno, ee.lastname, ee.firstnme, dd.deptname
from myemp ee, mydept dd
where dd.deptno='E21' and ee.workdept=dd.deptno
order by ee.lastname;