Blog: Using Change Data Capture for Tables Replication

A major bank in Europe that we’ve been working with over the years wanted to get help in upgrading SQL Server 2008 to 2016. Since we performed the migration process originally from DB2 to SQL Server we had a deep knowledge of the customer’s data.

Because of the need to have a highly available system and they could not afford to be down, they needed a way to downgrade to the old database system in case somthing bad happens to the upgraded system. Also, our customer wanted to include the disaster recovery solution. We proposed a solution that would allow them to use data replication from the upgraded database to the old database. So, in case of an issue in production they could easily go back to the old version of SQL Server.

To see how we were able to accomplish this check out our video.

Here’s the Problem Our Customer Had

SQL Server has a variety of technologies used to do data replication from one database to another. The most popular solutions include Database Mirroring, Log Shipping, Database Sharding, Export — Import, Backup — Restore, and so on…

SQL Server provides the built-in utilities for data replication from one database to another. However, you can implement this technique only if the tables in the databases have the primary keys. Unfortunately, most of the customer’s tables didn’t include the primary keys. Besides, the customer refused to change the database architecture in order to add the primary keys to the tables. Thus, we had to start searching for another technology to deliver the data from the upgraded SQL Server database to the old one.

Most of the other replication technologies work only if both databases are running on the same version of the SQL Server. In this particular case, we needed to replicate data from SQL Server 2016 database to the old customer’s SQL Server 2008 database instance.

We have managed to find an alternative option that allows tracking changes in the source database. We mean using the Change Data Capture (CDC) technology.

Technical Solution

It turned out CDC was the only available solution for data replication of this particular customer’s system. Actually, we already used the similar approach when we migrated IBM DB2 to SQL Server for this customer.

Let’s discover how the Change Data Capture technology works. CDC scans the replication log and harvests changes in the tables of the original database. Then, for every replicated table CDC creates an auxiliary change table that stores all insert, update, and delete activity applied to the original SQL Server table.

Here’s the approach that we took:

  • enabled Change Data Capture for the database.
  • enabled CDC for the selected tables and
  • specified the column that need to be replicated.

change data capture replication

So, now we’ve got the changes tracked, yet we still had to deliver them to the replicated table in the SQL Server 2008 environment. To automate the process of data delivery, we created the PowerShell Script. This script withdraws changes from the auxiliary change table and performs exactly the same operations in the replicated database. Thus, now we have two identical copies of data in source and target databases.

Change Data Capture Setup

Let’s discover how one can perform the setup of the Change Data Capture.

Creating objects

First of all, we create the source and target databases.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
IF db_id('CDC_Source') IS NOT NULL
BEGIN
ALTER DATABASE CDC_Source SET single_user WITH ROLLBACK immediate
DROP DATABASE CDC_Source
END
CREATE DATABASE CDC_Source
ALTER DATABASE CDC_Source SET recovery FULL
GO

IF db_id('CDC_Target') IS NOT NULL
BEGIN
ALTER DATABASE CDC_Target SET single_user WITH ROLLBACK immediate
DROP DATABASE CDC_Target
END
CREATE DATABASE CDC_Target
GO

Then we create tables without the primary or unique key in both databases.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
USE CDC_Source
GO
IF object_id('dbo.TestTable1') IS NOT NULL
DROP TABLE dbo.TestTable1
CREATE TABLE dbo.TestTable1
(
id BIGINT
,val VARCHAR(500)
,dt datetime2
)

USE CDC_Target
GO
IF object_id('dbo.TestTable1') IS NOT NULL
DROP TABLE dbo.TestTable1
CREATE TABLE dbo.TestTable1
(
id BIGINT
,val VARCHAR(500)
,dt datetime2
)

Creating an auxiliary table for CDC

On the next step, we create an auxiliary table for delivered data.

1
2
3
4
5
6
7
8
9
IF object_id('dbo.cdc_lsns') IS NOT NULL
DROP TABLE dbo.cdc_lsns
CREATE TABLE dbo.cdc_lsns
(
seqval BINARY(10)
,obj_name VARCHAR(200)
,ts datetime2 DEFAULT sysdatetime()
)
INSERT INTO dbo.cdc_lsns DEFAULT VALUES

Enabling CDC

Now we enable the Change Database Capture on the database level and create the CDC role.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
USE CDC_Source
GO
IF NOT EXISTS(SELECT name FROM sys.databases
WHERE is_cdc_enabled = 1 AND database_id = db_id())
BEGIN
EXEC sys.sp_cdc_enable_db

EXEC sys.sp_cdc_add_job
@job_type = N'capture'
,@start_job = 1
,@maxtrans = 10000
,@maxscans = 100
,@continuous = 1
,@pollinginterval = 2

EXEC sys.sp_cdc_add_job
@job_type = N'cleanup'
,@start_job = 1
,@retention = 52494800
,@threshold = 10000
END

IF NOT EXISTS (SELECT 1 FROM sys.database_principals
WHERE name=N'cdc_admin' AND TYPE = N'R')
CREATE ROLE cdc_admin AUTHORIZATION dbo

After that, we enable CDC on the table level.

1
2
3
4
5
6
7
EXEC sys.sp_cdc_enable_table
@source_schema = 'dbo', @source_name = 'TestTable1',
@role_name = N'cdc_admin', @index_name = NULL
--2016 modification: no support for net changes required
,@supports_net_changes = 0
,@filegroup_name = NULL
,@allow_partition_switch = 1

After tweaking the source table, we can check the captured changes in the auxiliary table using the following statement.

1
SELECT * FROM CDC_Target.dbo.cdc_lsns

In addition, you can find more code samples, including the replication operations on this page.

Observations

Here are some valuable observations that we’ve learned from this project.

  • Before enabling Change Data Capture at the table level, make sure you have already enabled SQL Server Agent.
  • Consider changing settings in order to remove the history data from the auxiliary change table. It can quickly grow in size and decrease the overall performance of the replication script.
  • Definitely, you should use primary keys in the tables and therefore avoid these difficulties.

Benefits and Results

The customer got the upgraded SQL Server database system that now meets the strict security and high availability requirements and fits the highest industry standards. In addition, the customer was able to maintain sufficient database performance increase, real-time backup and the support of downgrade. As of this point in time, none of the databases had to be downgraded because of the superior performance of SQL Server 2016.