Blog: Using Data Extraction Agents for Database Migration Projects in AWS Schema Conversion Tool

AWS Database Migration Service (DMS) does a great job of copying data from your on-premises databases to Amazon RDS for OLTP workloads. But once you start talking about terabytes of data, earlier versions of DMS were not up to the job. Amazon recognized this shortcoming and extended the Schema Conversion Tool with a performant way to migrate terabytes of data from on-premises data warehouse platforms to Amazon Redshift.

On May 11, 2017, Amazon released version 1.0.602 of the AWS Schema Conversion Tool (SCT). With the help of the data extraction agents introduced in this update, AWS Schema Conversion Tool now lets you upload data to Redshift in just a couple of mouse clicks. The following video demonstrates how the automated data upload to Redshift works.

Check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Continue reading to learn more about using the data extraction agents.

Typical Data Extraction Workflow

You need to take the following actions to complete the data migration.

  1. Generate certificates.
    AWS SCT can generate the Trust and Key stores, or you can provide the path to previously created certificates. These Trust and Key stores are required in the next step.
  2. Install data extraction agents.
    When installing the extractors, you will need to provide the Trust and Key stores, the database drivers, and the connection credentials. After installing the extractors, you need to register them in AWS SCT.
  3. Set up the migration project.
    Create a new database migration project in SCT, connect to the source and target databases, and specify Amazon S3 credentials in the AWS Service Profile settings.
  4. Convert schemas.
    Select the tables you want to move to the cloud and convert them using the AWS Schema Conversion Tool. If no errors occur during conversion, you can proceed to apply the converted code to the target data warehouse.
  5. Manage the extraction.
    Create a data migration task and start it. Select the right data migration mode: to fully automate the migration, choose the ‘Extract, Upload and Copy’ option.
  6. Verify data migration.
    Traditionally, we use our in-house Database Compare Suite tool to compare the original and migrated tables. Its latest Fast Data Comparison operation allows comparing big data sets in minutes rather than hours. For a quick row-count sanity check, see the sketch right after this list.
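
If you just want a lightweight first check before running a full comparison, comparing per-table row counts on both sides goes a long way. Below is a minimal sketch: the table list and all connection details are placeholders, psycopg2 is assumed for the Redshift side, and the source connection helper is a stub you would replace with the DB-API driver for your platform.

```python
import psycopg2

# Tables to spot-check after the migration (placeholder names).
TABLES = ["public.orders", "public.customers"]

def count_rows(conn, table):
    """Return the row count of a table over a DB-API connection."""
    with conn.cursor() as cur:
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]

def open_source_connection():
    # Placeholder: return a DB-API connection for your source platform,
    # e.g. via teradatasql, pyodbc, or cx_Oracle.
    raise NotImplementedError("supply a connection to your source data warehouse")

# Target: the Redshift cluster (all connection details are placeholders).
target = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="admin", password="...",
)
source = open_source_connection()

for table in TABLES:
    src, tgt = count_rows(source, table), count_rows(target, table)
    status = "OK" if src == tgt else "MISMATCH"
    print(f"{table}: source={src}, target={tgt} -> {status}")
```

Matching row counts do not prove the data is identical, of course; they simply catch gross extraction or load failures cheaply before a full comparison run.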

Background on Data Migration to Redshift

You can upload data to Amazon Redshift only from an Amazon Simple Storage Service (Amazon S3) bucket. So, during database migration, you need to take care not only of extracting data from the source database, but also of uploading it to the target.

Previous versions of the Schema Conversion Tool required a lot of manual work. Users had to extract the data, store it on a local drive, upload it to an Amazon S3 bucket, and finally run the COPY command to load the data into a Redshift cluster. We described this workflow in a corresponding article.
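
For context, here is a minimal sketch of that manual loading step, assuming hypothetical bucket, file, table, and IAM role names, boto3 for the S3 upload, and psycopg2 for the Redshift connection:

```python
import boto3
import psycopg2

# Upload a previously extracted CSV file to the S3 bucket
# (file, bucket, and key names are placeholders).
s3 = boto3.client("s3")
s3.upload_file("extracts/orders.csv", "my-migration-bucket", "extracts/orders.csv")

# Connect to the Redshift cluster (all connection details are
# placeholders) and load the uploaded file with the COPY command.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dw", user="admin", password="...",
)
with conn, conn.cursor() as cur:
    cur.execute("""
        COPY public.orders
        FROM 's3://my-migration-bucket/extracts/orders.csv'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        CSV;
    """)
conn.close()
```

Multiply this by dozens or hundreds of tables and it becomes clear why automating the extract-upload-copy cycle matters.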

Thankfully, Amazon has now solved these problems, and you can fully automate the data migration process by using the data extraction agents.

Installing Data Extraction Agents

You can find the installation files in the archive with the latest SCT update. To download the latest SCT update, press the Help button and select the ‘Check for updates’ option from the menu.

Amazon’s data extraction agents support the following database platforms:

  • Greenplum
  • Microsoft SQL Server
  • Netezza
  • Oracle
  • Teradata
  • Vertica

You can run the data extraction agents on the following operating systems: macOS, Microsoft Windows, Red Hat Enterprise Linux, or Ubuntu Linux.

When installing the data extraction agents (we call them the extractors), you need to specify the Trust and Key stores that you use in SCT. This allows for seamless integration of the agents with your Schema Conversion Tool application. For extractors running on Windows, you need to install the Java Runtime Environment and the drivers for all of the supported database platforms. You also need to provide the source database connection parameters for the data extraction agent.

After successfully installing the extractors, you need to register them in SCT. To do so, switch to the Data Migration View and press the ‘Register’ button. Note that if you want to use the same agent in another SCT project, you should first unregister it in the current project by pressing the ‘Unregister’ button.

Access Credentials

Since we’re interacting with the source and target databases, data extractors and the Amazon S3 bucket, let’s see where the access credentials are stored.

First, you have to grant access to the source and target data warehouses in the SCT project. To manage the data extraction agents, you need the Trust and Key stores. You can generate them in the AWS Schema Conversion Tool and reuse them several times. These stores are mandatory during installation of the extractors.

In addition, you have to provide the credentials for your AWS account and your Amazon S3 bucket in the AWS Service Profile settings. Later, select the correct profile in your current project (if you have two or more profiles stored).
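
Before kicking off a migration task, it is worth confirming that the credentials from the selected profile can actually reach the bucket. Here is a minimal sketch using boto3; the bucket name is a placeholder:

```python
import boto3
from botocore.exceptions import ClientError

# head_bucket raises a ClientError if the bucket does not exist
# or the configured credentials lack access to it.
s3 = boto3.client("s3")
try:
    s3.head_bucket(Bucket="my-migration-bucket")
    print("S3 bucket is reachable with the configured credentials.")
except ClientError as err:
    print(f"S3 access check failed: {err}")
```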

Managing Data Extraction Agents

To start the data extraction, select and convert the tables you want to migrate, and apply them to the target. After that, select the ‘Create local task’ option from the pop-up menu in the left application panel to create the data migration task.

In the next step, you need to specify the migration mode. You can choose one of the following options: ‘Extract only’, ‘Extract and Upload’, or ‘Extract, Upload and Copy’.

The first option allows you to run the agent and just extract data. You can use the extracted data for backup or upload it to the new data warehouse manually.

Choosing the second option, you can automatically upload the extracted data to the S3 bucket, but you will still need to run Aginity Workbench to copy the data to the Amazon Redshift cluster (similar to the COPY sketch shown earlier).

The third option performs the full data upload cycle automatically. Just make sure that all the credentials are set correctly, then sit back and watch your data move from your old database to the new cloud storage!

If you have any questions related to the database migration to Amazon Redshift, feel free to contact us.
