Blog: Extracting Data from Warehouses with Data Migration Agents in AWS Schema Conversion Tool

Migrating data warehouses to Amazon Redshift presents a serious challenge when you consider what it takes to transfer huge amounts of data in parallel. On February 16, 2017, Amazon released version 1.0.600 of the AWS Schema Conversion Tool (SCT). This release introduces support for Data Migration Agents. Now you can efficiently extract data from Oracle and Teradata data warehouses and prepare it for use with Amazon Redshift.

In the following video, we demonstrate the typical workflow of using the Data Migration Agents to extract data and upload it to Amazon Redshift, using an S3 bucket as the data loading zone.

Make sure to check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Continue reading to learn more about using the Data Migration Agents for data extraction.

Background on Data Migration Agents

The previous version of the AWS Schema Conversion Tool introduced tight integration with AWS Database Migration Service (DMS) for moving data from on-premises databases to Amazon RDS. By creating DMS tasks and endpoints, you can migrate data from the source database to the target database directly from the SCT application window.

However, DMS cannot provide enough throughput to extract data from warehouses, which is why Amazon does not support data warehouses as a source for DMS. Instead, you can now use dedicated Data Migration Agents to extract data from your on-premises data warehouse and load it into Amazon Redshift, using parallel execution for better performance.

AWS SCT Data Migration Agents

You can find the folder containing the Data Migration Agents in the installation package of the 1.0.600 SCT release. You run these agents under Linux (Ubuntu or Fedora) on a remote machine, while using AWS SCT to register and manage them. This approach lets you extract data from the warehouse remotely, without overloading your own PC and Internet connection.

Typical data extraction workflow

We found that using the Data Migration Agents is not as easy as it seems. The typical workflow for extracting data from Oracle and Teradata warehouses includes some non-obvious steps, and problems may occur when you try to upload the extracted data to Amazon Redshift. We would like to share our experience on how to take full advantage of the Data Migration Agents.

  1. Generate Certificates.
    First, you have to generate the trust and key store certificates. You will need them on your Linux PC to install the data migration agents correctly.
  2. Install Data Migration Agents.
    You will need some experience with the Linux command line to install and configure the data extractors. Bear in mind that these agents do not have a graphical user interface.
  3. Register the Extractor.
    After installing and running the data migration agents under Linux, return to AWS SCT on your Windows PC and register the data extractors there.
  4. Extract Data.
    Finally, create the data extraction tasks and start them. Now you can sit back and watch the extraction progress.
  5. Upload Data to Redshift.
    You may have already discovered that data can be loaded into Redshift only from an S3 bucket. So, upload the extracted data to an S3 bucket first, and then use the scripts automatically generated by the data extractors to load the data from the S3 bucket into Amazon Redshift (a minimal sketch of such a load follows this list).
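
For illustration, here is a minimal sketch of what this final loading step boils down to. It assumes a hypothetical cluster endpoint, S3 bucket, table, and IAM role; in practice you simply run the COPY scripts generated by the extraction agents.

    # Minimal sketch of loading extracted files from S3 into Amazon Redshift
    # with a COPY statement. The endpoint, credentials, bucket, prefix, table,
    # and IAM role below are hypothetical placeholders.
    import psycopg2

    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439,
        dbname="dw",
        user="admin",
        password="REPLACE_ME",
    )
    conn.autocommit = True

    copy_sql = """
        COPY sales.orders
        FROM 's3://my-extraction-bucket/agent-1/task-42/'
        CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopyRole'
        GZIP DELIMITER '|' REGION 'us-east-1';
    """
    with conn.cursor() as cur:
        cur.execute(copy_sql)
    conn.close()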

Best practices and suggestions

Amazon recommends installing multiple agents on computers close to your warehouse to improve performance. Multiple does not mean many, though: using too many data extractors may cause connection delays. Amazon also suggests extracting extremely large tables as separate agent tasks to avoid connectivity problems.

After uploading the data to Amazon Redshift, you need to check whether all the data migrated successfully. You can do this manually by running a number of SELECT statements against the source and target databases, but we suggest automating this process with Database Compare Suite. Its Fast Data Comparison feature can compare tables with millions of rows in minutes, not hours or days. Connect to the source and target databases, select the object you want to compare in the left panel, press the Align objects button to highlight it in the right panel, and then press the Fast Data Compare button on the main menu to compare the results.
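
If you go the manual route, the check essentially boils down to comparing row counts table by table in both databases. The sketch below assumes hypothetical connection strings and table names, with Oracle as the source warehouse:

    # Minimal sketch of a manual row-count comparison between the source
    # warehouse and the target Redshift cluster. Connection strings and table
    # names are hypothetical placeholders.
    import cx_Oracle   # source driver; teradatasql would be used for Teradata
    import psycopg2    # Amazon Redshift speaks the PostgreSQL wire protocol

    source_conn = cx_Oracle.connect("scott/tiger@onprem-oracle:1521/ORCL")
    target_conn = psycopg2.connect(host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
                                   port=5439, dbname="dw", user="admin", password="REPLACE_ME")

    TABLES = ["sales.orders", "sales.customers"]   # hypothetical table list

    def row_count(conn, table):
        # Count the rows of a table over the given connection.
        cur = conn.cursor()
        cur.execute("SELECT COUNT(*) FROM " + table)
        count = cur.fetchone()[0]
        cur.close()
        return count

    for table in TABLES:
        src, tgt = row_count(source_conn, table), row_count(target_conn, table)
        print("{0}: source={1}, target={2}, {3}".format(
            table, src, tgt, "OK" if src == tgt else "MISMATCH"))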

Observations

The data extraction feature is really valuable in data warehouse migration projects. We used the data migration agents in several migration projects, and here is what we found.

  • A data migration agent can be used in only one data migration project at a time. If you want to use the same agent in another SCT project, you have to unregister it from the old project first and then register it in the new one.
  • You can pause a data extraction task by pressing the Stop button. If you resume it later, the extraction will continue from where it stopped if the table has a primary key (Oracle) or a primary index (Teradata); otherwise it will start from the very beginning.
  • When you use several data extraction agents, it may be difficult to find the data they extracted. Check the agent's ID in the AWS Schema Conversion Tool: the data extracted by that agent is stored in a folder with the same name. Inside it you will find a folder named after the task ID, and for every subtask the extractor creates a folder named after the subtask ID inside the task folder (see the sketch after this list).
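
When many agents are running, a small script can help map that folder layout. The sketch below walks a hypothetical agent working directory and lists the agent, task, and subtask folders it finds:

    # Minimal sketch that lists extraction output by agent ID, task ID, and
    # subtask ID. The root path is a hypothetical agent working folder.
    import os

    EXTRACTION_ROOT = "/data/sct-extraction"

    for agent_id in sorted(os.listdir(EXTRACTION_ROOT)):
        agent_dir = os.path.join(EXTRACTION_ROOT, agent_id)
        if not os.path.isdir(agent_dir):
            continue
        for task_id in sorted(os.listdir(agent_dir)):
            task_dir = os.path.join(agent_dir, task_id)
            if not os.path.isdir(task_dir):
                continue
            subtasks = [d for d in os.listdir(task_dir)
                        if os.path.isdir(os.path.join(task_dir, d))]
            print("agent {0} / task {1}: {2} subtask folder(s)".format(
                agent_id, task_id, len(subtasks)))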

Please check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Related AWS SCT Posts