Blog: Using data extraction filters for smart database migration with AWS Schema Conversion Tool

Amazon introduced data extraction agents in a previous release of the AWS Schema Conversion Tool (SCT). They let you migrate all the data from your on-premises data warehouse platforms to Amazon Redshift. But what if you don't need to move all those terabytes of data and only want the data generated in the past couple of years? Starting from version 1.0.603, SCT lets you filter the data before uploading it to Redshift. In addition, the latest SCT version adds support for the older Teradata 13 and Oracle Data Warehouse 10 schemas.

In the following video, we demonstrate how to use the data extraction filters in the AWS Schema Conversion Tool.

 

Check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Continue reading to learn more about using the data extraction filters.

Background on using the data extractors

If you are not familiar with how the data extraction agents (or extractors) work in SCT, here is a quick recap: they extract the data from the source database, and SCT then uploads it to an Amazon S3 bucket and copies it into Redshift. Users like us found that there was no way to filter the data on the source side – it was all or nothing for the selected tables.

To work around this limitation, you either had to move all the data to Redshift and then delete the unnecessary part of it, or create a temporary table in the source database, select the required data into it, and then move that table to the target. Both approaches cost you extra time.

Amazon decided to provide an option for filtering the extracted data. This is the most efficient way of moving only the relevant data from the source to the target data warehouse: SCT filters the data during the extraction step, so you don't waste time uploading large amounts of unneeded data to Redshift. Here is what the new workflow of the extraction process looks like.

SCT data extraction filters schema

Creating data extraction filters

To create a data extraction filter, open the Mapping Rules dialog box from the Settings menu. This window displays the list of transformation and filtering rules already applied in the SCT application. The bottom part of the window is where you create a new filtering rule.

After you press the Add new filter button, specify the filter's name, as well as the schemas and tables to which it applies. You can provide the exact schema or table name, or use a pattern to select multiple objects. SCT uses a LIKE clause to filter the schemas and tables.
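For example, assuming a hypothetical source schema named DW whose fact tables are named SALES_2015, SALES_2016, and so on, a pattern along these lines would match all of them:

    Schema name:  DW
    Table name:   SALES_%

Because SCT applies the pattern through a LIKE clause, the % wildcard matches any sequence of characters and the _ wildcard matches a single character.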

Afterwards, specify the condition used to filter the data. You write the condition just as you would in a WHERE clause, and SCT uses it to filter the data during the extraction step. Don't forget to press the Save button to add the newly created filter to the list.
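As a simple illustration, suppose you only want the rows generated in the past couple of years and the source table has a hypothetical ORDER_DATE column. The filtering condition could then look like this:

    ORDER_DATE >= DATE '2016-01-01'

SCT places this condition in the WHERE clause of the extraction query, so only the matching rows ever leave the source data warehouse.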

SCT data extraction filters

You can add multiple filters; SCT merges them into a unified filtering WHERE clause to select exactly the required set of data.
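For instance, one filter might keep only the recent rows of the sales tables, while another restricts a customer table to a single region (the table and column names below are hypothetical):

    SALES_%   :  ORDER_DATE >= DATE '2016-01-01'
    CUSTOMERS :  REGION = 'EMEA'

During extraction, each table is filtered by the conditions of the filters that match it, so only the required rows are selected.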

Blazing fast data extraction

Once the filters are created, the data extraction workflow looks pretty much the same as described in our previous blog post: register the extractors, convert the source schema, and apply it to the target instance.

Then select the objects to migrate from source to target in the left panel of the AWS SCT project. Note that you can narrow down the source schemas and tables to migrate at this step, in addition to the filtering you defined when creating the filter.

Start the data extraction, select the migration mode, and you'll see the progress bars. SCT displays up to three progress bars reflecting the state of the extraction, upload, and copy processes.

In our video, we demonstrate filters that select around 6% of the source data rows. SCT finished the task in less than 100 seconds, while migrating the whole table takes about 5 times longer and may require additional cloud resources. So, with the data extraction filters in the AWS Schema Conversion Tool, you can perform really smart cloud database migrations.

If you have any questions related to the database migration to Amazon Redshift, feel free to contact us.

Related AWS SCT Posts