Blog: AWS Schema Conversion Tool: Specific Features of Migrating Data Warehouses to Redshift

This continues our video blog series on the AWS Schema Conversion Tool (SCT). In the previous blog post we’ve talked about the general approach of data warehouses migration to Amazon Redshift, and in this article, we are going to dig deeper into the DW migration specific features.

Before starting the migration of Greenplum, Netezza, Oracle or Teradata data warehouse to Amazon Redshift, you will want to understand the capabilities and limitations of Redshift.

This video will give you a better understanding of what you may encounter when migrating to Amazon Redshift.

Be sure to check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Requirements and limitations for DW Migration

  • User privileges for migration to Redshift. To complete the conversion process, the user requires the privileges to access schemas and database statistics. You can find the full list of user privileges here.
  • Maximum tables per cluster. The maximum number of tables per cluster in Redshift is 9900. The actual limit might be reduced because of certain system-defined temporary tables that are used for query processing. SCT doesn’t include views are in this limit. AWS Schema Conversion Tool converts source tables in alphabetical order. When the total number of tables in Redshift cluster reaches 9900, the tool stops working and the remaining tables get the corresponding Action Item.
  • Databases per cluster. The number of user-defined databases you can create in one cluster is 60. If the source data warehouse consists of more than 60 databases, the database from source system can be migrated to Redshift as the schema. Every database can include up to 256 schemas.
  • Key table limits. The maximum number of characters for a table name is 127. The maximum number of columns you can define in a single table is 1600. If the source table has more than 1600 columns, you will have to split it.
  • Sort columns. The maximum number of SORTKEY columns you can define in a single table is 400.

Unsupported Data Types

Usually, AWS SCT recommends converting the unsupported data types to VARCHAR with appropriate length. For example, INTERVAL DAY (p) TO SECOND (f) should be converted to VARCHAR (27) because the maximum number of symbols in that interval is 27.

If you use the BLOB/CLOB data types in the source DW, SCT cannot find an equivalent for those types in Redshift. In this case, we recommend to upload the LOBs in S3 and use the link to it in the table’s column. The default solution is to convert LOBs to VARCHAR and generate Action Item flag for each instance, so, the data manipulation could be performed later, during actual data migration process.

Conversion Issues

Converting objects, the most important thing to consider are the optimization parameters (such as sort keys and distribution keys). Read more about the DW optimization strategies in our upcoming blog posts.

At the same time, you have to keep in mind that the order of conversion of database objects can also influence the overall result. So, it’s up to you to decide, whether to convert the whole schema from scratch or use a step-by-step strategy and go for conversion of some selected objects at every single step of conversion. This may be the right solution for converting the schemas with a lot of dependent objects.

Also, some issues may occur during migration of the partitioned tables. At this stage, you have to consider the Redshift limitation, which allows creating not more than 9900 tables per cluster. Thus, if you convert the partitioned table as a set of tables for each partition, you can reach the limit unexpectedly. As well, you should keep in mind that Netezza source DW doesn’t support table partitioning (in the way it is implemented in the AWS Schema Conversion Tool).

Typical Action Items

As a part of the experiment, we have converted 4 similar data warehouses from Greenplum, Netezza, Oracle and Teradata to Amazon Redshift. The results of analysis of the generated actions item types look like this.

Action Items for Conversion of Different Sources of DW

As you can see, the Greenplum and Netezza data warehouses have the lowest number of critical action items during conversion to Amazon Redshift. That is no surprise because all these three database dialects are based on PostgreSQL. At the same time, Oracle data warehouses usually have the largest amount of issues during converting to Amazon Redshift.

Conclusion

So, let’s sum up the observations, that we’ve done while using the AWS SCT for data warehouse migration projects. We consider SCT as an essential tool to handle the routine automation of DW objects to Redshift. Also, the Schema Conversion Tool helps you administrating the migration process and allows checking which objects are already converted. With AWS SCT you will always be in control of DW migrations.

We strongly recommend AWS SCT to database analytics and database architects. You can download the latest version of AWS Schema Conversion Tool from http://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_SchemaConversionTool.Installing.html.

Related AWS SCT Posts