Blog: Optimizing your Redshift Storage with AWS Schema Conversion Tool

Amazon introduced a new feature, Redshift Optimization, in the November 17, 2016 release of the Schema Conversion Tool (SCT). You can now use AWS SCT to optimize your Amazon Redshift databases. For more information, see Optimizing Amazon Redshift by Using the AWS Schema Conversion Tool. In the following video, we demonstrate the essentials of using Redshift Optimization to improve query performance.

Make sure to check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Background on Redshift Optimization

When do you really need to run Redshift Optimization? Based on the experience of our Data Management Team, we concluded that if the performance of your Redshift cluster satisfies you, there is no need to change anything. However, if the performance of your cloud storage does not meet your requirements or expectations, it is worth running an optimization of your Redshift data storage.

AWS SCT provides recommendations on the selection of Distribution Keys and Distribution Styles, as well as the best choice of Sort Keys. Assuming that the existing data warehouse already implements some optimization strategies, you can compare the new recommendations with the current ones and decide whether or not to adopt the new Distribution Keys and Sort Keys.
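To see what you would be comparing against, you can inspect the Distribution Style and Sort Key each table currently uses by querying the svv_table_info system view. A minimal sketch using psycopg2 (all connection parameters are placeholders):

```python
# Sketch: list the current distribution style and first sort key column
# of every table, for comparison with the keys SCT recommends.
import psycopg2

# Placeholder connection parameters; substitute your own cluster's values.
conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="mydb",
    user="admin",
    password="...",
)

SQL = """
    SELECT "schema", "table", diststyle, sortkey1
    FROM svv_table_info
    ORDER BY "schema", "table";
"""

with conn, conn.cursor() as cur:
    cur.execute(SQL)
    for schema, table, diststyle, sortkey1 in cur.fetchall():
        print(f"{schema}.{table}: diststyle={diststyle}, sortkey1={sortkey1}")
```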

Basically, the Redshift optimization project can be considered a regular AWS Schema Conversion Tool migration project with both the source and target pointing to Amazon Redshift clusters. Here you have to make an important decision: whether to use a copy of the source cluster as the target or to start the optimization project from scratch.

Choosing the right strategy

Official Amazon documentation does not recommend using the same cluster as both the source and target of your optimization project. Thus, before you begin the optimization process, you have to create a copy of the source cluster. We have shown the simplest way to do this in the video above: the essence of the method is to create a snapshot of the source cluster and then restore it into a new cluster.
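If you prefer scripting over the console, the same snapshot-and-restore flow can be driven through the AWS SDK. A minimal boto3 sketch (cluster and snapshot identifiers are placeholders):

```python
# Sketch: copy a Redshift cluster by snapshotting it and restoring the
# snapshot into a new cluster. Identifiers below are placeholders.
import boto3

redshift = boto3.client("redshift", region_name="us-east-1")

# 1. Snapshot the source cluster.
redshift.create_cluster_snapshot(
    SnapshotIdentifier="optimization-source-snapshot",
    ClusterIdentifier="my-source-cluster",
)

# Wait until the snapshot is available before restoring from it.
waiter = redshift.get_waiter("snapshot_available")
waiter.wait(SnapshotIdentifier="optimization-source-snapshot")

# 2. Restore the snapshot into a separate target cluster.
redshift.restore_from_cluster_snapshot(
    ClusterIdentifier="my-optimization-target",
    SnapshotIdentifier="optimization-source-snapshot",
)
```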

When running the optimization project, we can choose one of two migration strategies:

  • Migration to a copy
    This ensures that the target cluster includes the original data from the source cluster. Therefore, there is no need to worry about migrating the data into the optimization project.
  • Migration to a clean slate
    Choosing this option may be favorable when you just want to review the new optimization suggestions. Should you decide to accept them and use the newly created cluster, you will need to create an AWS DMS task to migrate the data or use the Data Migration Agents (a scripted sketch of such a task follows this list).
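For the clean-slate path, the DMS task can likewise be created programmatically. A minimal boto3 sketch, assuming the DMS endpoints and replication instance already exist (all ARNs and identifiers are placeholders):

```python
# Sketch: create a full-load DMS task that copies data from the source
# cluster into the newly optimized target cluster.
# All ARNs below are placeholders for resources you created beforehand.
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Copy every table in the 'public' schema; adjust the selection rule
# to match your own schemas.
table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-public",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="redshift-optimization-copy",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:INSTANCE",
    MigrationType="full-load",
    TableMappings=json.dumps(table_mappings),
)
```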

Optimization Rules Selection

AWS Schema Conversion Tool provides the same list of rules for Amazon Redshift optimization as for migration projects from other platforms. Still, there are some rules used exclusively for Redshift.

Let us review them all. Bear in mind that you have to specify the weight each rule carries in determining a Sort Key and a Distribution Key (read more about that in our previous blog post).

  • Choose Distribution Key and Sort Keys using metadata.
    Determines the optimization strategies based only on the metadata, without analyzing the data in the source data warehouse.
  • Choose the fact table and appropriate dimension for collocation.
    Decides whether the table is a fact table or a dimension table and determines the right distribution for them.
  • Analyze Data Distribution.
    A Redshift-specific rule that analyzes how Redshift distributes data across slices and nodes. If SCT detects skewed data (for example, one slice full while another is empty), it suggests a different distribution key. (The sketch after this list shows manual versions of this and the following checks.)
  • Find ‘small’ tables.
    Here you specify a size threshold for small tables. These tables get the ALL distribution style and are copied to every node in the cluster.
  • Analyze the most expensive queries.
    Another Redshift-specific rule that analyzes the history of executed queries. SCT identifies the tables used in queries that require a lot of processing time and degrade overall system performance, and suggests a different Sort Key for these tables.
  • Find the most used tables and columns from QueryLog table.
    Analyzes the history of executed queries, determines the most frequently executed ones, and selects the columns to use as a distribution key in order to increase the system’s performance.
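To get a feel for what the Redshift-specific rules examine, you can run comparable checks yourself against Redshift's system views. A minimal sketch with arbitrary example thresholds (connection parameters are placeholders, as before):

```python
# Sketch: manual versions of three of the checks above, run against
# Redshift system views. The thresholds are arbitrary examples.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="mydb", user="admin", password="...")

CHECKS = {
    # Analyze Data Distribution: skew_rows is the ratio of rows on the
    # fullest slice to rows on the emptiest slice; high values mean skew.
    "skewed tables": """
        SELECT "schema", "table", skew_rows
        FROM svv_table_info
        WHERE skew_rows > 4.0
        ORDER BY skew_rows DESC;
    """,
    # Find 'small' tables: candidates for the ALL distribution style.
    "small tables": """
        SELECT "schema", "table", tbl_rows
        FROM svv_table_info
        WHERE tbl_rows < 1000000
        ORDER BY tbl_rows;
    """,
    # Analyze the most expensive queries: the longest-running statements
    # in the recent query history.
    "expensive queries": """
        SELECT query, DATEDIFF(seconds, starttime, endtime) AS secs,
               TRIM(querytxt) AS sql_text
        FROM stl_query
        ORDER BY secs DESC
        LIMIT 10;
    """,
}

with conn, conn.cursor() as cur:
    for name, sql in CHECKS.items():
        cur.execute(sql)
        print(f"--- {name} ---")
        for row in cur.fetchall():
            print(row)
```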

Observations

The Redshift Optimization feature makes life easier for DBAs managing a Redshift cluster. From now on they can identify and fix performance issues using SCT Redshift Optimization. The alternative requires a great deal of manual work: you have to find the most expensive queries, analyze their query plans and statistics, and then write error-prone drop-and-recreate scripts. Using SCT for automatic Redshift optimization is undoubtedly a great bargain!
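To illustrate that manual alternative: changing a table's keys in Redshift means a deep copy, i.e. recreating the table with the new keys, copying the data, and swapping names. A minimal sketch (table, column, and key names are hypothetical):

```python
# Sketch of the manual 'drop and recreate' approach described above:
# a deep copy that applies a new DISTKEY/SORTKEY.
# Table and column names are hypothetical examples.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="mydb", user="admin", password="...")

DEEP_COPY = """
-- Recreate the table with the new keys.
CREATE TABLE sales_new
DISTKEY (customer_id)
SORTKEY (sale_date)
AS SELECT * FROM sales;

-- Swap names; the old table is dropped only after the copy succeeds.
ALTER TABLE sales RENAME TO sales_old;
ALTER TABLE sales_new RENAME TO sales;
DROP TABLE sales_old;
"""

# psycopg2 wraps the statements in a single transaction and commits on success.
with conn, conn.cursor() as cur:
    cur.execute(DEEP_COPY)
```

The rename-and-drop swap keeps the original data available until the copy has succeeded, but every such script has to be written and tested per table, which is exactly the manual effort SCT automates.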

Creating a copy of the original cluster may be extremely expensive for truly big data warehouses. In this case, if you are just looking for the right optimization strategy, you can use the ‘Migration to a clean slate’ option, which simply provides recommendations on choosing the Distribution and Sort Keys.

Keep in mind that the Redshift Optimization process may take quite a long time. Even the basic operation of collecting statistics on the source data warehouse may take a couple of hours, so make sure you run the optimization tool on a powerful machine.

Please check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Related AWS SCT Posts