Tags: AWS Schema Conversion Tool, Migration to Redshift
When migrating to Amazon Redshift, you need a way to convert functions and other capabilities that Oracle, Teradata, Greenplum, and Netezza support but Redshift does not. The AWS Schema Conversion Tool Extension Pack for Data Warehouses provides Python-based libraries that deliver the same functionality in Redshift. In this post, we show how the AWS Schema Conversion Tool (SCT) generates these extension packs to help in your migration effort.
You have to upload the extension pack for your source database, which includes the functions from the user-defined Python libraries, to Amazon S3. In the following video, we demonstrate the typical workflow of applying the extension pack to an Amazon Redshift data warehouse.
Make sure to check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.
Continue reading to learn more about the possible issues.
Background on the Extension Pack for DW
As you may already know, SCT generates an extension pack to emulate the behavior of certain source database functions in the target DB instance. The key difference of the extension pack for data warehouses lies in the additional Python functions that you may use in the converted code. You upload this Python library to the target data warehouse along with the extension pack schema and call the external Python functions when needed.
But wait: you may remember that Amazon Redshift can load files only from an S3 bucket. Thus, you have to upload the Python library to the S3 bucket first.
You can create custom user-defined functions (UDFs) in the Python programming language; Amazon Redshift runs them on Python 2.7.
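For reference, here is a minimal sketch of such a scalar Python UDF; the function name and padding logic are ours, purely for illustration:

```sql
-- Minimal scalar Python UDF; illustrative name and logic.
CREATE OR REPLACE FUNCTION f_left_pad (val VARCHAR, width INTEGER)
RETURNS VARCHAR
STABLE
AS $$
    # Runs on the Python 2.7 interpreter embedded in Redshift
    return (val or '').rjust(width)
$$ LANGUAGE plpythonu;
```

Once created, you can call the function like any built-in, for example SELECT f_left_pad(city, 20) FROM dim_location (dim_location being any table of yours).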
Use the following script to create a Python library:
```sql
CREATE LIBRARY library_name
LANGUAGE plpythonu
FROM { 'https://file_url' | 's3://bucketname/file_name'
       [ WITH ] CREDENTIALS [AS] 'aws-auth-args'
       [ REGION [AS] 'aws_region' ] }
```
After writing your Python module, pack it into a .zip archive and upload that archive to Amazon Redshift via the S3 bucket. You can then access a function from within your UDF code using the following statement: from <name of Python file inside the zip-archive> import <function_name>.
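Putting the pieces together, here is a hedged sketch. We assume a hypothetical archive my_module.zip in your bucket containing a file my_module.py that defines normalize_phone; the bucket name and IAM role are placeholders:

```sql
-- Register the zipped Python module; bucket, archive, and role are placeholders.
CREATE LIBRARY my_module
LANGUAGE plpythonu
FROM 's3://my-sct-bucket/libs/my_module.zip'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftS3Role';

-- Use the library from a UDF; the import names the Python file inside the archive.
CREATE OR REPLACE FUNCTION f_normalize_phone (phone VARCHAR)
RETURNS VARCHAR
STABLE
AS $$
    from my_module import normalize_phone
    return normalize_phone(phone)
$$ LANGUAGE plpythonu;
```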
Discovering the Extension Pack
For all types of data warehouses, SCT emulates the versions table, which doesn't have a direct analog in Amazon Redshift. SCT also emulates some key functions specific to each source platform. Let's discover what's inside these extension packs.
Oracle Extension Pack
SCT emulates the following database objects in the aws_oracle_ext extension pack schema.
| Object type | Object name |
| --- | --- |
| Function | BIN_TO_NUM, HEXTORAW, INSTR, RAWTOHEX, RAWTONHEX, REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR, ROUND, SOUNDEX, TRUNC |
| Python function | ADD_MONTHS, BIN_TO_NUM, CAST_2C, CAST_2DC, CAST_2F, CAST_2I, CAST_2TS, CAST_2V, CURRENT_DATE, CURRENT_TIMESTAMP, EXTRACT, HEXTORAW, INSTR, LAST_DAY, LPAD, MONTHS_BETWEEN, NANVL, NEXT_DAY, NULLIF, NVL, NVL2, RAWTOHEX, RAWTONHEX, REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR, REMAINDER, ROUND, RPAD, SIGN, SOUNDEX, SUBSTR, TO_CHAR, TO_DATE, TO_NUMBER, TRANSLATE, TRIM, TRUNC |
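In converted code, these objects are referenced by their schema-qualified names. A hedged illustration (the argument list follows the Oracle original; the exact signatures SCT generates may differ):

```sql
-- Illustrative only: converted Oracle code calls the emulated function
-- through the extension pack schema instead of a native Redshift feature.
SELECT aws_oracle_ext.instr('Amazon Redshift', 'Red') AS first_match;
```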
Teradata Extension Pack
SCT emulates the following database objects in the aws_teradata_ext extension pack schema.
| Object type | Object name |
| --- | --- |
| Function | bcl_monthnumber_of_calendar, bcl_monthnumber_of_quarter, bcl_monthnumber_of_year, bcl_quarternumber_of_calendar, bcl_quarternumber_of_year, bcl_weeknumber_of_calendar, bcl_weeknumber_of_year, bcl_yearnumber_of_calendar, cast_c2c, cast_c2f, cast_c2i, cast_c2v, day_of_month, day_of_week, day_of_year, daynumber_of_calendar, daynumber_of_month, daynumber_of_week, daynumber_of_year, dayoccurrence_of_month, instr, month_of_calendar, month_of_quarter, month_of_year, monthnumber_of_calendar, monthnumber_of_quarter, monthnumber_of_year, next, ngram, nvp, overlaps, prior, quarter_of_calendar, quarter_of_year, quarternumber_of_calendar, quarternumber_of_year, round, soundex, td_friday, td_monday, td_month_begin, td_month_end, td_quarter_begin, td_quarter_end, td_saturday, td_sunday, td_thursday, td_tuesday, td_wednesday, td_week_begin, td_week_end, td_year_begin, td_year_end, trunc, week_of_calendar, week_of_month, week_of_year, weekday_of_month, weeknumber_of_calendar, weeknumber_of_month, weeknumber_of_quarter, weeknumber_of_year, year_of_calendar, yearnumber_of_calendar |
| Python function | add_month, cast_c2c, cast_c2d, cast_c2dc, cast_c2ts, cast_c2v, cast_d2c, cast_d2cf, cast_d2nf, cast_d2ni, cast_d2ts, cast_d2v, cast_d2vf, cast_n2c, cast_n2cf, cast_n2d, cast_n2dc, cast_n2f, cast_n2i, cast_n2v, cast_n2vf, cast_ts2c, cast_ts2cf, cast_ts2d, cast_ts2dz, cast_ts2v, cast_ts2vf, current_date, current_timestamp, day_of_month, day_of_week, day_of_year, DayNumber_Of_Calendar, DayNumber_Of_Month, DayNumber_Of_Week, DayNumber_Of_Year, DayOccurrence_Of_Month, extract, instr, last_day, lpad, month_of_calendar, month_of_quarter, month_of_year, MonthNumber_Of_Calendar, MonthNumber_Of_Quarter, MonthNumber_Of_Year, next, next_day, ngram, nullif, nvl, nvl2, oadd_month, otranslate, overlaps, prior, quarter_of_calendar, quarter_of_year, QuarterNumber_Of_Calendar, QuarterNumber_Of_Year, regexp_instr, regexp_replace, regexp_substr, round, rpad, soundex, TD_FRIDAY, TD_MONDAY, TD_MONTH_BEGIN, TD_MONTH_END, TD_QUARTER_BEGIN, TD_QUARTER_END, TD_SATURDAY, TD_SUNDAY, TD_THURSDAY, TD_TUESDAY, TD_WEDNESDAY, TD_WEEK_BEGIN, TD_WEEK_END, TD_YEAR_BEGIN, TD_YEAR_END, to_char, to_date, to_number, trunc, week_of_calendar, week_of_month, week_of_year, weekday_of_month, WeekNumber_Of_Calendar, WeekNumber_Of_Month, WeekNumber_Of_Quarter, WeekNumber_Of_Year, year_of_calendar, YearNumber_Of_Calendar |
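Teradata's calendar helpers are a good example of why this pack exists, since Redshift has no counterparts for them. A hedged sketch (the signature is our assumption based on the Teradata original):

```sql
-- Illustrative only: emulated Teradata calendar helper.
SELECT aws_teradata_ext.td_week_begin(CURRENT_DATE) AS week_start;
```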
Greenplum Extension Pack
SCT emulates the following database objects in the aws_greenplum_ext extension pack schema.
| Object type | Object name |
| --- | --- |
| Function | DATE_PART, DATE_TRUNC, OVERLAY, REGEXP_REPLACE |
| Python function | ASCII, BIT_LENGTH, BTRIM, CBRT, COT, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, DATE_PART, DATE_TRUNC, GET_BYTE, LOCALTIME, LOCALTIMESTAMP, LPAD, LTRIM, MD5, NOW, nullif, OCTET_LENGTH, OVERLAY, POSITION, REGEXP_MATCHES, REGEXP_REPLACE, REGEXP_SPLIT_TO_ARRAY, REGEXP_SPLIT_TO_TABLE, RPAD, RTRIM, SET_BYTE, SET_BYTE_STRING, SPLIT_PART, STATEMENT_TIMESTAMP, STRPOS, SUBSTR, SUBSTRING, TIMEOFDAY, TO_CHAR, TO_DATE, TO_HEX, TO_NUMBER, TO_TIMESTAMP, TRANSLATE, TRIM, TRUNC |
Netezza Extension Pack
SCT emulates the following database objects in the aws_netezza_ext extension pack schema.
| Object type | Object name |
| --- | --- |
| Function | f_int_to_string |
| Python function | add_month, cast_c2bool, cast_c2c, cast_c2d, cast_c2dc, cast_c2t, cast_c2ts, cast_c2v, cast_d2c, cast_d2cf, cast_d2nf, cast_d2ni, cast_d2ts, cast_d2v, cast_d2vf, cast_i2s, cast_n2c, cast_n2cf, cast_n2d, cast_n2dc, cast_n2f, cast_n2i, cast_n2v, cast_n2vf, cast_ts2c, cast_ts2cf, cast_ts2d, cast_ts2dz, cast_ts2v, cast_ts2vf, current_date, current_time, current_timestamp, extract, instr, last_day, lpad, next_day, nvl, nvl2, overlaps, rpad, timestamp, to_char, to_date, to_number, translate, unicodes |
Applying the Extension Pack
Consider a typical migration of a data warehouse and its services to Amazon Redshift. Some of the source functions can be converted directly; for others, SCT uses the extension pack or Python libraries. You need to upload these Python libraries to the S3 bucket as an intermediate step. You may also use the S3 bucket to stage data extracted by the data extractors. SCT helps manage all of these processes, and you can use the DB Best Database Compare Suite to validate the migration.
Now let's walk through the typical workflow of applying the extension pack to Amazon Redshift. Before you start, perform the following steps:
- Create an S3 bucket in the AWS console.
- Drop the previous version of the extension pack schema if it exists (a sketch of this step follows the list).
- Create an AWS profile in the Global Settings or install the AWS CLI.
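For the second step, here is a minimal sketch; the schema name depends on your source platform, and aws_oracle_ext is just an example:

```sql
-- Remove a previously applied extension pack schema and all objects in it.
DROP SCHEMA IF EXISTS aws_oracle_ext CASCADE;
```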
Now you can start applying the extension pack. To do so, right-click the target schema in the metadata tree and select the 'Apply Extension Pack' option from the pop-up menu.
First, specify the AWS Services settings in the extension pack wizard. Then provide the path to an existing S3 bucket; SCT uploads the Python library there. Finally, press the 'Create Extension Pack' button on the final tab of the wizard.
Observations
We have used the Amazon Redshift extension pack in a number of migration projects, and below we describe the key findings we discovered along the way.
- Before applying the extension pack to Amazon Redshift, drop the corresponding schema from the target data warehouse if it exists. SCT does not overwrite an existing schema when you apply the extension pack, so you would otherwise miss the updated objects.
- When creating a Python UDF library, keep in mind that the total size of user-installed libraries cannot exceed 100 MB. So, control the size of your archives and remove the libraries that you don't use, as shown after this list.
- If you have several AWS profiles saved in SCT, you can select the one to use for the current project in the Project Settings. If you don't have any AWS profiles set up in the application yet, go to the global application settings.
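To stay under that 100 MB limit, you can list the installed libraries and drop the ones you no longer need; PG_LIBRARY is the Redshift catalog of user-installed libraries, and my_module is an illustrative name:

```sql
-- List user-installed Python libraries, then drop an unused one.
SELECT name, owner FROM pg_library;
DROP LIBRARY my_module;
```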
Please check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.
Other resources
Be sure to check out the Aginity Workbench for Amazon Redshift. We found this tool beats SQL Workbench/J hands down for performing everyday tasks with Amazon Redshift. You can download and use Aginity Workbench for free at http://www.aginity.com/redshift/.
Related AWS SCT Posts
- Migrating an Oracle to MySQL Database using the AWS Schema Conversion Tool
- AWS Schema Conversion Tool: Offline mode
- AWS Schema Conversion Tool: Connecting to Oracle Source Database
- Selecting target database type for migration using Assessment Report in AWS Schema Conversion Tool
- Troubleshooting database migration issues using AWS Schema Conversion Tool Assessment Report
- Converting Objects during Database Migration using AWS Schema Conversion Tool
- Applying Converted Code to Target Database
- General Approach to Migration of Data Warehouses to Amazon Redshift
- AWS Schema Conversion Tool: Specific Features of Migrating Data Warehouses to Redshift
- First look at AWS Schema Conversion Tool with tighter DMS integration
- Using AWS Schema Conversion Tool for Application Conversion
- Optimizing your Redshift Storage with AWS Schema Conversion Tool
- Extracting Data from Warehouses with Data Migration Agents in AWS Schema Conversion Tool
- Using AWS Schema Conversion Tool Extension Pack for OLTP databases