Blog: Using AWS Schema Conversion Tool Extension Pack for Data Warehouses

When migrating to Amazon Redshift, you need a way to convert functions and other capabilities that Oracle, Teradata, Greenplum, and Netezza support but that Redshift does not natively provide. The AWS Schema Conversion Tool Extension Pack for Data Warehouses supplies Python-based libraries that deliver the same functionality in Redshift. In this post, we will show how the AWS Schema Conversion Tool (SCT) generates these extension packs to help in your migration effort.

You will have to upload the extension pack for your source database, which includes the functions from the user-defined Python libraries, to Amazon S3. In the following video, we demonstrate the typical workflow of applying the extension pack to an Amazon Redshift data warehouse.

Make sure to check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Continue reading to learn more about the possible issues.

Background on the Extension Pack for DW

As you may already know, SCT generates the extension pack to emulate the behavior of certain source database functions in the target DB instance. The key difference of the extension pack for data warehouses lies in the additional Python functions that you can use in the converted code. You upload this Python library to the target data warehouse along with the extension pack schema and call the external Python functions when needed.

But remember that you can load library files into Amazon Redshift only from an Amazon S3 bucket or an HTTPS URL. Thus, you have to upload the Python library to an S3 bucket first.

You can create custom user-defined functions (UDFs) based on the Python programming language. Amazon Redshift runs these Python UDFs on Python 2.7.
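For example, a minimal scalar Python UDF looks like the following; the function name and logic here are purely illustrative:

CREATE OR REPLACE FUNCTION f_initial_cap(s VARCHAR)
RETURNS VARCHAR
STABLE
AS $$
    # return NULL for NULL input, otherwise capitalize each word
    if s is None:
        return None
    return s.title()
$$ LANGUAGE plpythonu;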

Use the following script to create a Python library:

CREATE LIBRARY library_name LANGUAGE plpythonu
FROM
{ 'https://file_url'
| 's3://bucketname/file_name'
[ WITH ] CREDENTIALS [AS] 'aws-auth-args'
[ REGION [AS] 'aws_region']
}
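For example, a concrete registration statement might look like the following; the library name, bucket, archive, IAM role, and region are placeholders for your own values:

CREATE LIBRARY my_udf_lib LANGUAGE plpythonu
FROM 's3://my-sct-bucket/my_udf_lib.zip'
CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/MyRedshiftS3Role'
REGION 'us-east-1';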

After writing the Python code for your library, add it to a .zip archive. Then upload this archive to Amazon Redshift via the S3 bucket using the CREATE LIBRARY command shown above. You can then access a function from the library with the following import: from <name of Python file inside the zip-archive> import <function_name>.
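Once the library is registered, you can wrap one of its functions in a Python UDF. Here is a minimal sketch, assuming the archive contains a module named my_udf_lib.py that exposes a last_day function (both names are illustrative):

CREATE OR REPLACE FUNCTION f_last_day(d DATE)
RETURNS DATE
STABLE
AS $$
    # import the function from the module inside the uploaded zip archive
    from my_udf_lib import last_day
    return last_day(d)
$$ LANGUAGE plpythonu;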

Discovering the Extension Pack

For all types of source data warehouses, SCT emulates the versions table, which doesn't have a direct analog in Amazon Redshift. SCT also emulates some key functions specific to each source platform. Let's take a look at what's inside these extension packs.
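For example, once the extension pack has been applied, you can inspect the emulated versions table with a simple query; the schema name below assumes the Oracle pack described in the next section:

-- query the emulated versions table in the extension pack schema
SELECT * FROM aws_oracle_ext.versions;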

Oracle Extension Pack

SCT emulates the following database objects in the aws_oracle_ext extension pack schema.

Function: BIN_TO_NUM, HEXTORAW, INSTR, RAWTOHEX, RAWTONHEX, REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR, ROUND, SOUNDEX, TRUNC

Python function: ADD_MONTHS, BIN_TO_NUM, CAST_2C, CAST_2DC, CAST_2F, CAST_2I, CAST_2TS, CAST_2V, CURRENT_DATE, CURRENT_TIMESTAMP, EXTRACT, HEXTORAW, INSTR, LAST_DAY, LPAD, MONTHS_BETWEEN, NANVL, NEXT_DAY, NULLIF, NVL, NVL2, RAWTOHEX, RAWTONHEX, REGEXP_COUNT, REGEXP_INSTR, REGEXP_REPLACE, REGEXP_SUBSTR, REMAINDER, ROUND, RPAD, SIGN, SOUNDEX, SUBSTR, TO_CHAR, TO_DATE, TO_NUMBER, TRANSLATE, TRIM, TRUNC
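In converted code, these emulated objects are referenced with the extension pack schema prefix. Here is a minimal sketch, assuming the emulated ADD_MONTHS keeps Oracle's (date, integer) signature and that an orders table exists in your target schema:

-- call the emulated Oracle ADD_MONTHS function from the extension pack schema
SELECT aws_oracle_ext.add_months(order_date, 3) AS expected_delivery
FROM orders;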

Teradata Extension Pack

SCT emulates the following database objects in the aws_teradata_ext extension pack schema.

Function: bcl_monthnumber_of_calendar, bcl_monthnumber_of_quarter, bcl_monthnumber_of_year, bcl_quarternumber_of_calendar, bcl_quarternumber_of_year, bcl_weeknumber_of_calendar, bcl_weeknumber_of_year, bcl_yearnumber_of_calendar, cast_c2c, cast_c2f, cast_c2i, cast_c2v, day_of_month, day_of_week, day_of_year, daynumber_of_calendar, daynumber_of_month, daynumber_of_week, daynumber_of_year, dayoccurrence_of_month, instr, month_of_calendar, month_of_quarter, month_of_year, monthnumber_of_calendar, monthnumber_of_quarter, monthnumber_of_year, next, ngram, nvp, overlaps, prior, quarter_of_calendar, quarter_of_year, quarternumber_of_calendar, quarternumber_of_year, round, soundex, td_friday, td_monday, td_month_begin, td_month_end, td_quarter_begin, td_quarter_end, td_saturday, td_sunday, td_thursday, td_tuesday, td_wednesday, td_week_begin, td_week_end, td_year_begin, td_year_end, trunc, week_of_calendar, week_of_month, week_of_year, weekday_of_month, weeknumber_of_calendar, weeknumber_of_month, weeknumber_of_quarter, weeknumber_of_year, year_of_calendar, yearnumber_of_calendar

Python function: add_month, cast_c2c, cast_c2d, cast_c2dc, cast_c2ts, cast_c2v, cast_d2c, cast_d2cf, cast_d2nf, cast_d2ni, cast_d2ts, cast_d2v, cast_d2vf, cast_n2c, cast_n2cf, cast_n2d, cast_n2dc, cast_n2f, cast_n2i, cast_n2v, cast_n2vf, cast_ts2c, cast_ts2cf, cast_ts2d, cast_ts2dz, cast_ts2v, cast_ts2vf, current_date, current_timestamp, day_of_month, day_of_week, day_of_year, DayNumber_Of_Calendar, DayNumber_Of_Month, DayNumber_Of_Week, DayNumber_Of_Year, DayOccurrence_Of_Month, extract, instr, last_day, lpad, month_of_calendar, month_of_quarter, month_of_year, MonthNumber_Of_Calendar, MonthNumber_Of_Quarter, MonthNumber_Of_Year, next, next_day, ngram, nullif, nvl, nvl2, oadd_month, otranslate, overlaps, prior, quarter_of_calendar, quarter_of_year, QuarterNumber_Of_Calendar, QuarterNumber_Of_Year, regexp_instr, regexp_replace, regexp_substr, round, rpad, soundex, TD_FRIDAY, TD_MONDAY, TD_MONTH_BEGIN, TD_MONTH_END, TD_QUARTER_BEGIN, TD_QUARTER_END, TD_SATURDAY, TD_SUNDAY, TD_THURSDAY, TD_TUESDAY, TD_WEDNESDAY, TD_WEEK_BEGIN, TD_WEEK_END, TD_YEAR_BEGIN, TD_YEAR_END, to_char, to_date, to_number, trunc, week_of_calendar, week_of_month, week_of_year, weekday_of_month, WeekNumber_Of_Calendar, WeekNumber_Of_Month, WeekNumber_Of_Quarter, WeekNumber_Of_Year, year_of_calendar, YearNumber_Of_Calendar

Greenplum Extension Pack

SCT emulates the following database objects in the aws_greenplum_ext extension pack schema.

Function: DATE_PART, DATE_TRUNC, OVERLAY, REGEXP_REPLACE

Python function: ASCII, BIT_LENGTH, BTRIM, CBRT, COT, CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP, DATE_PART, DATE_TRUNC, GET_BYTE, LOCALTIME, LOCALTIMESTAMP, LPAD, LTRIM, MD5, NOW, nullif, OCTET_LENGTH, OVERLAY, POSITION, REGEXP_MATCHES, REGEXP_REPLACE, REGEXP_SPLIT_TO_ARRAY, REGEXP_SPLIT_TO_TABLE, RPAD, RTRIM, SET_BYTE, SET_BYTE_STRING, SPLIT_PART, STATEMENT_TIMESTAMP, STRPOS, SUBSTR, SUBSTRING, TIMEOFDAY, TO_CHAR, TO_DATE, TO_HEX, TO_NUMBER, TO_TIMESTAMP, TRANSLATE, TRIM, TRUNC

Netezza Extension Pack

SCT emulates the following database objects in the aws_netezza_ext extension pack schema.

Function: f_int_to_string

Python function: add_month, cast_c2bool, cast_c2c, cast_c2d, cast_c2dc, cast_c2t, cast_c2ts, cast_c2v, cast_d2c, cast_d2cf, cast_d2nf, cast_d2ni, cast_d2ts, cast_d2v, cast_d2vf, cast_i2s, cast_n2c, cast_n2cf, cast_n2d, cast_n2dc, cast_n2f, cast_n2i, cast_n2v, cast_n2vf, cast_ts2c, cast_ts2cf, cast_ts2d, cast_ts2dz, cast_ts2v, cast_ts2vf, current_date, current_time, current_timestamp, extract, instr, last_day, lpad, next_day, nvl, nvl2, overlaps, rpad, timestamp, to_char, to_date, to_number, translate, unicodes

Applying the Extension Pack

Consider a typical migration of a data warehouse and its services to Amazon Redshift. Some of the source functions can be converted directly, while for others SCT relies on the extension pack or Python libraries. You need to upload these Python libraries to an S3 bucket as an intermediate step. You may also use the S3 bucket to upload data extracted by the data extractors. SCT helps manage all of these processes, and you can use the DB Best Database Compare Suite to validate the migration.


So, let’s discover the typical workflow on applying the extension pack under Amazon Redshift. Before you start, you should perform the following steps:

  • create an S3 bucket in the AWS console,
  • drop the previous version of the extension pack schema if it exists, and
  • create an AWS profile in the Global Settings or install the AWS CLI.

Now you can start applying the extension pack. To do so, right-click in the target metadata tree and select the ‘Apply Extension Pack’ option from the pop-up menu.

[Image: AWS SCT extension pack for data warehouses]

First, specify the AWS services settings in the extension pack wizard. Then provide the path to the existing S3 bucket; SCT will upload the Python library there. Finally, press the ‘Create Extension Pack’ button on the final tab of the wizard.

Observations

We have used the Amazon Redshift extension pack in a number of migration projects. Below are the key findings we discovered along the way.

  • Before applying the extension pack to Amazon Redshift, delete the corresponding schema from the target data warehouse if it already exists (see the example after this list). If you don’t drop it, SCT will not overwrite it when you apply the extension pack, and you will end up with an outdated version.
  • When creating a Python UDF library, keep in mind that the total size of user-installed libraries cannot exceed 100 MB. You therefore have to control the size of your archives and remove the libraries that you don’t use.
  • If you have several AWS profiles saved in SCT, you can select the one to use for the current project in the Project Settings. If you don’t have any AWS profiles configured yet, set one up in the global application settings.
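For the first point, dropping the previous version of the pack before re-applying it can be done with a statement like the following; it is shown here for the Oracle pack, so substitute the schema name for your source platform:

-- remove the old extension pack schema and everything it contains
DROP SCHEMA IF EXISTS aws_oracle_ext CASCADE;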

Please check out our new AWS Schema Conversion Tool Jumpstart offer to get you up and running fast for your migration to Amazon Redshift.

Other resources

Be sure to check out Aginity Workbench for Amazon Redshift. We found that this tool beats SQL Workbench/J hands down for performing everyday tasks with Amazon Redshift. You can download and use Aginity Workbench for free at http://www.aginity.com/redshift/.

Related AWS SCT Posts