Amazon Redshift & Transformations
Challenge: Extract, Load, and Transform Data -- Amazon Redshift
Organizations adopting Amazon Redshift commonly face the challenge of extracting data from a variety of sources, then loading and transforming it for analytic consumption.
These challenges compound when combined with requirements for near real-time data availability, narrow data-acquisition windows, and multi-dimensional analytic views.
Question of Design: Performance & Resources
Legacy ETL design has long combined:
- significant development resources
- an intermediary staging server to support transformation processes
Given this design, organizations must run a heavy series of processes to transform data successfully:
- a micro-batch is triggered, with a control file that sets the batch frequency
- data is extracted and staged
- data travels through multiple transformation cycles
- transformed data is loaded into the target data warehouse
This approach is often described as resource intensive in terms of costly development effort, ongoing maintenance, and computing hardware. Further, the cycles of transforming staged data can delay otherwise real-time data loading and limit information availability to business users.
Solution: Extract, Load, then Transform (ELT)
Feed with Speed. Transform with the Computing Power of Amazon Redshift.
Organizations that need to load and transform data for ingestion into Amazon Redshift may consider this popular design:
- Attunity CloudBeam's Replicate-for-Redshift efficiently extracts data (full load and/or incremental changes) from a variety of heterogeneous sources and performs light transformations (concatenations, calculations, etc.)
- Attunity CloudBeam then continuously loads the data into a staging area (schema or database) within Amazon Redshift
- Once the data resides in the Amazon Redshift staging area, further transformations can be performed with home-grown scripts or a third-party solution
- Following transformation, a basic script writes the data to the production database within Amazon Redshift
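The staging-to-production portion of this design can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a local stand-in for Amazon Redshift so that it runs anywhere, and the table names (staging_orders, prod_customer_totals), the columns, and the run_elt function are hypothetical. It does not show Attunity CloudBeam itself; the raw rows passed in simply play the role of data that has already been replicated into the staging area.

```python
import sqlite3


def run_elt(conn, raw_rows):
    """Load raw rows into staging, then transform them with SQL inside the database."""
    cur = conn.cursor()

    # Hypothetical staging and production tables (names are assumptions).
    cur.execute(
        "CREATE TABLE IF NOT EXISTS staging_orders "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    cur.execute(
        "CREATE TABLE IF NOT EXISTS prod_customer_totals "
        "(customer TEXT PRIMARY KEY, total_amount REAL, order_count INTEGER)"
    )

    # Load: rows land in the staging table without heavy transformation.
    cur.executemany("INSERT INTO staging_orders VALUES (?, ?, ?)", raw_rows)

    # Transform and write to production: the aggregation runs in the
    # database engine, not on an intermediary staging server.
    cur.execute(
        "INSERT INTO prod_customer_totals (customer, total_amount, order_count) "
        "SELECT customer, SUM(amount), COUNT(*) "
        "FROM staging_orders GROUP BY customer"
    )
    conn.commit()

    return cur.execute(
        "SELECT customer, total_amount, order_count "
        "FROM prod_customer_totals ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    rows = [(1, "acme", 10.0), (2, "acme", 5.5), (3, "globex", 7.0)]
    for record in run_elt(connection, rows):
        print(record)
```

Against a real Redshift cluster the same pattern would apply with a Redshift-compatible database driver and cluster-specific schemas; the point of the sketch is only that the transformation is expressed as SQL executed where the data already resides.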
Benefits of Design:
- Accelerate information availability
- Minimize use of development and computing resources
- Transform only relevant sets of data (not entire set)