Creating AWS Data Pipelines with Boto3 and JSON

I have been doing a little work with AWS Data Pipeline recently for ETL tasks at work. AWS Data Pipeline handles data-driven workflows called pipelines, taking care of scheduling, data dependencies, data sources and destinations in a nicely managed workflow. My tasks take batch datasets from SQL databases, process and load them into S3 buckets, then import them into a Redshift reporting database.

Since production database structures are frequently updated, those changes need to be reflected in the reporting backend service. For a couple of years I struggled on with Amazon's web-based Data Pipeline architect to manage those changes. This has been an onerous task, as the architect does not lend itself well to managing a large set of pipelines. Here begins a little tale of delving into the AWS Data Pipeline API to find another way.
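As a sketch of where this leads: the `datapipeline` client in Boto3 exposes `create_pipeline`, `put_pipeline_definition` and `activate_pipeline`, but `put_pipeline_definition` wants its objects as lists of `{"key": ..., "stringValue"/"refValue": ...}` fields rather than the flat JSON the architect exports. The helper below converts between the two; the function name `to_api_objects` and the sample definition are my own illustrative choices, not anything from the AWS docs.

```python
import json

# Hypothetical sample: a pipeline definition in the flat JSON style the
# architect exports ("objects" with plain key/value attributes).
SAMPLE_DEFINITION = json.loads("""
{
  "objects": [
    {
      "id": "Default",
      "name": "Default",
      "scheduleType": "ondemand",
      "role": "DataPipelineDefaultRole"
    },
    {
      "id": "CopyToS3",
      "name": "CopyToS3",
      "type": "CopyActivity",
      "runsOn": {"ref": "Ec2Instance"}
    }
  ]
}
""")


def to_api_objects(definition):
    """Convert flat architect-style JSON into the pipelineObjects
    structure expected by boto3's put_pipeline_definition call."""
    api_objects = []
    for obj in definition["objects"]:
        fields = []
        for key, value in obj.items():
            if key in ("id", "name"):
                continue  # id and name are top-level in the API format
            if isinstance(value, dict) and "ref" in value:
                # References to other pipeline objects use refValue
                fields.append({"key": key, "refValue": value["ref"]})
            else:
                fields.append({"key": key, "stringValue": str(value)})
        api_objects.append(
            {"id": obj["id"], "name": obj["name"], "fields": fields}
        )
    return api_objects


# With the objects converted, the Boto3 calls are straightforward
# (needs AWS credentials, so shown here as comments only):
#   client = boto3.client("datapipeline")
#   pipeline = client.create_pipeline(name="etl-pipeline", uniqueId="etl-1")
#   client.put_pipeline_definition(
#       pipelineId=pipeline["pipelineId"],
#       pipelineObjects=to_api_objects(SAMPLE_DEFINITION),
#   )
#   client.activate_pipeline(pipelineId=pipeline["pipelineId"])

print(json.dumps(to_api_objects(SAMPLE_DEFINITION), indent=2))
```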

