Postgres/Mongo Glue Ingestion v2

Hi all,

As you may know, we use AWS Glue to ingest data from our databases (Postgres and Mongo) in the AWS multi-account setup. As you may also know, this may not be the best solution in terms of cost, mainly because of the pricing of Glue v1.

So, for the last 1-2 months, we have been experimenting with AWS Glue v2, which brings a lot of improvements, especially from a cost point of view, including faster job start times and a 1-minute minimum billing duration (ref: https://aws.amazon.com/blogs/aws/aws-glue-version-2-0-featuring-10x-faster-job-start-times-and-1-minute-minimum-billing-duration/). During this assessment period, some of the newly created ingestion pipelines have been running on Glue v2. We have closely observed their performance and, so far, it has been working well.

Based on these results, we are ready to take the next step and recommend that all existing pipelines still using Glue v1 migrate to Glue v2. In most cases, you will only need to follow these simple steps:

The following example shows the cost optimisation we get by upgrading a single job to v2:

Specification:
Worker type: Standard
Data Processing Units (DPUs): 4 DPUs per job
Cost per DPU per hour: $0.44
Ingestion schedule: hourly

# Glue v1
Start-up time: 10 mins
Processing run time: 5 mins
Total run time: 15 mins
Total cost per job: (15/60) hours * $0.44 * 4 DPUs = $0.44 per job
Total cost per day: $0.44 * 24 runs = $10.56
Total cost per month: $10.56 * 30 days = $316.80

# Glue v2
Start-up time: 1 min
Processing run time: 5 mins
Total run time: 6 mins
Total cost per job: (6/60) hours * $0.44 * 4 DPUs = $0.176 per job
Total cost per day: $0.176 * 24 runs = $4.224
Total cost per month: $4.224 * 30 days = $126.72

# Conclusions
Total savings per job: $316.80 - $126.72 = $190.08 per month (a 60% reduction)
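To estimate the savings for your own pipeline, the arithmetic above can be reproduced with a small script. This is a minimal sketch that assumes the same simplified model as the example: each run is billed for its full start-up plus processing time, the schedule is hourly, and a month has 30 days. The constant and function names are illustrative only; swap in your job's DPU count and timings.

```python
# Sketch: reproduce the per-job Glue cost estimate above.
# Assumed model (same as the example): cost per run =
#   (start-up + processing minutes) / 60 * price per DPU-hour * DPUs

DPU_HOUR_PRICE = 0.44  # USD per DPU per hour (Standard worker type)
DPUS = 4               # DPUs per job
RUNS_PER_DAY = 24      # hourly ingestion schedule
DAYS_PER_MONTH = 30    # simplified month length used in the example


def monthly_cost(startup_min: float, processing_min: float) -> float:
    """Estimated monthly cost (USD) of one hourly Glue job."""
    run_hours = (startup_min + processing_min) / 60
    cost_per_run = run_hours * DPU_HOUR_PRICE * DPUS
    return cost_per_run * RUNS_PER_DAY * DAYS_PER_MONTH


v1 = monthly_cost(startup_min=10, processing_min=5)
v2 = monthly_cost(startup_min=1, processing_min=5)
savings = v1 - v2

print(f"Glue v1: ${v1:.2f}/month")  # → Glue v1: $316.80/month
print(f"Glue v2: ${v2:.2f}/month")  # → Glue v2: $126.72/month
print(f"Savings: ${savings:.2f}/month ({savings / v1:.0%})")  # → Savings: $190.08/month (60%)
```

Because the processing time is the same in both cases, the saving comes entirely from the shorter start-up time, so pipelines with short processing runs benefit the most in relative terms.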

For an end-to-end guide on implementing Glue ingestion, please refer to this doc: https://29022131.atlassian.net/wiki/spaces/CDE/pages/1867027834/APP-056+APP-057+Glue+Ingestion+Quickstart.

Some of the product domains that are still using v1 (not exhaustive):

If you have any questions, please do not hesitate to reach out to @data-cde in #data-ing-channel.

Best regards,
Core Data Engineering