Hi @here,
Per this announcement (https://tvlk.slack.com/files/U02UVN4NJ/FBR9B1BRC/gcp_dataflow_announcement.sh) made by GCP, the Dataflow job that your team uses (repo link here) for storing/summarizing Tracking Data into your own DynamoDB is running on Dataflow SDK 1.9.0, which won't be supported anymore after 15 August 2018. The long-term supported option is the latest version: the Apache Beam SDK.
Since the deadline is in 1 month, I took the liberty of upgrading this monolithic codebase. Specifically, in commit `cff97f1c911be99d9ee2f26c46454c274f5e57f8` on branch `feature/upgrade-to-beam`, I've fixed all regressions/problems that occurred when comparing the latest pipeline test output against the previous pipeline test output (the golden files).
For `traveloka-product`, I've also changed the POJOs (data models) to implement the `equals` and `hashCode` methods. This removes a warning message from Apache Beam about unstable ser-de on that platform. However, for the `marketing` module, please do it yourself (since it has quite a lot of POJOs).
Please also update your runner scripts (see `~/tvlkrtpipe/bin/payment/` for the payment team's) to be compatible with the Apache Beam version, and try running them in staging until they work.
Checkout `feature/upgrade-to-beam`, then please change the scripts in your `~/tvlkrtpipe/bin/<your-team>/` folder.
For example, you can take a look at these:
`~/tvlkrtpipe/bin/payment/run-usercontextsummary-email-stg.sh`
`~/tvlkrtpipe/bin/payment/run-usercontextsummary-email-prod.sh`
Some important flags are: `--jobName` (use a value different from the previous one, since you want to spawn a different job under a different name), `--stagingLocation`, and `--tempLocation`.
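To illustrate how the `--flag=value` pairs in the runner scripts become job settings, here is a rough sketch with a simplified, hypothetical parser and made-up flag values (in the real pipeline, Beam's `PipelineOptionsFactory` does this parsing):

```java
import java.util.HashMap;
import java.util.Map;

public class FlagSketch {
    // Simplified stand-in for Beam's PipelineOptionsFactory: collects
    // "--name=value" arguments into a map of option name -> value.
    static Map<String, String> parseFlags(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("--") && arg.contains("=")) {
                int eq = arg.indexOf('=');
                flags.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        String[] example = {
            // Must differ from the old Dataflow SDK job's name:
            "--jobName=usercontextsummary-email-stg-beam",
            // Hypothetical GCS bucket paths for illustration only:
            "--stagingLocation=gs://example-bucket/staging",
            "--tempLocation=gs://example-bucket/temp"
        };
        Map<String, String> flags = parseFlags(example);
        System.out.println(flags.get("jobName"));
    }
}
```

The key point is only the first flag: if `--jobName` stays the same as the old job's, you'd collide with (instead of run alongside) the existing pipeline.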
Create a phabricator diff between `feature/upgrade-to-beam` and your own branch. Add me (agp12) or zaka (zakazai) as the reviewer.
Implement the `equals` and `hashCode` methods on your POJOs (data models, usually in your `data` package). Hint: use the IntelliJ IDE to generate them, via alt + insert -> Generate equals() and hashCode().
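The generated code boils down to this pattern; `TrackingEvent` here is a hypothetical POJO (your real data models live in your team's `data` package), but IntelliJ produces essentially the same structure:

```java
import java.util.Objects;

// Hypothetical example POJO; only equals/hashCode matter for the Beam warning.
public class TrackingEvent {
    private final String userId;
    private final long timestamp;

    public TrackingEvent(String userId, long timestamp) {
        this.userId = userId;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        TrackingEvent that = (TrackingEvent) o;
        return timestamp == that.timestamp
                && Objects.equals(userId, that.userId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId, timestamp);
    }
}
```

Without a well-defined `equals`, Beam can't verify that serializing and deserializing an element preserves equality, which is what triggers the warning mentioned above.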
On the `data-<team>-dataflowdriver-01.dev.data.tvlk.cloud` instance, go to the `tvlkrtpipe` folder, checkout your own branch, then execute the staging runner script you've changed and follow the link to check whether the running Dataflow job has any errors.
On the `data-<team>-dataflowdriver-01.prod.data.tvlk.cloud` instance, go to the `tvlkrtpipe` folder, checkout your own branch, then execute the production runner script you've changed and follow the link to check whether the running Dataflow job has any errors.