Hi @here,
Per this announcement (https://tvlk.slack.com/files/U02UVN4NJ/FBR9B1BRC/gcp_dataflow_announcement.sh) made by GCP, the Dataflow job that your team uses (repo link here) for storing/summarizing Tracking Data into your own DynamoDB is running on Dataflow SDK 1.9.0, which won't be supported anymore after 15 August 2018. The long-term supported option is the latest version: the Apache Beam SDK.
Since the deadline is in 1 month, I took the liberty of upgrading this monolithic codebase. Specifically, in commit `cff97f1c911be99d9ee2f26c46454c274f5e57f8` on branch `feature/upgrade-to-beam`, I've fixed all regressions/problems that occurred when comparing the latest pipeline test output against the previous pipeline test output (the golden files).
For `traveloka-product`, I've also changed the POJOs (data models) to implement the `equals` and `hashCode` methods. This removes a warning message from Apache Beam about unstable ser-de on that platform. However, for the `marketing` module, please do it yourself (since it has quite a lot of POJOs).
Please also update your runner scripts (see `~/tvlkrtpipe/bin/payment/` for the payment team's) to be compatible with the Apache Beam version, and try running them in staging until they work.
Checkout `feature/upgrade-to-beam`, then please change the scripts in your `~/tvlkrtpipe/bin/<your-team>/` folder.
For example, you can take a look at these:
`~/tvlkrtpipe/bin/payment/run-usercontextsummary-email-stg.sh`
`~/tvlkrtpipe/bin/payment/run-usercontextsummary-email-prod.sh`
Some important flags are: `--jobName` (use a value different from the previous one, since you want to spawn a different job under a different name), `--stagingLocation`, and `--tempLocation`.
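To illustrate how the `--flag=value` pairs in the runner scripts become job settings, here is a rough sketch with a simplified, hypothetical parser and made-up flag values (in the real pipeline, Beam's `PipelineOptionsFactory` does this parsing):

```java
import java.util.HashMap;
import java.util.Map;

public class FlagSketch {
    // Simplified stand-in for Beam's PipelineOptionsFactory: collects
    // "--name=value" arguments into a map of option name -> value.
    static Map<String, String> parseFlags(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("--") && arg.contains("=")) {
                int eq = arg.indexOf('=');
                flags.put(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return flags;
    }

    public static void main(String[] args) {
        String[] example = {
            // Must differ from the old Dataflow SDK job's name:
            "--jobName=usercontextsummary-email-stg-beam",
            // Hypothetical GCS bucket paths for illustration only:
            "--stagingLocation=gs://example-bucket/staging",
            "--tempLocation=gs://example-bucket/temp"
        };
        Map<String, String> flags = parseFlags(example);
        System.out.println(flags.get("jobName"));
    }
}
```

The key point is only the first flag: if `--jobName` stays the same as the old job's, you'd collide with (instead of run alongside) the existing pipeline.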
Create a phabricator diff between `feature/upgrade-to-beam` and your own branch. Add me (agp12) or zaka (zakazai) as the reviewer.
Implement the `equals` and `hashCode` methods on your POJOs (data models, usually in your `data` package). Hint: use the IntelliJ IDE to generate them, via alt + insert -> Generate equals() and hashCode().
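The generated code boils down to this pattern; `TrackingEvent` here is a hypothetical POJO (your real data models live in your team's `data` package), but IntelliJ produces essentially the same structure:

```java
import java.util.Objects;

// Hypothetical example POJO; only equals/hashCode matter for the Beam warning.
public class TrackingEvent {
    private final String userId;
    private final long timestamp;

    public TrackingEvent(String userId, long timestamp) {
        this.userId = userId;
        this.timestamp = timestamp;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        TrackingEvent that = (TrackingEvent) o;
        return timestamp == that.timestamp
                && Objects.equals(userId, that.userId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(userId, timestamp);
    }
}
```

Without a well-defined `equals`, Beam can't verify that serializing and deserializing an element preserves equality, which is what triggers the warning mentioned above.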
On the `data-<team>-dataflowdriver-01.dev.data.tvlk.cloud` instance, go to the `tvlkrtpipe` folder, checkout your own branch, then execute the staging runner script you've changed and follow the link to check whether the running Dataflow job has any errors.
On the `data-<team>-dataflowdriver-01.prod.data.tvlk.cloud` instance, go to the `tvlkrtpipe` folder, checkout your own branch, then execute the production runner script you've changed and follow the link to check whether the running Dataflow job has any errors.