Hi all, the Ingestion team is planning to apply a retention policy to the nrtprod dataset in the tvlk-realtime project on GCP.
We have been keeping data in BQ since 2012, and most of us do not need it. In line with this quarter's cost-optimization mission, it will be more cost-efficient to archive data older than 2 years to cheaper storage, in our case GCS. Based on last month's bill, the BQ storage cost was around $10k; with a naive calculation, we could potentially save 30-40%, or $3-4K a month.
Data older than 2 years will be archived to GCS and removed from BQ. The archived data will live at gs://tvlk-data-datalake-prod/traveloka/data/v1/tracking/parquet/<table_name>.
If there are no further concerns, we will apply this on Wednesday the 24th, 2020.
Please join the #data-decommissions channel for support, or contact any Ingestion team member directly.
What should we do if we have a lot of tables?
If you can help us with the list of tables, we will help you automate the process instead of clicking through the UI one by one.
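For context, that automation could look roughly like the bash sketch below. It assumes a hypothetical tables.txt file (one table name per line, not something we have prepared yet) and uses `bq extract` with Parquet output against the GCS path convention above; it only prints the commands so they can be reviewed before anything actually runs.

```shell
#!/usr/bin/env bash
# Minimal sketch: print a `bq extract` command for each table in tables.txt,
# targeting the GCS path convention from this announcement.
# tables.txt is a hypothetical input file (one table name per line).
set -euo pipefail

build_extract_cmd() {
  local table="$1"
  echo "bq extract --destination_format=PARQUET" \
       "tvlk-realtime:nrtprod.${table}" \
       "gs://tvlk-data-datalake-prod/traveloka/data/v1/tracking/parquet/${table}/*.parquet"
}

if [ -f tables.txt ]; then
  while read -r table; do
    build_extract_cmd "$table"   # dry run: prints the command instead of running it
  done < tables.txt
fi
```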
What if I still need data older than 2 years occasionally?
Data older than the retention cutoff is still in GCS. Your ETL can use GCS as the data source, or, if really needed, we can temporarily restore the data to BQ.
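If you need to query archived data from BQ again, one option (a sketch, not a committed runbook) is to register the Parquet files as a BigQuery external table via `bq mkdef` and `bq mk`, so no data is reloaded. The dataset and table names below are placeholders; the GCS path follows the convention above. Again, this only prints the commands for review.

```shell
#!/usr/bin/env bash
# Sketch: print the commands to expose an archived table back to BQ as an
# external table over its Parquet files in GCS, without reloading the data.
# Dataset and table names are placeholders.
set -euo pipefail

restore_as_external() {
  local table="$1"
  # mkdef writes a table-definition JSON; mk registers the external table.
  echo "bq mkdef --source_format=PARQUET 'gs://tvlk-data-datalake-prod/traveloka/data/v1/tracking/parquet/${table}/*.parquet' > ${table}_def.json"
  echo "bq mk --external_table_definition=${table}_def.json tvlk-realtime:nrtprod.${table}_archive"
}

restore_as_external my_table   # dry run: prints the two commands
```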
Thank you for your attention!
Best Regards,
Ingestion Team