GKE Staging Cluster Upgrade
Hi @channel
Following the recent high/critical GKE security patch announcement from GCP, @data-kube-devops-team will be scheduling a GKE staging cluster upgrade.
Before we upgrade the production cluster tvlk-data-prod, we will upgrade the staging cluster tvlk-data-dev first and learn from the experience. The staging cluster will be upgraded on Wednesday, July 4th 2019, 13.30-14.30 UTC+7.
Note: we plan to upgrade the production cluster next week, depending on the outcome of the staging cluster upgrade.
What we (data-kube-devops-team) will do for the staging cluster upgrade (we'll do the same for production):
- Upgrade the GKE master version to the latest (1.13.7-gke.8) on July 4th
- Upgrade all node-pools in the cluster
- Announce the result
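For reference, the upgrade order above (master first, then each node pool) can be sketched as a small Python helper that only builds the `gcloud container clusters upgrade` commands and prints them. This is a rough sketch, not the exact procedure we will run: the node pool name `default-pool` is a placeholder, and it assumes the project and zone/region are already set in the local gcloud configuration.

```python
import shlex


def master_upgrade_cmd(cluster, version):
    """Build the command that upgrades only the cluster master."""
    return ["gcloud", "container", "clusters", "upgrade", cluster,
            "--master", "--cluster-version", version]


def node_pool_upgrade_cmd(cluster, node_pool):
    """Build the command that upgrades one node pool.

    Without --cluster-version, nodes are upgraded to the master's version.
    """
    return ["gcloud", "container", "clusters", "upgrade", cluster,
            "--node-pool", node_pool]


def upgrade_plan(cluster, version, node_pools):
    """Master first, then every node pool, in the order given."""
    plan = [master_upgrade_cmd(cluster, version)]
    plan += [node_pool_upgrade_cmd(cluster, pool) for pool in node_pools]
    return plan


if __name__ == "__main__":
    # "default-pool" is a hypothetical node pool name used for illustration.
    for cmd in upgrade_plan("tvlk-data-dev", "1.13.7-gke.8", ["default-pool"]):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

The helper prints the commands instead of running them, so the plan can be reviewed before anything touches the cluster.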
Expectation
- During the upgrade: expect a small service degradation (e.g. the effect of pods being killed and respawned). We will try to minimize it.
- After the upgrade: all services should be running properly. If you find any anomaly (e.g. your deployment is not healthy, errors persist, or monitoring data is missing afterwards), please report it to us.
What you need to do
- Please be available during the upgrade window and report any errors your staging services encounter during the upgrade and for 1-4 days afterwards. If possible, put your staging service under a small/medium load test before the staging upgrade to simulate real traffic.
- Report any errors you find to @data-kube-devops-team in this Slack channel.
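If you want a quick way to spot unhealthy deployments before reporting, here is a minimal Python sketch. It assumes you have saved or piped the output of `kubectl get deployments -o json` for your namespace, and it treats a deployment as unhealthy when fewer replicas are available than desired. The function name and the sample data are illustrative only.

```python
def unhealthy_deployments(deployment_list):
    """Return (name, available, desired) for deployments short on replicas.

    deployment_list is the parsed JSON of `kubectl get deployments -o json`.
    """
    unhealthy = []
    for item in deployment_list.get("items", []):
        desired = item.get("spec", {}).get("replicas", 1)
        # availableReplicas is omitted from the status when it is zero.
        available = item.get("status", {}).get("availableReplicas", 0)
        if available < desired:
            unhealthy.append((item["metadata"]["name"], available, desired))
    return unhealthy


if __name__ == "__main__":
    # Hypothetical sample shaped like kubectl's JSON list output.
    sample = {
        "items": [
            {"metadata": {"name": "svc-a"},
             "spec": {"replicas": 2},
             "status": {"availableReplicas": 2}},
            {"metadata": {"name": "svc-b"},
             "spec": {"replicas": 3},
             "status": {}},
        ]
    }
    for name, available, desired in unhealthy_deployments(sample):
        print(f"{name}: {available}/{desired} replicas available")
```

To use it on real data, load the kubectl output with `json.load` and pass the result to `unhealthy_deployments`. Including the deployment name and replica counts in your report helps us triage faster.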
cc:
- @cdp-team
- @data-nvs @arinto
- @postevanus (for data-idrisk services)
- @irvifa @zakazai (tracking-service, experimentation, et al.)
- @eduard.chai (data-ml-platform)