GKE Production Cluster Upgrade
Hi @channel
Following the recent high/critical security patch announcement for GKE from GCP, @data-kube-devops-team will be scheduling a GKE production cluster upgrade.
The production cluster will be upgraded on Thursday, July 11th 2019, 14:00-15:00 UTC+7. We have also asked GCP Support for assistance and to be on standby during this window (still tentative).
What we (data-kube-devops-team) will do for the production cluster upgrade:
- Upgrade the GKE master to the latest version available from the current one (target version: 1.12.9-gke.7) on July 11th.
- Upgrade all node-pools in the cluster
- Announce the result
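For anyone curious what the first two steps look like in practice, here is a rough sketch that only builds the `gcloud container clusters upgrade` invocations in order (master first, then each node pool). The cluster name, zone, and node pool names are placeholders we made up for illustration, not our real values, and this is not necessarily the exact procedure we will run.

```python
from typing import List

TARGET_VERSION = "1.12.9-gke.7"  # target version from this announcement


def upgrade_commands(cluster: str, zone: str, node_pools: List[str],
                     version: str = TARGET_VERSION) -> List[List[str]]:
    """Return the gcloud invocations in order: master first, then each node pool."""
    base = ["gcloud", "container", "clusters", "upgrade", cluster, "--zone", zone]
    # Step 1: upgrade the master only.
    commands = [base + ["--master", "--cluster-version", version]]
    # Step 2: upgrade every node pool to the same version.
    for pool in node_pools:
        commands.append(base + ["--node-pool", pool, "--cluster-version", version])
    return commands


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in upgrade_commands("prod-cluster", "asia-southeast1-a", ["default-pool"]):
        print(" ".join(cmd))
```

The function only prints the commands and does not run them, so it is safe to try locally.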
Expectation
- During upgrade:
  - Expect a small service degradation (e.g. the effect of pods being killed and spawned again). We will try to minimize this degradation. We have also noticed that prometheus-server will have a short downtime during the upgrade, making our monitoring system inaccessible for 1-2 minutes.
  - During the GKE master upgrade in particular, you won't be able to access the master (e.g. you can't run kubectl, deploy to production, etc.).
- After upgrade: all services should be running properly. If you find any anomaly (e.g. your deployment is not healthy, errors persist, or monitoring data is missing afterwards), please report it to us.
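If you want a quick way to spot unhealthy deployments once the master is reachable again, one possible check is sketched below. It assumes you save the output of `kubectl get deployments --all-namespaces -o json` and treats a deployment as unhealthy when it has fewer available replicas than desired. The function name and the sample data are made up for illustration.

```python
import json
from typing import List


def unhealthy_deployments(kubectl_json: str) -> List[str]:
    """List 'namespace/name' for deployments with fewer available replicas than desired."""
    doc = json.loads(kubectl_json)
    bad = []
    for item in doc.get("items", []):
        desired = item.get("spec", {}).get("replicas", 1)
        # availableReplicas is omitted from the status when it is zero.
        available = item.get("status", {}).get("availableReplicas", 0)
        if available < desired:
            meta = item["metadata"]
            bad.append(f"{meta['namespace']}/{meta['name']}")
    return bad


if __name__ == "__main__":
    # Made-up sample mimicking the shape of the kubectl JSON output.
    sample = json.dumps({"items": [
        {"metadata": {"namespace": "data", "name": "api"},
         "spec": {"replicas": 3}, "status": {"availableReplicas": 3}},
        {"metadata": {"namespace": "data", "name": "worker"},
         "spec": {"replicas": 2}, "status": {}},
    ]})
    print(unhealthy_deployments(sample))
```

Anything this reports is worth a closer look with `kubectl describe` before reporting it to us.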
What you need to do
- We need you to be available during the upgrade window and to report any errors that your production services may face during and after the production upgrade.
- Report any errors you find to @data-kube-devops-team in this Slack channel.
cc:
@cdp-team
@data-nvs @arinto
@postevanus (for data-idrisk services)
@irvifa @zakazai (tracking-service, experimentation, et al.)
@eduard.chai (data-ml-platform)