HNET-MONGOD Version Upgrade From 3.0 to 3.6

Background

As we know, the number of major incidents related to hnet-mongod degradation has increased. Based on our investigation, there is a known bug in mongo 3.x: when the WiredTiger cache reaches 95% of its capacity, performance degrades heavily until mongo itself becomes unreachable. Two months after our last incident, cache usage has recently reached 93%, which is dangerously close to that threshold.
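
As an illustration of how the cache fill level above can be tracked, here is a minimal Python sketch. It assumes the input is a serverStatus document (for example, the result of pymongo's client.admin.command("serverStatus")) and reads the two WiredTiger cache counters from it. The function names and the sample numbers are hypothetical; only the 95% threshold and the 93% reading come from this page.

```python
# Threshold from this page: heavy degradation is reported at 95% cache fill.
DANGER_RATIO = 0.95


def wiredtiger_cache_fill_ratio(server_status):
    """Return the fraction of the configured WiredTiger cache currently in use."""
    cache = server_status["wiredTiger"]["cache"]
    used = cache["bytes currently in the cache"]
    limit = cache["maximum bytes configured"]
    return used / limit


def is_cache_in_danger(server_status, threshold=DANGER_RATIO):
    """True when the cache fill ratio is at or above the danger threshold."""
    return wiredtiger_cache_fill_ratio(server_status) >= threshold


# Hypothetical sample shaped like a serverStatus result (numbers are made up
# to reproduce the 93% reading mentioned above).
sample_status = {
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 930,
            "maximum bytes configured": 1000,
        }
    }
}

ratio = wiredtiger_cache_fill_ratio(sample_status)
print(f"cache fill: {ratio:.0%}, in danger: {is_cache_in_danger(sample_status)}")
```

In practice, alerting at a lower threshold than 95% would leave time to react before the degradation starts.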

Purpose

Upgrading hnet-mongod from 3.0 to 3.6 should eliminate this known bug for good and prevent such incidents from recurring.

Schedule

We will execute the upgrade on Friday, 13-Sept-2019, from 3AM to 5AM. During that window, there will be three version upgrade phases: 3.0 to 3.2, 3.2 to 3.4, and 3.4 to 3.6. Please find the JIRA ticket and migration plan.
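
The phases are sequential because MongoDB upgrades are done one release series at a time rather than jumping straight from 3.0 to 3.6. The following Python sketch only derives that sequence of phases; it is not the migration plan itself, and the function and constant names are hypothetical.

```python
# Release series between the current and the target version, in order.
RELEASE_SERIES = ["3.0", "3.2", "3.4", "3.6"]


def plan_upgrade(current, target, series=RELEASE_SERIES):
    """Return the ordered list of (from_version, to_version) upgrade phases."""
    start = series.index(current)  # raises ValueError for an unknown version
    end = series.index(target)
    if end < start:
        raise ValueError("downgrades are not covered by this sketch")
    # One phase per adjacent pair of release series.
    return [(series[i], series[i + 1]) for i in range(start, end)]


phases = plan_upgrade("3.0", "3.6")
for number, (src, dst) in enumerate(phases, start=1):
    print(f"phase {number}: {src} to {dst}")
```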

Risk

Based on the experience of other teams that have gone through this process (e.g. the ASI team), there will be a minor write downtime of between 5 and 10 seconds for each upgrade phase. The affected processes would be most of the tera.traveloka.com-related write and update operations, as well as issuance.
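
To put the per-phase figure in context, the short Python sketch below adds up the expected write downtime over all phases. It assumes the 5 to 10 second range applies equally to each of the three phases listed in the schedule (3.0 to 3.2, 3.2 to 3.4, 3.4 to 3.6); the function name is hypothetical.

```python
def total_write_downtime_seconds(phase_count, per_phase_range=(5, 10)):
    """Return (best_case, worst_case) total write downtime in seconds."""
    low, high = per_phase_range
    return (phase_count * low, phase_count * high)


# Three phases, as listed in the schedule.
best, worst = total_write_downtime_seconds(3)
print(f"expected total write downtime: {best} to {worst} seconds")
```

Note that this total is spread across the maintenance window rather than occurring in one continuous outage.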

PIC (Engineering)

Supply Engineers: @aris.darmawan, @micky, @ihabibi
Data Ingestion Engineer: @rezha
Backend-Infra: @Gujarat Santana, @febryantonius
Site-Infra: @bernard

Communication channel: #hnet-mongod-upgrade-comms