New Flight Oncall Arrangement
Hi folks,
As some of you may already know, we will have a new oncall arrangement for flight starting Q2 2020. This arrangement comes with several major changes compared to the previous one:
- We will not allocate additional manpower for the flight oncall rotation (layers 1 and 2). Instead, we will rely entirely on the flight subdomain oncalls.
- While we prepare a proper escalation flow for each subdomain, we will keep using the flight oncall escalation flow that is already configured in PagerDuty.
- We will pick one person from the subdomain oncalls (excluding mainflow) to be placed in the flight oncall rotation (layers 1 and 2). That person will be the first layer of escalation and is expected to acknowledge the incident when it comes in from PagerDuty. However, the main PIC for resolving the incident is the subdomain oncall, so the flight oncall is expected to forward the alert to the relevant subdomain oncall. The same rule applies to all issues reported by the product team or other teams.
- Each subdomain is expected to have its own 2nd layer subdomain oncall whenever possible and needed. By default, the leader will be the 2nd layer of escalation.
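To make the escalation order above concrete, here is a minimal sketch of how one incident would move between people. It is a simplification for illustration only, not our PagerDuty configuration: the `Subdomain` type, the `escalation_chain` function, and the person names are all hypothetical, and it leaves out layer 2 of the flight oncall rotation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Subdomain:
    """Hypothetical record of who is on call for one flight subdomain."""
    name: str
    oncall: str                         # main PIC for resolving incidents
    leader: str                         # default 2nd layer of escalation
    second_layer: Optional[str] = None  # dedicated 2nd layer, if one exists


def escalation_chain(subdomain: Subdomain, flight_oncall: str) -> List[str]:
    """Return the people who handle an incident, in order.

    The flight oncall acknowledges first, then forwards to the subdomain
    oncall. The 2nd layer is the dedicated one if set, else the leader.
    """
    chain = [flight_oncall]
    # If the flight oncall already is this subdomain's oncall, no forward is needed.
    if subdomain.oncall != flight_oncall:
        chain.append(subdomain.oncall)
    chain.append(subdomain.second_layer or subdomain.leader)
    return chain


# Placeholder names, not real assignments.
supply = Subdomain(name="supply", oncall="bob", leader="carol")
print(escalation_chain(supply, flight_oncall="alice"))
```

With the placeholder names above, the incident is acknowledged by the flight oncall, forwarded to the subdomain oncall, and escalated to the leader if needed.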
Also, the next 3 weeks will be the transition period, which means:
- We will set up proper alerts for each respective subdomain without deleting the old setup. As a result, a single incident is expected to page both the general flight oncall and the subdomain oncall. For example, a high exception count in the fprint service only paged @flight-oncall under the previous arrangement. Under the new arrangement, the alert will go to both @flight-oncall and @flight-supply-oncall.
- After 3 weeks (or once all monitoring has been properly routed to the respective subdomain oncall), we will remove the alerts to @flight-oncall. Every incident is then expected to page only its respective subdomain. From that point on, even if you are assigned as @flight-oncall, you should not get alerts for services that do not belong to your subdomain. If you do get alerted for a service outside your subdomain, it means the old setup has not been deleted yet. Please let me know if this happens.
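The paging rules for the three stages can be sketched as below. This is only an illustration of the intended behaviour, not the actual alert configuration: `Phase`, `SERVICE_OWNERS`, and `page_targets` are hypothetical names, the fprint mapping is taken from the example above, and the fallback to @flight-oncall for services that have no subdomain mapping yet is an assumption.

```python
from enum import Enum
from typing import Dict, List


class Phase(Enum):
    PREVIOUS = "previous"      # old arrangement
    TRANSITION = "transition"  # the 3-week transition period
    FINAL = "final"            # after the old setup is removed


FLIGHT_ONCALL = "@flight-oncall"

# Hypothetical service-to-subdomain mapping; only fprint comes from the example above.
SERVICE_OWNERS: Dict[str, str] = {
    "fprint": "@flight-supply-oncall",
}


def page_targets(service: str, phase: Phase,
                 owners: Dict[str, str] = SERVICE_OWNERS) -> List[str]:
    """Return the oncall handles an alert for `service` should page."""
    owner = owners.get(service)
    # Assumption: a service without a subdomain mapping still pages @flight-oncall.
    if phase is Phase.PREVIOUS or owner is None:
        return [FLIGHT_ONCALL]
    if phase is Phase.TRANSITION:
        # Old setup is kept, so both handles are paged.
        return [FLIGHT_ONCALL, owner]
    # Final state: only the owning subdomain is paged.
    return [owner]


for phase in Phase:
    print(phase.value, page_targets("fprint", phase))
```

In the final stage, an alert that still reaches @flight-oncall for a mapped service is the leftover old setup described above.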
This message serves as the official announcement and as the kickoff of the oncall transition period. For more details about the arrangement, please refer to this document: https://docs.google.com/document/d/1Ih8rBo-sacn5yCXEUfldn0bQRif2paJVxMmGKghnLRs/edit?usp=sharing