Infrastructure Request and Incident Handling Workflow
Hello Product Team,
Following up on the Product Infra Reps (Delegates) Program, we are changing the infrastructure request creation and incident handling processes.
Infrastructure Request
Currently, all infrastructure requests (new infrastructure or changes to existing infrastructure) in the shared tvlk-prod account go through the TechOps Service Desk. Now that we have delegates and the group of delegates is growing, they need access to the TechOps Service Desk with the following abilities:
- Estimate tickets.
- Update ticket labels.
- Move tickets to the backlog.
- Update ticket status (resolved, customer error, etc.).
- Comment on tickets.
To have these abilities, they need a JIRA Service Desk license. Because the number of licenses is limited and obtaining one can take a long time, we are using GitHub as a replacement.
What will change in the Infrastructure Request workflow?
- Infrastructure requests will go through GitHub issues (repository: https://github.com/traveloka/infra-production-playbooks).
- The infrastructure request ticket template for GitHub issues remains the same as the Site-Infra template (follow the Infra Request Guide).
- Engineers can create a PR along with the infrastructure request.
- Review and execution are done by the delegates.
- The delegates can escalate to Site-Infra if necessary (e.g., the request is out of the delegates' scope, or the delegates do not have the policy to execute it).
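As a rough illustration of the new request flow, the sketch below builds an infrastructure request as a GitHub issue and, optionally, submits it through the GitHub REST API. The title prefix, the `infra-request` label, and the function names are assumptions made for this example only; the real issue content should follow the template in the Infra Request Guide.

```python
import json
import urllib.request

# Repository named in this announcement.
REPO = "traveloka/infra-production-playbooks"


def build_issue_payload(title, body, labels=None):
    """Build the JSON payload for GitHub's "create an issue" endpoint."""
    payload = {"title": title, "body": body}
    if labels:
        payload["labels"] = list(labels)
    return payload


def create_issue(token, payload, repo=REPO):
    """POST the payload to the GitHub REST API and return the new issue URL.

    Requires a token with permission to create issues in the repository.
    """
    request = urllib.request.Request(
        f"https://api.github.com/repos/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["html_url"]


# Build a request locally; nothing is sent until create_issue() is called.
payload = build_issue_payload(
    title="[Infra Request] Example request",  # hypothetical title format
    body="Fill in the Site-Infra request template here.",
    labels=["infra-request"],  # hypothetical label
)
```

Calling `create_issue(token, payload)` would submit the request. Creating the issue by hand in the GitHub web interface works just as well.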
What happens to existing Infrastructure Request tickets on JIRA?
The delegates will create a GitHub issue and add the JIRA ticket link to it when they execute the ticket.
Incident Handling
Currently, incidents that cannot be handled by PDA are escalated to Site-Infra. Now that we have delegates, the Incident Handling workflow becomes PDA -> Delegates -> Site-Infra.
What will change in the Incident Handling workflow?
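The escalation order PDA -> Delegates -> Site-Infra can be pictured as a simple ordered chain. The snippet below is only an illustrative sketch; the names `ESCALATION_CHAIN` and `next_escalation` are made up for this example and do not correspond to any existing tool.

```python
# Escalation order described in this announcement.
ESCALATION_CHAIN = ["PDA", "Delegates", "Site-Infra"]


def next_escalation(current):
    """Return the level after `current`, or None if it is the last level."""
    index = ESCALATION_CHAIN.index(current)
    if index + 1 < len(ESCALATION_CHAIN):
        return ESCALATION_CHAIN[index + 1]
    return None
```

For example, `next_escalation("PDA")` gives `"Delegates"`, and `next_escalation("Site-Infra")` gives `None` because Site-Infra is the final level.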
- Incident reports will go through GitHub issues (repository: https://github.com/traveloka/infra-production-playbooks).
- The incident ticket template for GitHub issues remains the same (follow How to Report an Incident).
- Engineers can create a PR along with the incident report.
- Review and execution are done by the delegates.
- The delegates can escalate to Site-Infra if necessary (e.g., the incident is out of the delegates' scope, or the delegates do not have the policy to execute it).
What should you do?
- Make sure the incident cannot be solved by yourself or by PDA.
- Create an incident ticket and notify the delegates in your department.
- Trigger PagerDuty directly to the delegate if it is urgent.
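The steps above can be sketched as a small decision function. This is a hypothetical illustration of the order of checks, not an actual tool; the function name and the action strings are assumptions.

```python
def reporter_actions(solvable_by_self_or_pda, urgent):
    """Return the actions a reporter should take for an incident."""
    # Step 1: only escalate when neither you nor PDA can solve it.
    if solvable_by_self_or_pda:
        return []
    # Step 2: create the ticket and notify the department's delegates.
    actions = ["create incident ticket", "notify department delegates"]
    # Step 3: page the delegate directly when the incident is urgent.
    if urgent:
        actions.append("trigger PagerDuty to delegate")
    return actions
```

An urgent incident that cannot be solved locally yields all three actions, while a non-urgent one yields only the ticket and the notification.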
When does this workflow take effect?
This workflow will be in effect by 19 June 2019.