[On-call Procedure] TBIWP Hanging Task checking
Background, why do we need this?
Every booking ID has an impact on finance processes (payment to supplier, collection, deposit reconciliation, reporting, etc.), and a missing one can block the finance monthly closing. So we need to ensure the completeness and correctness of transaction data for each product, especially for products that use TBI/BP.
Objective
- Faster detection of missing bookings and shorter recovery time.
Operating Procedure, what do we need to do?
If you are the L1 on-call at 9AM or 4.30PM, you are responsible for doing the following on your schedule:
- Check #tbiwp-prod-booking for the TBIWP hanging booking data, every day at 9AM and 4.30PM. Example data: here
- Create a new Google Sheet file and name it "TBIWP Hanging Tasks <DATE DD-MM-YYYY> <TIME hh24>". Example: here
- Share the sheet in the enterprise-ibfp channel and mention the related PICs based on the trip types listed, so the related BA / L2 Engineers can follow up. (All PMs, please ensure all your team members are invited to this channel.) PICs:
- Package + Cross-sell: @agusputri, @stefenr, @hartanto.hartanto
- Experience: @adhika, @kennywinata, @faisal.rz
- Axes: @adhika, @kennywinata, @faisal.rz
- Connectivity: @kennywinata, @faisal.rz
- Ebill: @kennywinata, @faisal.rz
- Cinema: @kennywinata, @faisal.rz
- Bus: @mamid, @christian.prayudi, @lydia.natalia
- Vehicle Rental: @kris.parlindungan, @kennywinata, @faisal.rz
- Airport Transport: @Andriyanto, @christian.prayudi, @lydia.natalia
- Train: @mamid, @christian.prayudi, @lydia.natalia
- Hotel: @hadi_japarto, @merlin.prayogi
- Insurance: @Andry Syafurudin, @Iman (Ahmad Nooriman)
- Flight: @Nurlaili, @Daniel Suryawijaya
- First check this document to see whether the incident is a repeat of a previous one: here. If the bug and solution are already listed there, the L1 on-call may follow the instructions there, consulting the L2 on-call as necessary. DO NOTE THAT THE SYMPTOM & ROOT CAUSE MUST EXACTLY MATCH WHAT IS LISTED BEFORE FOLLOWING THE SOLUTION. Otherwise, go to step 5.
- Proposed SLA:
- Ideally, all hanging tasks should be solved on the same day.
- For these products, ensure the recovery happens in the same month as the transaction:
- Airport Transport
- E-Bill
- Cinema
- Bus
- Car Rental
- For every transaction that needs to be restarted, please also put "need to be suspended" in the spreadsheet after the process has succeeded.
- The L2 on-call needs to put the bug and solution in the Bug List Google Sheet: here. If the solution is not very clear yet or no repeat incident is expected (or just too lazy to write it down), just put the solution as “Consult with L2”. Examples are provided in the sheet. Note that the default action for L1 on-calls is to consult with L2 when an incident is not listed, so negligence will mean disruptions in the future.
- How to suspend? //TODO
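The same-month rule in the Proposed SLA can be checked mechanically when reviewing the sheet. The snippet below is only a minimal sketch: the function name `breaches_same_month_rule` is hypothetical, and it assumes the product label in the sheet matches the names in the SLA list exactly.

```python
from datetime import date

# Products that must be recovered in the same calendar month as the
# transaction, copied from the Proposed SLA list in this procedure.
SAME_MONTH_PRODUCTS = {"Airport Transport", "E-Bill", "Cinema", "Bus", "Car Rental"}


def breaches_same_month_rule(product: str, transaction_date: date, recovery_date: date) -> bool:
    """Return True if the product is under the same-month rule and the
    recovery happened in a different calendar month than the transaction.

    Hypothetical helper: the product label is assumed to match the SLA list.
    """
    if product not in SAME_MONTH_PRODUCTS:
        return False
    return (recovery_date.year, recovery_date.month) != (
        transaction_date.year,
        transaction_date.month,
    )
```

For example, a Bus transaction dated 31 January that is recovered on 1 February breaches the rule, while a Hotel transaction with the same dates does not, because Hotel is not in the list.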
Thanks.
Corptech NFP - Engineering Leads
Notes:
- Steps 1-3 should be automatable via an Ansible Tower playbook script. You are free to work on it during your on-call period.
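As a starting point for that automation, the sheet naming and PIC mentions from steps 2 and 3 could be sketched in Python as below. This is an illustration under assumptions, not an existing script: the function names `sheet_title` and `share_message` are hypothetical, `<TIME hh24>` is assumed to mean the two-digit hour in 24-hour format, and the trip type labels in the hanging booking data are assumed to match the PIC list in this procedure. Fetching the data from the channel and creating the sheet are left out.

```python
from datetime import datetime

# PIC handles copied from the list in this procedure; keep them in sync.
PICS = {
    "Package + Cross-sell": ["@agusputri", "@stefenr", "@hartanto.hartanto"],
    "Experience": ["@adhika", "@kennywinata", "@faisal.rz"],
    "Axes": ["@adhika", "@kennywinata", "@faisal.rz"],
    "Connectivity": ["@kennywinata", "@faisal.rz"],
    "Ebill": ["@kennywinata", "@faisal.rz"],
    "Cinema": ["@kennywinata", "@faisal.rz"],
    "Bus": ["@mamid", "@christian.prayudi", "@lydia.natalia"],
    "Vehicle Rental": ["@kris.parlindungan", "@kennywinata", "@faisal.rz"],
    "Airport Transport": ["@Andriyanto", "@christian.prayudi", "@lydia.natalia"],
    "Train": ["@mamid", "@christian.prayudi", "@lydia.natalia"],
    "Hotel": ["@hadi_japarto", "@merlin.prayogi"],
    "Insurance": ["@Andry Syafurudin", "@Iman (Ahmad Nooriman)"],
    "Flight": ["@Nurlaili", "@Daniel Suryawijaya"],
}


def sheet_title(now: datetime) -> str:
    """Build the sheet name "TBIWP Hanging Tasks <DD-MM-YYYY> <hh24>".

    Assumption: <TIME hh24> is the hour only, in 24-hour format.
    """
    return f"TBIWP Hanging Tasks {now:%d-%m-%Y} {now:%H}"


def share_message(sheet_url: str, trip_types: list[str]) -> str:
    """Build the message to post in the channel, one line per trip type.

    Hypothetical helper: trip types are deduplicated and sorted; a trip
    type without an entry in PICS is flagged instead of silently dropped.
    """
    lines = [f"TBIWP hanging tasks: {sheet_url}"]
    for trip_type in sorted(set(trip_types)):
        pics = PICS.get(trip_type)
        if pics is None:
            lines.append(f"- {trip_type}: (no PIC listed in this procedure)")
        else:
            lines.append(f"- {trip_type}: {' '.join(pics)}")
    return "\n".join(lines)
```

Keeping the PIC list in a single dictionary means a change of PIC only needs to be made in one place, next to the list in this procedure.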