Backend Newsletter Q3 2019

Hello y’all, this is the second Backend Update (which we will call the Newsletter from now on), covering Q3 2019. Based on the feedback we got on the previous update, we will try to post this quarterly. We tried to publish this at the end of Q3; however, due to various issues, we had to push it back to the start of Q4.

Initiative Highlights

ASG Migration

PIC: Salvian

Why

Progress

Plans

Problems

EC2 Downsizing

PIC: Igit, Salvian

Why

Our default EC2 instance type was m4.large. However:

Progress

Plans

Problems

Backend Load Testing

PIC: Igit

Why

Progress

Plans

Problems

Multi-account

PIC: Fajrin, Febry Antonius, Gujarat Santana, Darwin Wirawan

Why

Goal: Enable autonomy
Impact:

Many teams have been driven by the impacts above to migrate to multi-account, as it is a great investment for both technology and business. If you think multi-account will help your PD, feel free to contact us!

Progress

Plans

Labs and Sharing Session:

Feel free to discuss with us!

Problems

We don’t have any. Maybe you have problems, and multi-account can solve them!

Java 8 Migration

PIC: Ronny
Number of applicable product domains: 49

Why

Java 7 reached end-of-life in April 2015. Furthermore, we sometimes can’t use newer versions of some libraries because we are still on Java 7. By using Java 8, we will:

Progress

Plans

Problems

Multi-repo Migration

PIC: Christianto Handojo
Number of applicable product domains: 43

Why

Working in the old monorepo has become cumbersome due to the large code size and the number of different teams working in the repo. Among other things, this results in an unreliable repo (landing failures, etc.), uncontrolled growth in the number of branches, and long build times for revision checking and application releases. Moving monorepo hosting to GitHub (thanks to @echon) has somewhat alleviated the unreliable-repo problem, but the other problems remain.

Progress

Plans

Problems

Backend Microservice Integration Testing Practices

PIC: Salvian

Why

Progress

Plans

Monitoring Vendor Research

PIC: Ronny

Why

Because of some issues related to cost attribution and user-access management, we decided to find an alternative to Datadog. We want to finalize this soon (Q4 2019 or Q1 2020) because the migration away from Datadog should ideally be finished before next year’s Datadog contract expires.

Progress

So far we have started the trial for SignalFX with help from ipi and fpr (thank you @putu.pradnyana, @Vincent, @janesa.tarigan). We have finished covering the current Datadog use cases, but have not yet trialled APM (Application Performance Monitoring). Current docs: https://docs.google.com/document/d/1CoXr6AAHyOUuU2_QMJECMlFxdM55ATt0d9MyJ_Zr2Xg/edit# (will be moved to Confluence later).

Plans

Problems

For example: if major incidents rarely happen in Traveloka, do we want to spend $1M/year on APM?
So we will need more details (not finalized yet) when doing incident tracking.

Log Analysis using AWS CWL Insights

PIC: Ronny

Why

We have interviewed some teams regarding observability, and one common piece of feedback is that CWL is hard to query, especially when querying MongoDB logs (to find slow queries) or multiple application log groups (to trace a certain request / bookingId / invoiceId). Fortunately, AWS CWL Insights now has a feature (published on 26 July 2019) for querying multiple log groups.
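To make the multi-log-group use case concrete, here is a minimal sketch of the request one might send to CWL Insights to trace a booking ID across services. The helper `build_trace_query`, the log group names, and the booking ID are all made up for illustration. The dictionary keys are meant to mirror the parameters of the CloudWatch Logs StartQuery API (exposed in boto3 as `start_query`), so please check them against the AWS docs before relying on this.

```python
import time


def build_trace_query(log_groups, booking_id, lookback_seconds=3600, now=None):
    """Build StartQuery parameters that search several log groups for one ID.

    Hypothetical helper: log group names and the ID are supplied by the caller.
    """
    end = int(now if now is not None else time.time())
    query = (
        "fields @timestamp, @log, @message"
        # int() keeps arbitrary text out of the regex literal
        f" | filter @message like /{int(booking_id)}/"
        " | sort @timestamp asc"
        " | limit 100"
    )
    return {
        "logGroupNames": list(log_groups),
        "startTime": end - lookback_seconds,  # epoch seconds
        "endTime": end,
        "queryString": query,
    }


# Example with made-up log group names and a fixed "now" for repeatability.
params = build_trace_query(
    ["/app/booking-service", "/app/payment-service"],
    123456789,
    now=1_570_000_000,
)
# With AWS credentials configured, these could be passed to boto3:
#   boto3.client("logs").start_query(**params)
```

The `@log` field is included so that each result row shows which log group it came from, which is the main point of querying several groups at once.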

Progress

The backend-infra team has done preliminary research on AWS CWL Insights. Current docs:

We also got back to a few of the teams we had interviewed before. The problem is that their log formats may differ between services, so it is still hard to write a single query that parses them all correctly. For example, one service logs bookingId as a plain number, another logs it as “booking ID <number>”, and yet another logs it as “bookingId: <number>”. But at least we can use this for MongoDB logs for now.
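To illustrate why the non-uniform formats are painful, here is a rough sketch of what parsing the three variants above could look like. The patterns, the function name `extract_booking_id`, and the assumption that booking IDs have at least six digits are ours, purely for illustration; real services may log differently.

```python
import re

# Hypothetical patterns for the formats mentioned above.
# Matches "booking ID 123456789" and "bookingId: 123456789".
LABELLED = re.compile(r"booking\s?id:?\s*(\d+)", re.IGNORECASE)
# Fallback for services that log the ID as a plain number.
# Assumption: booking IDs have at least 6 digits.
BARE_NUMBER = re.compile(r"\b(\d{6,})\b")


def extract_booking_id(message):
    """Return the booking ID found in a log message, or None."""
    match = LABELLED.search(message) or BARE_NUMBER.search(message)
    return match.group(1) if match else None
```

Note that the plain-number fallback is a guess: any other long number in the message (an invoice ID, a timestamp) would match too, which is exactly why a uniform log format would make querying easier.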

Plans

Problems

We need to make sure this will benefit teams, but it is still a bit hard to do for most services’ logs because of non-uniform log message formats.

Other Information

Datadog Update

Container

We postponed the research on backend containerization. One of its obvious benefits, cost saving, is already achievable using EC2 with burstable / spot instances, with no additional research and lower implementation effort. However, we might consider containers for other initiatives, e.g. local testing using mock services. Don’t hesitate to contact us if you have any concerns.

The Effective Engineer

We recently had an internal book discussion on The Effective Engineer. We found that the book shares some very important mindsets that can help us become more effective as engineers. We strongly suggest that everyone read the book, no matter what level you are currently at. If you are too lazy (:p) to read the book (it is only ~200 pages long), you can read the summary that we have made here.

Backend Newsletter as Confluence Blogs

All Backend Newsletters will be published as Confluence blogs, starting from the previous one (Q1-Q2 2019). This newsletter has been published here.

Thank you

We would like to thank everyone who has contributed to these initiatives. Adieu!