Scale up aprso-es

Request Details

Background

To prepare for the upcoming EPIC SALE, the APR team ran stress tests to check the availability and reliability of our services.
After several stress tests, we noticed that aprso (one of the APR services that serves omni data) experienced high latency because CPU usage on aprso-es reached 100%.

The aprso-es metrics also showed increased search thread pool usage, which indicates that the cluster needs more CPU cores so that Elasticsearch can run more search worker threads.
So we decided to scale our ES domain vertically.
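For reference, the Elasticsearch thread pool documentation (6.x/7.x) sizes the fixed `search` thread pool as int((allocated processors * 3) / 2) + 1, so adding cores directly adds search workers. The sketch below only evaluates that formula. It assumes the domain's allocated processors equal the instance's vCPU count (c5.2xlarge has 8 vCPUs); the ticket does not state the current instance type or ES version, so the smaller counts are shown for comparison only.

```python
# Sketch: default size of the Elasticsearch `search` thread pool.
# Formula from the Elasticsearch thread pool docs (6.x/7.x):
#   size = int((allocated_processors * 3) / 2) + 1
# Assumption: allocated processors == vCPUs of the Amazon ES instance type.


def search_pool_size(allocated_processors: int) -> int:
    """Return the default fixed size of the `search` thread pool."""
    if allocated_processors < 1:
        raise ValueError("allocated_processors must be >= 1")
    # Integer division matches int(x / 2) for positive values.
    return (allocated_processors * 3) // 2 + 1


if __name__ == "__main__":
    # c5.2xlarge has 8 vCPUs; 2 and 4 are illustrative comparison points.
    for vcpus in (2, 4, 8):
        print(f"{vcpus} vCPUs -> {search_pool_size(vcpus)} search threads")
```

Going from 4 to 8 vCPUs would raise the default pool from 7 to 13 search threads, under the assumptions above.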

Thread Pools and Search Request Errors in Elasticsearch:
https://qbox.io/blog/thread-pools-elasticsearch-search-request-errors

aprso-es CPU usage reached 100%:
https://app.datadoghq.com/dashboard/4xm-mtr-4yv/apr-elasticsearch-summary-screenboard?from_ts=1565856000000&to_ts=1565860774440&live=false

aprso high latency and errors:
https://app.datadoghq.com/dashboard/2wz-yra-q5f/apr-aprso---service-health?from_ts=1565858110422&to_ts=1565859134086&live=false&tile_size=m

Purpose

Scale up the aprso-es instance type to c5.2xlarge.elasticsearch
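One possible way to apply this change is the boto3 `es` client's `update_elasticsearch_domain_config` call. This is a sketch that assumes the change is made through the API rather than the console or infrastructure-as-code (the ticket does not say which). The helper names are hypothetical; the domain name and region are parsed from the ARN listed under Resources.

```python
# Sketch: build (and optionally send) the instance-type change for aprso-es.
# Assumption: applied via boto3's legacy `es` client; helper names are hypothetical.
from __future__ import annotations

TARGET_INSTANCE_TYPE = "c5.2xlarge.elasticsearch"
DOMAIN_ARN = "arn:aws:es:ap-southeast-1:715824975366:domain/aprso-es-b5069798eab19f19"


def parse_domain_arn(arn: str) -> tuple[str, str]:
    """Return (region, domain_name) from an Amazon ES domain ARN."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[2] != "es" or not parts[5].startswith("domain/"):
        raise ValueError(f"not an Amazon ES domain ARN: {arn}")
    return parts[3], parts[5].split("/", 1)[1]


def build_scale_up_request(arn: str, instance_type: str = TARGET_INSTANCE_TYPE) -> dict:
    """Keyword arguments for update_elasticsearch_domain_config."""
    _, domain_name = parse_domain_arn(arn)
    return {
        "DomainName": domain_name,
        "ElasticsearchClusterConfig": {"InstanceType": instance_type},
    }


def apply_scale_up(arn: str = DOMAIN_ARN) -> dict:
    """Send the change to AWS (requires boto3 and credentials)."""
    import boto3  # lazy import so the pure helpers above work without boto3

    region, _ = parse_domain_arn(arn)
    client = boto3.client("es", region_name=region)
    return client.update_elasticsearch_domain_config(**build_scale_up_request(arn))
```

Keeping the request construction separate from the AWS call lets the parameters be reviewed (or diffed against the current `ElasticsearchClusterConfig`) before anything is sent.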

Impact

Resolve the high aprso latency observed during stress tests
Lower CPU usage on aprso-es

Risk

Unexpected behaviour during the change, although the ES docs state that this process is safe
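To watch for this risk while the change rolls out, the domain status can be polled until Amazon ES reports that it has finished processing the configuration change (`DomainStatus.Processing` in the `describe_elasticsearch_domain` response). This is a hypothetical helper, not part of the original plan; the 60 s poll interval and 4 h timeout are arbitrary assumptions, and `describe` is injected so the logic can be exercised without AWS access.

```python
# Sketch: wait until an Amazon ES configuration change has finished applying.
# Assumptions: `describe` returns a describe_elasticsearch_domain-shaped dict;
# poll interval and timeout values are arbitrary.
import time
from typing import Callable


def wait_until_applied(
    describe: Callable[[], dict],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 4 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until DomainStatus.Processing is False; return the number of polls."""
    deadline = time.monotonic() + timeout_seconds
    polls = 0
    while True:
        polls += 1
        status = describe()["DomainStatus"]
        if not status["Processing"]:
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError("domain is still processing the configuration change")
        sleep(poll_seconds)
```

With boto3, `describe` could be `lambda: client.describe_elasticsearch_domain(DomainName="aprso-es-b5069798eab19f19")`; CPU usage and aprso latency would still need to be checked on the Datadog dashboards linked in the Background section.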

Resources

arn:aws:es:ap-southeast-1:715824975366:domain/aprso-es-b5069798eab19f19