EC2 Direct Downsizing

If you are interested in, or already doing, cost saving by directly downsizing EC2 instances in production, please be aware that it can affect your service latency. In some cases, p95 latency can double compared with the previous cluster configuration.

Start by reading this documentation. If you understand and accept the risk, treat the first week after the change is applied as the critical period. Monitor heavily and mitigate any issues before concluding that the new configuration is safe.
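As a rough illustration of what to watch during that first week, here is a minimal Python sketch that compares p95 latency before and after a downsize. The function names, the 1.5x alert ratio, and the idea of passing in raw latency samples are all assumptions made for this example, not part of our tooling; in practice you would pull these numbers from your own monitoring dashboards.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples)
    # 1-based nearest rank, ceil(0.95 * n), done in integers to avoid float rounding
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def p95_regression(baseline, current, max_ratio=1.5):
    """Return (ratio, flagged) comparing p95 after the change to p95 before.

    max_ratio is a hypothetical alert threshold; pick one that fits your SLO.
    """
    before = p95(baseline)
    after = p95(current)
    if before <= 0:
        raise ValueError("baseline p95 must be positive")
    ratio = after / before
    return ratio, ratio > max_ratio


if __name__ == "__main__":
    # Made-up latencies in milliseconds, only to show the call shape.
    baseline_ms = list(range(1, 101))
    current_ms = [2 * x for x in baseline_ms]
    ratio, flagged = p95_regression(baseline_ms, current_ms)
    print(f"p95 ratio: {ratio:.2f}, flagged: {flagged}")
```

Comparing the same time windows (for example, the same weekday and peak hours) before and after the change keeps the ratio from being skewed by normal traffic patterns.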

If you would rather not take on that risk and prefer a preventive approach, load testing your cluster first is the best way. We provide the methodology and can also help you load test your cluster properly. Join #backend-load_testing for more details, and feel free to ask any questions about load testing and service reliability in general.

Thanks!