Datadog Agent in the Production Environment
Hi all,
The Site-infra team has installed the Datadog Agent on the production machines. You should now be able to see the system and custom metrics from your machines and compare them with the metrics in Grafana. However, there are still a few things you need to do:
- Alert recipients: the list of Slack channels is currently incomplete, and for PagerDuty only the Ashtool team has added their account. Please check whether your team's Slack channel(s) or PagerDuty account still need to be added; if so, contact us (Backend-infra team).
- Please check your Java machines this week and report any missing hosts or metrics to us.
If you need a new timeboard template, feel free to add it yourself. If you are not sure how, ask us.
- Please also copy your Seyren alerts to Datadog. You can use the Datadog tools for this (https://29022131.atlassian.net/wiki/spaces/BEI/pages/198437380/Datadog+Tools). We have provided some monitor templates for Java applications in the repository https://phabricator.noc.tvlk.cloud/diffusion/MNTRGTOOLS/, in these files:
  - datadog-scripts/monitor-template/tvlk-prod/monitor-cpu-usage.json
  - datadog-scripts/monitor-template/tvlk-prod/monitor-disk-usage.json
  - datadog-scripts/monitor-template/tvlk-prod/monitor-mem-free.json
  - datadog-scripts/monitor-template/tvlk-prod/monitor-network.json
  - datadog-scripts/monitor-template/tvlk-prod/monitor-open-file.json
  - datadog-scripts/monitor-template/tvlk-prod/monitor-rpc-p95.json
  - datadog-scripts/monitor-template/tvlk-prod/monitor-system-load.json
If you need a new monitor template, feel free to add it yourself. If you are not sure how, ask us.
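For reference, creating a monitor from one of these templates can be sketched in Python against Datadog's public monitor API (`POST /api/v1/monitor`, authenticated with the `DD-API-KEY` and `DD-APPLICATION-KEY` headers). This is a minimal sketch, not the Datadog tools linked above. It assumes, hypothetically, that a template's query marks the host with a `<HOST>` placeholder and that keys are read from the hypothetical environment variables `DD_API_KEY` and `DD_APP_KEY`; check the actual template files for their real conventions.

```python
import copy
import json
import os
import urllib.request

# US1 site endpoint; other Datadog sites use a different API host.
DATADOG_MONITOR_URL = "https://api.datadoghq.com/api/v1/monitor"


def build_monitor(template: dict, host: str, notify: list[str],
                  placeholder: str = "<HOST>") -> dict:
    """Fill a monitor template for one host and append notification handles.

    ASSUMPTION: the template query contains a literal placeholder
    (hypothetically "<HOST>") where the host name should go.
    """
    monitor = copy.deepcopy(template)  # leave the loaded template untouched
    monitor["query"] = monitor["query"].replace(placeholder, host)
    monitor["name"] = f'{monitor.get("name", "monitor")} ({host})'
    # Datadog notifies recipients mentioned as @handles in the message,
    # e.g. @slack-<channel> or @pagerduty-<service>.
    handles = " ".join(f"@{h}" for h in notify)
    monitor["message"] = f'{monitor.get("message", "")} {handles}'.strip()
    return monitor


def create_monitor(monitor: dict) -> dict:
    """POST the monitor definition to Datadog and return the parsed response."""
    req = urllib.request.Request(
        DATADOG_MONITOR_URL,
        data=json.dumps(monitor).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "DD-API-KEY": os.environ["DD_API_KEY"],          # hypothetical env var
            "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],  # hypothetical env var
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, load `monitor-cpu-usage.json` with `json.load`, pass it to `build_monitor` with your host name and your team's handles (such as `slack-<channel>` or `pagerduty-<service>`; the `@` is added for you), review the result, and only then call `create_monitor`. Keeping the build step separate from the network call makes it easy to inspect the generated monitor before anything is created.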
Additional documentation (FAQ and troubleshooting):
https://29022131.atlassian.net/wiki/spaces/BEI/pages/203745503/Datadog
Thank you!