Monitoring
Monitoring and observability
With the move of COSMOS to container-based services, we needed a better way to monitor the internals of COSMOS. Here is some information on external services that you can use to monitor COSMOS. For general background, see Monitoring Distributed Systems.
Fluent/Fluentd
Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data.
Notes
in_docker.conf
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

<match *.metric>
  @type copy
  <store>
    @type elasticsearch
    host cosmos-elasticsearch
    port 9200
    logstash_format true
    logstash_prefix metric
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>
  <store>
    @type stdout
  </store>
</match>

<match *__cosmos.log>
  @type copy
  <store>
    @type elasticsearch
    host cosmos-elasticsearch
    port 9200
    logstash_format true
    logstash_prefix cosmos
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>
  <store>
    @type stdout
  </store>
</match>

<match *.**>
  @type copy
  <store>
    @type elasticsearch
    host cosmos-elasticsearch
    port 9200
    logstash_format true
    logstash_prefix fluentd
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>
  <store>
    @type stdout
  </store>
</match>
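The three match blocks above are evaluated in order, first match wins, and each copies records to a different Elasticsearch index prefix. A rough Python sketch of that tag routing (simplified: here `*` matches within a single dot-separated tag part and `**` spans parts; real Fluentd matching has more rules, and the tags are illustrative):

```python
import re

# (pattern, logstash_prefix) pairs, in the same order as the <match> blocks.
ROUTES = [
    ("*.metric", "metric"),
    ("*__cosmos.log", "cosmos"),
    ("*.**", "fluentd"),
]

def pattern_to_regex(pattern: str) -> str:
    """Translate a simplified Fluentd match pattern into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # '**' spans across tag parts
            i += 2
        elif pattern[i] == "*":
            out.append("[^.]*")    # '*' stays within one tag part
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "^" + "".join(out) + "$"

def route(tag: str) -> str:
    """Return the index prefix the first matching <match> block would use."""
    for pattern, prefix in ROUTES:
        if re.match(pattern_to_regex(pattern), tag):
            return prefix
    return "unmatched"

for tag in ("web.metric", "app__cosmos.log", "other.log"):
    print(tag, "->", route(tag))
```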
Dockerfile
FROM fluent/fluentd:v1.10.3-1.0
COPY ./in_docker.conf /fluentd/etc/fluent.conf
USER root
RUN gem install fluent-plugin-elasticsearch --no-document --version 4.0.7 \
&& gem install fluent-plugin-prometheus --no-document --version 1.8.5
USER fluent
OpenDistro
Open Distro for Elasticsearch provides a powerful, easy-to-use event monitoring and alerting system, enabling you to monitor your data and send notifications automatically to your stakeholders. With an intuitive Kibana interface and powerful API, it is easy to set up and manage alerts.
Notes
When testing this, I found that, depending on how you ingest your logs into Open Distro, you may have to disable security. Here is an example Dockerfile.
Dockerfile
FROM amazon/opendistro-for-elasticsearch:1.12.0
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin remove opendistro_security
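The Fluentd configuration above writes into this Elasticsearch cluster using daily logstash-style indices. A sketch of the index names it produces, assuming the plugin's default `-` prefix separator (the date is illustrative):

```python
from datetime import datetime

def logstash_index(prefix: str, when: datetime, dateformat: str = "%Y%m%d") -> str:
    """Daily index name written by fluent-plugin-elasticsearch when
    logstash_format is true (assumes the default '-' separator)."""
    return f"{prefix}-{when.strftime(dateformat)}"

# e.g. records matching *.metric on 2021-01-15 land in "metric-20210115"
print(logstash_index("metric", datetime(2021, 1, 15)))
```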
Prometheus
Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.
Notes
prometheus.yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: cosmos-internal-metrics
    metrics_path: "/cosmos-api/internal/metrics"
    static_configs:
      - targets: ["cosmos-cmd-tlm-api:2901"]

  - job_name: cosmos-cmd-tlm-api
    metrics_path: "/cosmos-api/metrics"
    static_configs:
      - targets: ["cosmos-cmd-tlm-api:2901"]

  - job_name: cosmos-script-runner-api
    metrics_path: "/script-api/metrics"
    static_configs:
      - targets: ["cosmos-script-runner-api:2902"]

  - job_name: minio-job
    metrics_path: /minio/v2/metrics/cluster
    scheme: http
    static_configs:
      - targets: ['cosmos-minio:9000']
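Each job above scrapes an endpoint that serves the Prometheus text exposition format. A minimal sketch of rendering a counter in that format (the metric name and labels are illustrative, not actual COSMOS metrics):

```python
def render_counter(name: str, help_text: str, samples: dict) -> str:
    """Render one counter family in the Prometheus text exposition format.

    samples maps a tuple of (label, value) pairs to the sample value.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in samples.items():
        label_str = ",".join(f'{k}="{v}"' for k, v in labels)
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"

print(render_counter(
    "http_requests_total",
    "Total HTTP requests served.",
    {(("method", "GET"), ("code", "200")): 42},
))
```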
Dockerfile
FROM prom/prometheus:v2.24.1
ADD prometheus.yaml /etc/prometheus/
Grafana
Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.
Notes
datasource.yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    # Access mode - proxy (server in the UI) or direct (browser in the UI).
    access: proxy
    url: http://cosmos-prometheus:9090
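With `access: proxy`, the Grafana server forwards PromQL queries to Prometheus's HTTP API at the `url` above. A sketch of the instant-query URL involved (the PromQL expression is illustrative):

```python
from urllib.parse import urlencode

PROMETHEUS_URL = "http://cosmos-prometheus:9090"  # url from datasource.yaml

def instant_query_url(promql: str) -> str:
    """Build a request URL for Prometheus's instant-query HTTP API."""
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': promql})}"

print(instant_query_url("up"))
```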
Dockerfile
FROM grafana/grafana
COPY datasource.yaml /etc/grafana/provisioning/datasources/