v5

Monitoring and observability

With the move of COSMOS to container-based services, we needed a better way to monitor its internals. This page describes external services that you can use to monitor COSMOS. For more background on the topic, see Monitoring Distributed Systems.

Fluent/Fluentd

Fluentd is an open source data collector, which lets you unify the data collection and consumption for a better use and understanding of data.

Notes

in_docker.conf

<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>
<match *.metric>
  @type copy
  <store>
    @type elasticsearch
    host cosmos-elasticsearch
    port 9200
    logstash_format true
    logstash_prefix metric
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>
  <store>
    @type stdout
  </store>
</match>
<match *__cosmos.log>
  @type copy
  <store>
    @type elasticsearch
    host cosmos-elasticsearch
    port 9200
    logstash_format true
    logstash_prefix cosmos
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>
  <store>
    @type stdout
  </store>
</match>
<match *.**>
  @type copy
  <store>
    @type elasticsearch
    host cosmos-elasticsearch
    port 9200
    logstash_format true
    logstash_prefix fluentd
    logstash_dateformat %Y%m%d
    include_tag_key true
    type_name access_log
    tag_key @log_name
    flush_interval 1s
  </store>
  <store>
    @type stdout
  </store>
</match>
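
To make the routing above easier to reason about, here is a minimal Python sketch of how a tag could end up in an Elasticsearch index. It assumes that Fluentd evaluates the match blocks in order, that `*` matches within a single tag part, that a trailing `.**` matches zero or more further parts, and that `logstash_format true` produces index names of the form prefix, a `-` separator, then the UTC date in `logstash_dateformat`. The matcher handles only that small subset of the pattern syntax and is not Fluentd's real implementation; the example tags are hypothetical.

```python
import re
from datetime import datetime, timezone

# (pattern, logstash_prefix) pairs, in the same order as the <match> blocks.
ROUTES = [
    ("*.metric", "metric"),
    ("*__cosmos.log", "cosmos"),
    ("*.**", "fluentd"),
]


def pattern_to_regex(pattern):
    """Translate a small subset of Fluentd match syntax into a regex.

    Handles only '*' (inside one tag part) and '.**' (zero or more
    further parts). Everything else is treated as literal text.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith(".**", i):
            parts.append(r"(?:\..*)?")
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^.]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def prefix_for(tag):
    """Return the logstash_prefix of the first pattern that matches the tag."""
    for pattern, prefix in ROUTES:
        if pattern_to_regex(pattern).fullmatch(tag):
            return prefix
    return None


def index_for(tag, when):
    """Return the assumed index name for a tag and a timezone-aware datetime."""
    prefix = prefix_for(tag)
    if prefix is None:
        return None
    day = when.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"{prefix}-{day}"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical tags, chosen only to exercise each match block.
    for tag in ("cmd_tlm_api.metric", "DEFAULT__cosmos.log", "other.service.log"):
        print(tag, "->", index_for(tag, now))
```

Because the catch-all `*.**` block comes last, it only receives events that the two more specific blocks did not claim.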

Dockerfile

FROM fluent/fluentd:v1.10.3-1.0

COPY ./in_docker.conf /fluentd/etc/fluent.conf
USER root
RUN gem install fluent-plugin-elasticsearch --no-document --version 4.0.7 \
  && gem install fluent-plugin-prometheus --no-document --version 1.8.5
USER fluent

OpenDistro

Open Distro for Elasticsearch provides a powerful, easy-to-use event monitoring and alerting system, enabling you to monitor your data and send notifications automatically to your stakeholders. With an intuitive Kibana interface and powerful API, it is easy to set up and manage alerts.

Notes

When testing this, I found that, depending on how you ingest your logs into Open Distro, I had to disable the security plugin. Here is an example Dockerfile that removes it.

Dockerfile

FROM amazon/opendistro-for-elasticsearch:1.12.0

RUN /usr/share/elasticsearch/bin/elasticsearch-plugin remove opendistro_security
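
As a rough way to check that logs are arriving, the following Python sketch builds a search request for the most recent documents in one of the index families written by the Fluentd configuration. It assumes the security plugin has been removed so that the service answers plain HTTP without credentials, that it is reachable as `cosmos-elasticsearch:9200` (the host and port used in in_docker.conf), and that documents carry an `@timestamp` field. The function name is illustrative only.

```python
import json
import urllib.request

# Host and port as used in in_docker.conf; adjust if your setup differs.
ES_URL = "http://cosmos-elasticsearch:9200"


def build_search_request(prefix, size=10):
    """Build (but do not send) a search for the newest documents in prefix-*."""
    body = {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"match_all": {}},
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{prefix}-*/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_search_request("cosmos", size=5)
    print(request.get_method(), request.full_url)
    # To actually run the query from a container on the same network:
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response)["hits"]["hits"])
```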

Prometheus

Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs. It stores all scraped samples locally and runs rules over this data to either aggregate and record new time series from existing data or generate alerts. Grafana or other API consumers can be used to visualize the collected data.

Notes

prometheus.yaml

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  # - "first.rules"
  # - "second.rules"

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: cosmos-internal-metrics
    metrics_path: "/cosmos-api/internal/metrics"
    static_configs:
      - targets: ["cosmos-cmd-tlm-api:2901"]

  - job_name: cosmos-cmd-tlm-api
    metrics_path: "/cosmos-api/metrics"
    static_configs:
      - targets: ["cosmos-cmd-tlm-api:2901"]

  - job_name: cosmos-script-runner-api
    metrics_path: "/script-api/metrics"
    static_configs:
      - targets: ["cosmos-script-runner-api:2902"]

  - job_name: minio-job
    metrics_path: /minio/v2/metrics/cluster
    scheme: http
    static_configs:
      - targets: ["cosmos-minio:9000"]
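
To see which endpoints this configuration asks Prometheus to scrape, the Python sketch below rebuilds the URLs from the job definitions. The job list is copied by hand from prometheus.yaml rather than parsed from the file, and it relies on the Prometheus defaults of `http` for `scheme` and `/metrics` for `metrics_path` when a job does not set them.

```python
# Scrape jobs transcribed by hand from prometheus.yaml.
SCRAPE_CONFIGS = [
    {"job_name": "prometheus", "targets": ["localhost:9090"]},
    {
        "job_name": "cosmos-internal-metrics",
        "metrics_path": "/cosmos-api/internal/metrics",
        "targets": ["cosmos-cmd-tlm-api:2901"],
    },
    {
        "job_name": "cosmos-cmd-tlm-api",
        "metrics_path": "/cosmos-api/metrics",
        "targets": ["cosmos-cmd-tlm-api:2901"],
    },
    {
        "job_name": "cosmos-script-runner-api",
        "metrics_path": "/script-api/metrics",
        "targets": ["cosmos-script-runner-api:2902"],
    },
    {
        "job_name": "minio-job",
        "metrics_path": "/minio/v2/metrics/cluster",
        "scheme": "http",
        "targets": ["cosmos-minio:9000"],
    },
]


def scrape_urls(configs):
    """Map each job name to the list of URLs Prometheus would request."""
    urls = {}
    for job in configs:
        scheme = job.get("scheme", "http")        # Prometheus default
        path = job.get("metrics_path", "/metrics")  # Prometheus default
        urls[job["job_name"]] = [
            f"{scheme}://{target}{path}" for target in job["targets"]
        ]
    return urls


if __name__ == "__main__":
    for name, targets in scrape_urls(SCRAPE_CONFIGS).items():
        print(name, targets)
```

Requesting one of these URLs from a container on the same network is a quick way to confirm that a target exposes metrics before looking for it in Prometheus.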

Dockerfile

FROM prom/prometheus:v2.24.1
ADD prometheus.yaml /etc/prometheus/

Grafana

Grafana is a multi-platform open source analytics and interactive visualization web application. It provides charts, graphs, and alerts for the web when connected to supported data sources.

Notes

datasource.yaml

apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    # Access mode - proxy (server in the UI) or direct (browser in the UI).
    access: proxy
    url: http://cosmos-prometheus:9090
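
With `access: proxy`, the Grafana server itself sends queries to the URL above. The Python sketch below shows roughly what such a request and its response look like, using the Prometheus HTTP API instant query endpoint and the built-in `up` metric. The sample payload is hand-written for illustration and is not real output from a COSMOS deployment; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode

# Same URL as in datasource.yaml.
PROMETHEUS_URL = "http://cosmos-prometheus:9090"

# Hand-written illustration of an instant-query response, not captured output.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus"},
             "value": [1610712000, "1"]},
            {"metric": {"__name__": "up", "job": "minio-job"},
             "value": [1610712000, "0"]},
        ],
    },
})


def instant_query_url(expr):
    """Build the instant-query URL for a PromQL expression."""
    return f"{PROMETHEUS_URL}/api/v1/query?{urlencode({'query': expr})}"


def jobs_up(payload):
    """Map each job label in an 'up' query result to True (1) or False (0)."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise ValueError("query did not succeed")
    return {
        sample["metric"].get("job", ""): sample["value"][1] == "1"
        for sample in data["data"]["result"]
    }


if __name__ == "__main__":
    print(instant_query_url("up"))
    print(jobs_up(SAMPLE_RESPONSE))
```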

Dockerfile

FROM grafana/grafana

COPY datasource.yaml /etc/grafana/provisioning/datasources/