Monitoring a Cassandra cluster with the Prometheus Operator

Our current project has several microservices. All of them are dockerized and deployed to a Kubernetes cluster. Most of them use a Cassandra cluster that runs outside the Kubernetes cluster.

On the monitoring front, we have the Prometheus Operator, beautiful Grafana dashboards, and Loki to aggregate the logs generated by the services.

All of these tools, namely Prometheus, Grafana, and Loki, also run within the Kubernetes cluster, so it was pretty straightforward to set them up and configure them for our services.

Then came the task of monitoring the Cassandra cluster. Since it runs outside the Kubernetes cluster, we had to do the following:

  • Export the metrics from the Cassandra cluster
  • Scrape the exported metrics from Cassandra nodes
  • Create a dashboard on Grafana with these metrics

Step 1: Export the metrics from the Cassandra cluster

We found cassandra_exporter, a fork of the JMX exporter that is fairly easy to install and configure. Please go through the project's README. On every node in the cluster, configure the metrics you want to export and the port on which to expose them.

Once the exporter is configured, each Cassandra node serves its metrics at localhost:listenPort/ or localhost:listenPort/metrics, where listenPort is the port set in the exporter's configuration.
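For orientation, here is a minimal sketch of the settings that matter for this setup in the exporter's config.yml. The key names (host, listenAddress, listenPort) are assumed from the example configuration shipped with cassandra_exporter, so check them against the README of the version you install. Port 8080 is chosen to match the Kubernetes manifests used later in this post, and 7199 is Cassandra's default JMX port.

```yaml
# Sketch of cassandra_exporter's config.yml (key names assumed from the project's
# example config; verify them against the README of the version you install)
host: localhost:7199      # JMX endpoint of the local Cassandra node (7199 is Cassandra's default JMX port)
listenAddress: 0.0.0.0    # bind on all interfaces so Prometheus can reach the exporter from another host
listenPort: 8080          # the port the Kubernetes Endpoints object will point at
# The remaining sections of the example config (which metrics to skip, how often
# to scrape them) can start from the project's defaults and be tuned afterwards.
```

Binding listenAddress to 0.0.0.0 rather than localhost matters here, because Prometheus will scrape each node from inside the Kubernetes cluster, not from the node itself.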

Step 2: Scrape the exported metrics from Cassandra nodes

This step was a little tricky: the Prometheus Operator runs inside the Kubernetes cluster, and the Kubernetes cluster doesn't know anything about the Cassandra cluster.

Let me circle back to the above statement after diving into the Prometheus Operator.

Prometheus Operator

Operators were introduced by CoreOS as a class of software that operates other software, putting operational knowledge collected by humans into software. Read more in the original blog post, Introducing Operators. The Prometheus Operator serves to make running Prometheus on top of Kubernetes as easy as possible while preserving Kubernetes-native configuration options.

To simplify the process of monitoring services in a Kubernetes cluster, the Prometheus Operator introduces additional resources:

  • Prometheus
  • ServiceMonitor
  • Alertmanager


Prometheus Operator architecture (src: https://coreos.com/operators/prometheus/docs/latest/user-guides/getting-started.html)


As the architecture diagram shows, a ServiceMonitor defines, by label selection, which services are grouped together for monitoring. From each ServiceMonitor definition, the Prometheus Operator generates the corresponding Prometheus scrape configuration, so we never have to write it by hand. This makes our job easier.
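The Prometheus resource, in turn, decides which ServiceMonitors it picks up through a label selector, which is why the ServiceMonitor later in this post carries the label release: prometheus-operator. The excerpt below is a minimal illustration, assuming a Prometheus resource named prometheus in the monitoring namespace; in a Helm-based install the chart typically fills in this selector for you, so treat it as an explanation of the mechanism rather than something to apply.

```yaml
# Illustrative excerpt of a Prometheus custom resource (name and namespace are assumptions)
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
    name: prometheus
    namespace: monitoring
spec:
    # Only ServiceMonitors carrying this label are turned into scrape jobs
    serviceMonitorSelector:
        matchLabels:
            release: prometheus-operator
    # An empty selector means ServiceMonitors from every namespace are considered;
    # leaving the field unset restricts discovery to the Prometheus resource's own namespace
    serviceMonitorNamespaceSelector: {}
```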

Problem

This works well for services running inside the Kubernetes cluster, but our Cassandra cluster runs outside it. Kubernetes has no Services for the Cassandra nodes, so there is nothing for a ServiceMonitor to select.

Workaround

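Let's first implement a workaround for one node in the Cassandra cluster. The idea is to describe the external node to Kubernetes by hand: an Endpoints object points at the node's IP address, a Service with the same name is associated with that Endpoints object, and a ServiceMonitor then selects the Service like any other.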

Create a Kubernetes Endpoints object pointing to the Cassandra node


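apiVersion: v1
kind: Endpoints
metadata:
    # Must match the name of the Service created in the next step
    name: cassandra-metrics
    labels:
        release: prometheus-operator
    namespace: monitoring
subsets:
    - addresses:
      - ip: <ip address of the cassandra node>
      ports:
      # The port name must match the Service port and the ServiceMonitor endpoint port
      - name: metrics
        # The listenPort configured in cassandra_exporter
        port: 8080
        protocol: TCP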

Create a Service with the same name, so that it is associated with this Endpoints object

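apiVersion: v1
kind: Service
metadata:
    # Same name as the Endpoints object, which is how the two are associated
    name: cassandra-metrics
    namespace: monitoring
    labels:
        # The ServiceMonitor selects Services by these labels
        release: prometheus-operator
        k8s-app: cassandra-metrics
spec:
    # No selector, so Kubernetes does not manage the Endpoints and uses the one above.
    # Headless (clusterIP: None): Prometheus scrapes the endpoint addresses directly.
    # (An ExternalName Service is not suitable here: Kubernetes treats an IP address
    # in externalName as a DNS name, which will not resolve.)
    clusterIP: None
    ports:
    - name: metrics
      port: 8080
      protocol: TCP
      targetPort: 8080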

Create a ServiceMonitor that selects the above Service

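apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
    name: cassandra-metrics-sm
    labels:
        # These labels must satisfy the serviceMonitorSelector of your Prometheus resource
        release: prometheus-operator
        prometheus: kube-prometheus
    namespace: monitoring
spec:
    # Select every Service carrying both of these labels
    selector:
        matchLabels:
            release: prometheus-operator
            k8s-app: cassandra-metrics
    # Look for matching Services only in the monitoring namespace
    namespaceSelector:
        matchNames:
        - monitoring
    endpoints:
    # Scrape the port named "metrics" every 10 seconds
    - port: metrics
      interval: 10s
      honorLabels: true
      path: /metrics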
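Under the hood, the operator translates this ServiceMonitor into a Prometheus scrape job that uses Kubernetes service discovery with the endpoints role. That is why a hand-made Endpoints object is enough to bring an external node into Prometheus. The job below is a simplified, hand-written approximation of what gets generated, not the operator's exact output: the real job carries additional relabelings, and its name format varies between operator versions.

```yaml
# Simplified approximation of the scrape job generated from cassandra-metrics-sm
scrape_configs:
- job_name: monitoring/cassandra-metrics-sm/0   # illustrative job name
  honor_labels: true
  scrape_interval: 10s
  metrics_path: /metrics
  kubernetes_sd_configs:
  - role: endpoints            # discovers Endpoints objects, including manually created ones
    namespaces:
      names:
      - monitoring             # from the ServiceMonitor's namespaceSelector
  relabel_configs:
  # Keep only endpoints whose Service carries the ServiceMonitor's matchLabels
  # (Prometheus sanitizes label names, so k8s-app becomes k8s_app)
  - source_labels: [__meta_kubernetes_service_label_release]
    regex: prometheus-operator
    action: keep
  - source_labels: [__meta_kubernetes_service_label_k8s_app]
    regex: cassandra-metrics
    action: keep
  # Keep only the port named "metrics"
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: metrics
    action: keep
```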

At this point, the cassandra-metrics-sm ServiceMonitor should show up under Service Discovery and Targets in your Prometheus UI, as in the screenshot below.


[Screenshot: cassandra-metrics-sm under Service Discovery and Targets in the Prometheus UI]


Now let's add the remaining Cassandra nodes to the ServiceMonitor. If you remember the architecture diagram above, this is a no-brainer: create an Endpoints object for each remaining node and a Service with the same name for each of those Endpoints, giving every Service the same labels as the first one. We don't need another ServiceMonitor, since the existing one already selects every Service carrying those labels and groups them together.
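For example, a second node could be added with another Endpoints and Service pair. The name cassandra-metrics-2 and the IP placeholder are hypothetical. What matters is that the Endpoints object and the Service share a name, and that the Service carries the labels the ServiceMonitor selects on. This sketch uses a selectorless headless Service (clusterIP: None), the pattern Kubernetes documents for Services backed by manually managed Endpoints.

```yaml
# Hypothetical second node: the Endpoints object and the Service share the name cassandra-metrics-2
apiVersion: v1
kind: Endpoints
metadata:
    name: cassandra-metrics-2
    namespace: monitoring
subsets:
    - addresses:
      - ip: <ip address of the second cassandra node>
      ports:
      - name: metrics
        port: 8080
        protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
    name: cassandra-metrics-2
    namespace: monitoring
    labels:
        # Same labels as the first Service, so cassandra-metrics-sm selects this one too
        release: prometheus-operator
        k8s-app: cassandra-metrics
spec:
    # No selector: Kubernetes uses the manually created Endpoints object above
    clusterIP: None
    ports:
    - name: metrics
      port: 8080
      protocol: TCP
```

Alternatively, a single Endpoints object can list several node IPs under addresses. Each address then becomes its own scrape target behind one Service.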

Once these are applied, the number of active targets should grow by one for each node you add.

Step 3: Create the Grafana Dashboard

There are lots of Cassandra dashboards for Grafana available online. You can use any of them, including the one in the cassandra_exporter documentation, or you can create your own.

Conclusion

We discussed how to use a Prometheus Operator running inside a Kubernetes cluster to monitor a Cassandra cluster running outside it. The same approach works for other external services and instances too, provided each one exposes its metrics through a Prometheus-compatible exporter.