Monitoring a Cassandra cluster with Prometheus Operator
Our current project has several microservices. All of them are dockerized and deployed to a Kubernetes ecosystem. There is a Cassandra cluster running outside the Kubernetes cluster and it is being used by most of our services.
On the monitoring front, we have Prometheus Operator, beautiful Grafana dashboards and Loki to aggregate the logs generated by the services.
All of these tools, namely Prometheus, Grafana, and Loki, also run within the Kubernetes cluster, so it was pretty straightforward to set up and configure them for our services.
Then came the task of monitoring the Cassandra cluster. Since it runs outside the Kubernetes cluster, we had to do the following:
- Export the metrics from the Cassandra cluster
- Scrape the exported metrics from Cassandra nodes
- Create a dashboard on Grafana with these metrics
Step 1: Export the metrics from the Cassandra cluster
We found cassandra_exporter, a fork of the JMX exporter that is fairly easy to install and configure. Please go through the README of the project. On every node in the cluster, configure the metrics you want to export and the port on which to expose them.
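As a rough illustration, an exporter configuration might look like the sketch below. The key names are taken from the project's sample config.yml and should be checked against the README for your version. The JMX address, the port, and the patterns are illustrative assumptions, not values from our setup.

```yaml
# Sketch of a cassandra_exporter config.yml; verify key names against the README.
host: localhost:7199        # assumed JMX address of the local Cassandra node
listenPort: 8080            # port the exporter serves metrics on (8080 is also used in Step 2)
blacklist:                  # regex patterns for metrics that should not be exported
  - java:lang:memorypool:.*usagethreshold.*
maxScrapFrequencyInSec:     # how often each group of metrics is refreshed
  50:
    - .*
  3600:                     # refresh costly metrics only once an hour
    - .*:snapshotssize:.*
```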
Once the configuration is successful, you can fetch the metrics from your Cassandra node at localhost:listenPort/ or localhost:listenPort/metrics
Step 2: Scrape the exported metrics from Cassandra nodes
Implementing the monitoring for the Cassandra cluster became a little tricky since the Prometheus Operator runs within the Kubernetes cluster and the Kubernetes cluster doesn’t know anything about the Cassandra cluster.
Let me circle back to the above statement after diving into the Prometheus Operator.
Prometheus Operator
Operators were introduced by CoreOS as a class of software that operates other software, putting operational knowledge collected by humans into software. Read more in the original blog post, Introducing Operators. The Prometheus Operator serves to make running Prometheus on top of Kubernetes as easy as possible while preserving Kubernetes-native configuration options.
To simplify the process of monitoring services in a Kubernetes cluster, the Prometheus Operator introduces additional resources:
- Prometheus
- ServiceMonitor
- Alertmanager
As evident from the architecture diagram, ServiceMonitor helps you to define how to group services for monitoring. Based on the ServiceMonitor definition Prometheus Operator creates the scrape configuration. This makes our job easier.
Problem
This is excellent if the service is running inside the Kubernetes cluster, but the Cassandra cluster is running outside of it. Hence, there are no services or service definitions for it, and we cannot easily monitor it.
Workaround
Let’s first implement a workaround for one node in the Cassandra cluster.
Create a Kubernetes endpoint to the Cassandra node
apiVersion: v1
kind: Endpoints
metadata:
  name: cassandra-metrics
  labels:
    release: prometheus-operator
  namespace: monitoring
subsets:
  - addresses:
      - ip: <ip address of the cassandra node>
    ports:
      - name: metrics
        port: 8080
        protocol: TCP
Create a service that maps to this endpoint
apiVersion: v1
kind: Service
metadata:
  # Same name and namespace as the Endpoints object, so that Kubernetes pairs the two
  name: cassandra-metrics
  namespace: monitoring
  labels:
    release: prometheus-operator
    k8s-app: cassandra-metrics
spec:
  type: ExternalName
  externalName: <ip address of the cassandra node>
  ports:
    - name: metrics
      port: 8080
      protocol: TCP
      targetPort: 8080
Create a ServiceMonitor that will monitor the above service
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cassandra-metrics-sm
  labels:
    release: prometheus-operator
    prometheus: kube-prometheus
  namespace: monitoring
spec:
  selector:
    matchLabels:
      release: prometheus-operator
      k8s-app: cassandra-metrics
  namespaceSelector:
    matchNames:
      - monitoring
  endpoints:
    - port: metrics
      interval: 10s
      honorLabels: true
      path: /metrics
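To make the link between the ServiceMonitor and Prometheus more concrete, here is a simplified approximation of the kind of scrape configuration the operator derives from the definition above. The exact job name and the full set of relabeling rules depend on the operator version, so treat this as an assumption-laden sketch rather than the literal generated output.

```yaml
# Simplified approximation of the generated scrape config (not the literal operator output).
scrape_configs:
  - job_name: serviceMonitor/monitoring/cassandra-metrics-sm/0   # naming scheme varies by version
    honor_labels: true
    scrape_interval: 10s
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: endpoints              # targets are discovered from Endpoints objects
        namespaces:
          names:
            - monitoring
    relabel_configs:
      # Keep only targets whose service carries the labels from matchLabels.
      - action: keep
        source_labels: [__meta_kubernetes_service_label_release]
        regex: prometheus-operator
      - action: keep
        source_labels: [__meta_kubernetes_service_label_k8s_app]
        regex: cassandra-metrics
      # Keep only the endpoint port named "metrics".
      - action: keep
        source_labels: [__meta_kubernetes_endpoint_port_name]
        regex: metrics
```

The keep rules mirror the matchLabels selector and the named port, which is why the labels on the service and the port name on the endpoint have to line up with the ServiceMonitor.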
At this point, you will be able to see the service monitor cassandra-metrics-sm under service discovery and targets in your Prometheus UI. It should look something like the image below.
Now let’s add more nodes from the Cassandra cluster to the ServiceMonitor. If you remember the architecture diagram above, this is a no-brainer. We just have to create endpoints for the remaining instances and create services that map to those endpoints, carrying the same labels that the ServiceMonitor selects on. We don’t have to create another ServiceMonitor, since we want all the services to be grouped under the same one.
Once the configuration is complete, you should see the number of active targets increase.
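For a second node, the extra manifests could look like the sketch below. It simply repeats the pattern used for the first node. The name cassandra-metrics-2 is a hypothetical naming choice, and the IP placeholder has to be filled in for your own node.

```yaml
# Hypothetical second node; the "-2" suffix is only an illustrative naming choice.
apiVersion: v1
kind: Endpoints
metadata:
  name: cassandra-metrics-2
  namespace: monitoring
  labels:
    release: prometheus-operator
subsets:
  - addresses:
      - ip: <ip address of the second cassandra node>
    ports:
      - name: metrics
        port: 8080
        protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: cassandra-metrics-2          # same name as the Endpoints object above
  namespace: monitoring
  labels:
    release: prometheus-operator     # both labels must match the ServiceMonitor selector
    k8s-app: cassandra-metrics
spec:
  type: ExternalName
  externalName: <ip address of the second cassandra node>
  ports:
    - name: metrics
      port: 8080
      protocol: TCP
      targetPort: 8080
```

Because the new service carries the same release and k8s-app labels, the existing cassandra-metrics-sm ServiceMonitor picks it up without any change.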
Step 3: Create the Grafana Dashboard
There are lots of Grafana dashboards available online. You can use any one of them, including the one in the cassandra_exporter documentation, or you can create one of your own.
Conclusion
We discussed how we can use the Prometheus Operator running inside the Kubernetes cluster to monitor an externally running Cassandra cluster. We can use the same approach to monitor other services and instances as well. But remember to use a valid metrics exporter.