Although Apache Cassandra exposes a large number of metrics through the popular Metrics library, it does not provide any out-of-the-box solution for monitoring them. The command-line nodetool utility can be used to inspect some of Cassandra's internal metrics, but by nature it is not designed for monitoring. For many users, DataStax OpsCenter has therefore been the only viable, ready-to-use solution for monitoring their Cassandra clusters. The bad news is that, starting with OpsCenter v6.0, OpsCenter is only available to DataStax Enterprise Edition (DSE) users; open source Cassandra users can no longer use it.
In this post, I will explore an open source Cassandra monitoring solution based on Cassandra's pluggable metrics reporting, Graphite, and Grafana. The post starts with the high-level architecture of the solution, followed by step-by-step instructions for setting it up on an Ubuntu 14.04 VM. The Cassandra version used in this setup is 2.1.14.
2. Architecture Overview
The diagram below describes a high-level, logical view of the proposed solution. The main components of this solution are as follows; I'll go through each of them in more detail in later sections.
- Cassandra cluster (with the Metrics Graphite reporter enabled): source of monitoring metrics
- Graphite server: receiver and aggregator of Cassandra metrics
- Grafana server: metrics dashboard provider
- Apache web server: web host for the metrics dashboard
- PostgreSQL database server: storage provider of received metrics and dashboard metadata
Please note that, for simplicity, the diagram shows only one Cassandra node, but the idea extends to a whole Cassandra cluster: each node in the cluster sends its internal metrics to a central Graphite server (specifically, to its Graphite-carbon sub-component). The metrics are stored and aggregated by Graphite and then displayed via Grafana, a web-based dashboard solution.
Please also note that the web server and database server in the diagram are not limited to Apache and PostgreSQL; other web servers and database servers supported by Graphite can also be used.
3. Configure Cassandra with Graphite metrics reporter
Since version 2.0.2, Cassandra has provided a built-in Pluggable Metrics Reporting feature that can export internal Cassandra metrics on the fly to different metrics reporters such as CSV, console, Graphite, Ganglia, and so on. In this solution, we use the Graphite reporter. As a consequence, the solution discussed in this post requires Cassandra 2.0.2 or later. For earlier versions such as 1.2, a custom metrics collection agent needs to be deployed on each Cassandra node to collect Cassandra's JMX metrics and send them to the target Graphite server.
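If you are unsure whether a node meets the 2.0.2 minimum, the check is easy to script. The sketch below is my own and not part of Cassandra: the helper names version_ge, cassandra_version and supports_pluggable_metrics are hypothetical, version ordering relies on GNU sort -V, and it assumes nodetool version prints a line of the form "ReleaseVersion: 2.1.14".

```shell
#!/bin/sh
# Sketch of a prerequisite check (hypothetical helpers, not part of Cassandra).
# Pluggable metrics reporting needs Cassandra 2.0.2 or later.
MIN_VERSION="2.0.2"

# Succeeds if version $1 >= version $2, using GNU sort's version ordering
# (so that, e.g., 2.0.10 is correctly treated as newer than 2.0.2).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reads the local node's release version from nodetool, assuming output
# of the form "ReleaseVersion: 2.1.14".
cassandra_version() {
  nodetool version | awk -F': ' '/ReleaseVersion/ { print $2 }'
}

# Succeeds if the given Cassandra version supports pluggable metrics reporting.
supports_pluggable_metrics() {
  version_ge "$1" "$MIN_VERSION"
}
```

On a node, something like supports_pluggable_metrics "$(cassandra_version)" && echo "pluggable metrics reporting available" would confirm the prerequisite. Plain string comparison is deliberately avoided, since it would rank 2.0.10 below 2.0.2.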
To configure the Cassandra service to work with the Graphite metrics reporter, the following steps are required:
1). Download the Graphite metrics reporter jar file (metrics-graphite-2.2.0.jar) from here
2). Put the downloaded jar file in the Cassandra library folder, e.g. /usr/share/cassandra/lib/ (the default Cassandra library folder for the packaged installation on Ubuntu 14.04).
3). Create a metrics reporter configuration file (e.g. metrics_reporter_graphite.yaml) and put it in the same folder as the cassandra.yaml file, e.g. /etc/cassandra/ (the default Cassandra configuration folder for the packaged installation on Ubuntu 14.04), with content like the following:
```yaml
graphite:
  - period: 30
    timeunit: 'SECONDS'
    prefix: 'cassandra-clustername-node1'
    hosts:
      - host: 'localhost'
        port: 2003
    predicate:
      color: 'white'
      useQualifiedName: true
      patterns:
        - '^org.apache.cassandra.+'
        - '^jvm.+'
```
4). Modify the cassandra-env.sh file to include the following JVM option:
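```shell
# Config file from step 3, looked up on Cassandra's classpath
# (the packaged install puts /etc/cassandra/ on the classpath).
METRICS_REPORTER_CFG="metrics_reporter_graphite.yaml"
JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=$METRICS_REPORTER_CFG"
```
5). Restart the Cassandra service.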
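For a multi-node cluster, steps 2 to 4 have to be repeated on every node with a different prefix, so it can be handy to script them. The following is a rough sketch rather than a tested installer: it assumes the jar from step 1 has already been downloaded, defaults to the Ubuntu package paths mentioned above (overridable through the CASSANDRA_LIB and CASSANDRA_CONF variables), and the function name configure_graphite_reporter and its cluster/node arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: automate steps 2-4 for one node (hypothetical helper, not an
# official installer). Assumes the jar from step 1 is already downloaded.
# Paths default to the Ubuntu 14.04 package layout and can be overridden.
CASSANDRA_LIB="${CASSANDRA_LIB:-/usr/share/cassandra/lib}"
CASSANDRA_CONF="${CASSANDRA_CONF:-/etc/cassandra}"
GRAPHITE_HOST="${GRAPHITE_HOST:-localhost}"
GRAPHITE_PORT="${GRAPHITE_PORT:-2003}"

# Usage: configure_graphite_reporter <jar> <cluster-name> <node-name>
configure_graphite_reporter() {
  jar="$1"; cluster="$2"; node="$3"

  # Step 2: copy the reporter jar into Cassandra's library folder.
  cp "$jar" "$CASSANDRA_LIB/" || return 1

  # Step 3: write the reporter config next to cassandra.yaml,
  # with a per-node prefix of the form cassandra-<cluster>-<node>.
  cat > "$CASSANDRA_CONF/metrics_reporter_graphite.yaml" <<EOF
graphite:
  - period: 30
    timeunit: 'SECONDS'
    prefix: 'cassandra-${cluster}-${node}'
    hosts:
      - host: '${GRAPHITE_HOST}'
        port: ${GRAPHITE_PORT}
    predicate:
      color: 'white'
      useQualifiedName: true
      patterns:
        - '^org.apache.cassandra.+'
        - '^jvm.+'
EOF

  # Step 4: append the JVM option only if it is not already present,
  # so the function can be re-run safely. The quoted 'EOF' keeps
  # $JVM_OPTS and $METRICS_REPORTER_CFG literal in cassandra-env.sh.
  env_file="$CASSANDRA_CONF/cassandra-env.sh"
  if ! grep -q 'cassandra.metricsReporterConfigFile' "$env_file" 2>/dev/null; then
    cat >> "$env_file" <<'EOF'
METRICS_REPORTER_CFG="metrics_reporter_graphite.yaml"
JVM_OPTS="$JVM_OPTS -Dcassandra.metricsReporterConfigFile=$METRICS_REPORTER_CFG"
EOF
  fi
}
```

For example, configure_graphite_reporter ./metrics-graphite-2.2.0.jar mycluster node1, run with enough privileges to write to the Cassandra folders, followed by the restart in step 5. The grep guard is what makes re-running the function harmless: the JVM option is only appended once.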
The meaning of the Graphite metrics reporter configuration file contents is fairly straightforward and self-explanatory. The key settings are:
- “period” and “timeunit” together determine how frequently the metrics are sent to the target receiver/sink (the Graphite server in our case).
- “prefix” can be thought of as a source identifier for the metrics. Because Graphite is a generic monitoring framework that can receive metrics from many different sources, it is good practice to give the prefix a clear naming pattern. For Cassandra monitoring, a good pattern is something like ‘cassandra-<cluster name>-<node IP or name>’. With such a prefix, we can easily identify, on the Graphite/Grafana side, the metrics of a particular node within a particular cluster.
- “hosts” defines the host name/IP and port number of the target Graphite server (2003 by default, unless changed).
- “predicate.color” set to ‘white’ makes the patterns a whitelist; ‘black’ would instead exclude the matching metrics.
- “predicate.useQualifiedName” specifies whether or not fully qualified metric names are used (e.g. org.apache.cassandra.metrics.Compaction.PendingTasks).
- “predicate.patterns” defines the metrics filter: only metrics whose names match one of the specified patterns are sent to the target receiver/sink.

4. Graphite Monitoring Framework
The core of the solution is the generic Graphite monitoring framework, which is designed to store, aggregate, and render time-series data. It is a widely used framework, and a detailed description of it is beyond the scope of this post. In this section, I will briefly touch upon the high-level structure of the framework and how our solution fits into it.
There are three major components within the core Graphite monitoring framework:
- Graphite-carbon: an event-driven networking engine that listens for time-series data
- Graphite-whisper: a simple storage library for time-series data
- Graphite-web: a web application built on the Python Django web framework that uses the Cairo 2D graphics library to render time-series data on demand

Graphite itself does not collect metrics; it relies on other metrics collection software (e.g. Cassandra with the Graphite metrics reporter) to send metrics to it. When metrics arrive, Graphite-carbon receives them and stores them through Graphite-whisper (rolling them up according to the configured retention policy), and Graphite-web renders them on demand.
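To make the hand-off from Cassandra to carbon more concrete, here is a simplified sketch of what happens to each metric: the predicate patterns act as a whitelist, the prefix is prepended, and the result is written in carbon's plaintext protocol, one "<metric path> <value> <timestamp>" line per data point. It is an illustration, not the reporter's code: the helper names are hypothetical, and the real reporter may add further path segments (for example per-statistic suffixes for timers and histograms). It also shows why the prefix deserves care: Graphite treats every dot in a metric path as a hierarchy separator, so a node IP such as 10.0.0.1 used verbatim in the prefix would create extra tree levels.

```shell
#!/bin/sh
# Simplified illustration of the reporter-to-carbon flow (hypothetical helpers).
PREFIX="cassandra-clustername-node1"
# Same whitelist as predicate.patterns in metrics_reporter_graphite.yaml.
PATTERNS='^org.apache.cassandra.+|^jvm.+'

# Graphite splits metric paths on dots, so dots inside a single prefix
# component (e.g. an IP address) are replaced with underscores.
sanitize_component() {
  printf '%s' "$1" | tr '.' '_'
}

# Succeeds if the metric name matches one of the whitelist patterns.
passes_predicate() {
  printf '%s\n' "$1" | grep -Eq "$PATTERNS"
}

# Prints one carbon plaintext-protocol line: "<prefix>.<name> <value> <timestamp>".
# The timestamp defaults to the current Unix time.
carbon_line() {
  name="$1"; value="$2"; ts="${3:-$(date +%s)}"
  printf '%s.%s %s %s\n' "$PREFIX" "$name" "$value" "$ts"
}
```

A hand-crafted line can be sent to carbon for a quick end-to-end test, for instance carbon_line test.metric 42 | nc -q 1 localhost 2003, assuming a netcat build that supports -q (such as Ubuntu's netcat-openbsd). A node-specific prefix could be built as PREFIX="cassandra-mycluster-$(sanitize_component 10.0.0.1)".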
In the solution discussed in this post, we use:
- PostgreSQL database to replace the default embedded SQLite database as the metrics store. It is also used to store Grafana dashboard metadata.
- Grafana to replace the default “Graphite-web” compon