Successful cluster administration can be very difficult without a real-time view of the cluster's state. Solr itself does not provide aggregated views of its state or any historical usage data, both of which are needed to understand how the service is used and how it is performing. Knowing throughput and capacity not only helps detect errors and troubleshoot issues but is also useful for capacity planning.
Questions may arise, such as:
What is the size of my cluster and of each collection? How fast is it growing?
What is the query rate on my cluster and collections?
How many documents do I have in each collection?
What is the performance of my indexers?
Are my shards balanced?
Answering questions like these requires collecting detailed, historical metrics.
With Cloudera Manager, users have been able to deploy Solr services on CDH and monitor their health since Solr was first integrated. However, the initial monitoring capabilities did not fully answer the questions above for large Solr deployments. These deployments are often multi-tenant applications served by CDH and Cloudera Search under an SLA.
In this post we present the new and improved capabilities in Cloudera Manager 5.12 for monitoring and troubleshooting Cloudera Search clusters beyond basic server health. We will demonstrate how to access the existing charts and how to set up dashboards and alerts. But first, let’s review the powerful capabilities that Cloudera Manager (CM) already has for collecting rich metrics and building ad-hoc, dynamic dashboards that provide insight.
Metrics in Cloudera Manager
Cloudera Manager continuously monitors and collects usage and performance metrics from Solr (and other services running on the shared-storage cluster). The collected metrics are accessible through the Chart Builder feature in Cloudera Manager, where you can build charts and create alerts based on them. Cloudera Manager provides predefined charts with a handful of essential metrics about the cluster’s health that will be demonstrated later in this blog post.
The metrics are collected (and documented) at the service, server, shard, and replica levels, depending on the nature of the metric. For example, the JVM heap size is a server-level metric, whereas the query request rate is measured at the core/replica level.
Visualizing metrics
Cloudera Manager already supports ad-hoc queries on collected metrics. The query language, tsquery, has an SQL-like syntax, which makes it easy to learn and to use for custom queries.
We can run custom queries by selecting Chart > Chart Builder from the Cloudera Manager menu. In the Chart Builder interface we enter the query. For example, we can enter:
select select_requests_rate
This query shows the historical request rate of every replica of every collection on every Solr service that is being managed. We can filter these statistics to a specific service or collection:
select select_requests_rate where serviceName="SOLR-1"
Or
select select_requests_rate where solrCollectionName="collection1"
The filters will select only those replicas that belong to the given service or collection. However, if we want to see an aggregated total of the request rates, we need to use a different approach.
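If you generate many similar queries from scripts (for example, one per collection), a small helper keeps the syntax consistent. The sketch below is a hypothetical Python function, build_tsquery. It is not part of Cloudera Manager. It covers only the `select <metric> where attribute="value"` form shown above, and it assumes that multiple conditions are joined with `and`. It does not escape quote characters inside values.

```python
def build_tsquery(metric: str, **filters: str) -> str:
    """Compose a simple tsquery string (hypothetical helper).

    Only equality filters on string attributes are supported. Multiple
    filters are joined with "and" (an assumption about tsquery predicates).
    Values are NOT escaped, so they must not contain double quotes.
    """
    query = f"select {metric}"
    if filters:
        # Keyword arguments keep their order, so conditions appear as passed.
        conditions = [f'{name}="{value}"' for name, value in filters.items()]
        query += " where " + " and ".join(conditions)
    return query


print(build_tsquery("select_requests_rate"))
print(build_tsquery("select_requests_rate", serviceName="SOLR-1"))
print(build_tsquery("select_requests_rate", solrCollectionName="collection1"))
```

The three calls print the three queries shown above, one per line. You can paste any of them into Chart Builder.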
Cloudera Manager creates artificial aggregated metrics for your convenience. The aggregated metrics are summaries of metrics over a certain grouping. For example, the metric select_requests_rate is aggregated into total_select_requests_rate_across_solr_replicas, a sum of select_requests_rate over a shard, collection, or service. We can select the desired aggregation by filtering metrics by category. The example below returns the aggregated select_requests_rate for each shard within the given collection:
select total_select_requests_rate_across_solr_replicas where solrCollectionName="collection1" and category="SOLR_SHARD"
The following query shows the total select_requests_rate for each collection:
select total_select_requests_rate_across_solr_replicas where category="SOLR_COLLECTION"
We can also get the sum of all select_requests_rate for the whole service using this query:
select total_select_requests_rate_across_solr_replicas where category="SERVICE"
By using the category filter, we can specify the aggregation level for the metrics. You may want to experiment with other metrics listed in the documentation to find the ones for your specific needs.
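To make the aggregation concrete, here is a minimal Python sketch of what a "sum across replicas" computes at each category level. It is an illustration only and is not how Cloudera Manager implements it. The ReplicaSample type, its field names, the total_across_replicas function, and the rates are all made up. Each replica-level sample is grouped by its enclosing shard, collection, or service, and the rates in each group are summed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaSample:
    """One (hypothetical) replica-level reading of select_requests_rate."""
    service: str
    collection: str
    shard: str
    replica: str
    rate: float  # requests per second


def total_across_replicas(samples, category):
    """Sum replica rates per SOLR_SHARD, SOLR_COLLECTION, or SERVICE group."""
    # Each grouping key includes the enclosing levels, so identically named
    # shards in different collections are kept apart.
    key_funcs = {
        "SOLR_SHARD": lambda s: (s.service, s.collection, s.shard),
        "SOLR_COLLECTION": lambda s: (s.service, s.collection),
        "SERVICE": lambda s: (s.service,),
    }
    key_of = key_funcs[category]
    totals = defaultdict(float)
    for sample in samples:
        totals[key_of(sample)] += sample.rate
    return dict(totals)


# Made-up rates for two collections on one service.
samples = [
    ReplicaSample("SOLR-1", "collection1", "shard1", "replica1", 12.0),
    ReplicaSample("SOLR-1", "collection1", "shard1", "replica2", 8.0),
    ReplicaSample("SOLR-1", "collection1", "shard2", "replica1", 5.0),
    ReplicaSample("SOLR-1", "collection2", "shard1", "replica1", 3.0),
]

print(total_across_replicas(samples, "SOLR_SHARD"))
print(total_across_replicas(samples, "SOLR_COLLECTION"))
print(total_across_replicas(samples, "SERVICE"))
```

The grouping key for a shard includes the collection and service names. As a result, the two shards named shard1 above produce separate totals.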
You can learn more about tsquery from the documentation.
Predefined charts for Solr
In Cloudera Manager 5.12, we introduce a set of new and improved charts for monitoring Solr services. The Solr service status page in this release contains eight essential charts:
Request Rate: These three charts are summaries and statistical distributions of the select, query, and update request rates.
Average Response Time: These three charts display the distribution of average response times for the select, query, and update request types.
Index Size: The aggregated index size of the cluster, along with the distribution of index sizes among all cores.
Total Documents: The aggregated number of documents, along with the distribution of document counts among all cores.
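The exact statistics these charts plot for their distributions are not spelled out here. As a hedged illustration of how a per-core distribution can reveal imbalance (one of the questions raised earlier was "Are my shards balanced?"), the sketch below summarizes per-core values with a total, minimum, median, and maximum. It also computes a max-to-median ratio as a crude skew indicator. The function name, the ratio, and the sample sizes are assumptions made for illustration.

```python
import statistics


def summarize_distribution(values_by_core):
    """Return the cluster total plus a simple distribution summary.

    values_by_core maps a (hypothetical) core name to a per-core value,
    such as an index size or a document count, all in the same unit.
    """
    if not values_by_core:
        raise ValueError("need at least one core")
    values = sorted(values_by_core.values())
    median = statistics.median(values)
    return {
        "total": sum(values),  # the aggregated value for the whole cluster
        "min": values[0],
        "median": median,
        "max": values[-1],
        # Crude skew signal: how much larger the biggest core is than a
        # typical one (infinite when the median is zero). Values well
        # above 1 suggest an unbalanced layout.
        "max_over_median": values[-1] / median if median else float("inf"),
    }


# Made-up index sizes in MB for four cores.
index_sizes_mb = {"core_a": 400, "core_b": 500, "core_c": 600, "core_d": 1500}
summary = summarize_distribution(index_sizes_mb)
print(summary)
```

The same function works for per-core document counts, which is the kind of data the Total Documents chart summarizes.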