Last week I found this question about KPIs on #sqlhelp
Anyone have any good KPI’s for DBA’s? Our director wants us to come up with some Key Performance Indicators. Any articles? #sqlhelp
― SS7 (@sqlsurfer7) February 28, 2017
It’s a great question, and there is no right answer here. I made an effort to help narrow the focus by asking for more details. Soon I realized that there was no way this was going to get solved on Twitter, and that a blog post might help. So that’s what I am doing today. You’re welcome.
First, let’s make sure everyone understands that KPIs is an acronym for Key Performance Indicators . KPIs are metrics that help support defined goals and targets.While the specific KPIs will change from one shop to another, KPIs in generalshare some common characteristics:
They reflect strategic value drivers
Are based on valid data
Must be easy to comprehend
Are always relevant
Provide context
Empower users
Lead to positive action
And lastly, while KPIs are metrics, not all metrics are KPIs.
Often the KPI goals and targets are set by the business. But not always. It is perfectly valid for a DBA (or their manager) to assign KPIs to the servers and databases they administer. KPIs are unique for every shop, and for every role. There is no one set of KPIs that will work universally for all DBAs. What I want to do today is share with you some possible KPI metrics that you may consider including.
A couple ofthings to note before we get started. First, whatever KPIs you choose should support goals and targets for your manager. And your manager has goals and targets to support their manager. And upwards it goes until you get to the CEO, who also reports to the Board of Directors. So choose wisely, you want to make sure your efforts to collect and report on these KPIs have value to others.
Second, these KPIs aren’t always for things you are responsible for, but things you are expected to report upon. For example, you may not be in charge of storage, but you are likely to report the amount of disk space available. The reality is that as a DBAthere are a lot of things you are responsible for that you can’t control (hello Sharepoint! *waves*)
Here’s my list. Let me know if there is something I’ve missed and I will add it to the list.
UptimeThis metric floatsto the top of the list for most IT administrators. And it makes sense that you would report onthe servers, applications, and the data as having a high percentage of uptime. I have two issues with using uptime as a metric. First is that no one can agree what uptime means . Oh, sure, we can just say that the system is available implies it is “up”. But if I need to kick users off for six hours on a Sunday in order to do maintenance, is that really uptime? Well, the server isn’t down, so it must be up, right? Wrong. That’s why I advocate using these distinct time metrics: uptime, downtime, and maintenance.
The second issue I have is when people take credit for uptime despite not having any part in keeping the server available. I consider uptime as a KPI to be applicable only when you have personally configured high availability (HA) options. But as a KPI for a standalone server, it may not. Why? Well, since Microsoft is the one that wrote the O/S, and they wrote SQL Server, they might deserve a tad bit more credit about the platforms being stable to the point that it doesn’t need to be rebooted frequently.I won’t tell you to not use uptime as a metric, but I am going to tell you that if you are using uptime then you need to provide more context about the activities you do that directly affect uptime.
BackupsThe number one priority for any DBA is recovery . And you can’t recover without backups. You should be reporting on the number of databases (system and user) that have current backups, where ‘current’ is applicable to your business requirements. You should also report that these backups have been archived to tape. Oh, and it wouldn’t hurt to report if the server itself has been backed up to tape. And yes, I think it is weird that I am talking about tape in the year 2017 but as a DBA I believe in the Computer Backup Rule of Three .
RestoresBecause backups are worthless unless they can be restored, you should report on the success you have restoring random backups. If you don’t know what I’m talking about, go read my post about using statistical sampling methods to help verify your backups are successful.
AvailabilitySimilar to uptime, but also with an idea of recovery, isa KPI for ‘availability’. This would be a measure of the data being available when needed. So, you could have 99.999% uptime but your end users may need to wait for you to restore production to test one day to debug an issue. How long will they wait? That’s the number you want to report on, the amount of time your business end users are waiting for the data they need to become available. This could apply to a lot of things, too, such as ETL loads, or FTPingfiles. For example, let’s say your business needs to recover data from last week, and you have to recall a tape from cold storage. The process could take a day or two, then the restore time. You should track this number.
MaintenanceYou should be performing maintenance on your databases on a regular basis, and thus report on the success of these jobs over time. Chances are your boss, and their boss, have no idea about these jobs. You should raise awareness about activities such as these, as they perform a valuable service to help the business meet their goals (and I bet your bosses would like to report this KPI up the chain, too).
Failed jobsAnother standard for most DBA KPI dashboards is reporting on the number of failed jobs for the day/week/month. I would also include the total number of jobs being executed, in order to give some perspective on the volumeof work being administered. And while we are discussing this, you might want to break the jobs down into individual job steps and report on those, too.
Capacity PlanningCapacity planning is about as much fun as a root canal, except the root canal allows you the opportunity to be unconscious while it happens. I used to get asked for capacity forecasts every quarter and they were never right. Why? Because someone would decide to fill up a database drive with 100GB of XML documents, or they would load 500GB of data into a database, or decide to reorder the columns on a table and fill up a 140GB transaction log drive. The point is that you can forecast all you want but you never know what the end users are going to do next. At the end of the month you compare your estimate to reality and wonder “how come we were so far off?” Users, that’s how come.
Still, there are ways for you to get some info from your databases in order to make an educated guess. And you can have some fun with this, too. For example, you could calculate the amount of space neede