If there were a way to enable your digital organization to create business value faster and increase revenue, would you do it? Most business leaders wouldn't hesitate to answer with a resounding "yes." The last decade has given rise to just such a process, DevOps, which allows software businesses to iterate at a rapid pace and deliver business value faster than ever before. However, organizations need a way to measure that value and to show whether their initiative to adopt DevOps has been successful.
What Is DevOps And Why Is It Important?
Defining what DevOps is involves exploring areas relating to organizational culture, engineering, and tools. Amazon Web Services (AWS) provides an excellent, succinct explanation on its site:
"DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes."
DevOps should enable a continuous, looping "flow" of feedback between development and operations teams:
Figure 1: Continuous flow from dev to ops
Past development models often resulted in a logical wall between development and operations teams, which passed work back and forth over the wall with little context or knowledge transfer. As software projects grew in complexity and market demand required faster iteration, these models did not scale. DevOps methodologies and patterns revolutionized software engineering at scale, bridging operations and development and converting byzantine, manual processes into streamlined automation.
Put simply, DevOps allows end-to-end software engineering to happen faster. Features get to customers quicker, more business value is generated, and ultimately, revenue grows. To know whether you are implementing DevOps successfully, you can measure KPIs, which we explain in the next section.
Identify and Collect KPIs
Key performance indicators (KPIs) are metrics that indicate whether a team or initiative is successful. KPIs generally need to align with a specific goal, be measurable, and be bounded by a time frame. In the case of DevOps, KPIs should provide quantifiable data relating to objectives like deployment velocity and performance.
Here, we'll discuss five of the most critical KPIs, and move on to explore how they can be integrated with reporting and visualizations.
1. Deployment Frequency
If you have to choose only one KPI to collect for DevOps, it should be deployment frequency. If the primary goal of DevOps is to increase development and deployment velocity, then the primary metric of success is how frequently deployments take place. As teams begin to adopt DevOps, frequency should increase. Any drop is indicative of a bottleneck or choke-point somewhere in the process.
A simple way to measure frequency could be to configure a basic webhook on the final stage of your CI/CD pipeline or deployment tool. Any successful deploy should trigger the webhook, incrementing the number of deployments. With highly complex deployment environments, it might be necessary to make the metric more granular, for example, by region or by customer.
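As a minimal sketch of the counting side, assume the pipeline's webhook delivers a JSON payload with `status`, `region`, and `finished_at` fields. These field names are hypothetical and not taken from any particular CI/CD tool. Successful deployments could then be tallied per ISO week and region:

```python
import json
from collections import Counter
from datetime import datetime


def record_deployment(counts: Counter, payload: str) -> bool:
    """Count one webhook delivery; return True if it was a successful deploy.

    The payload fields (status, region, finished_at) are hypothetical.
    """
    event = json.loads(payload)
    if event.get("status") != "success":
        return False  # failed or cancelled deploys are not counted
    finished = datetime.fromisoformat(event["finished_at"])
    iso_year, iso_week, _ = finished.isocalendar()
    key = (f"{iso_year}-W{iso_week:02d}", event.get("region", "global"))
    counts[key] += 1
    return True


# Illustrative webhook deliveries (made-up data).
counts = Counter()
events = [
    '{"status": "success", "region": "eu-west", "finished_at": "2024-03-04T10:15:00"}',
    '{"status": "success", "region": "eu-west", "finished_at": "2024-03-06T16:40:00"}',
    '{"status": "failed", "region": "eu-west", "finished_at": "2024-03-07T09:00:00"}',
    '{"status": "success", "region": "us-east", "finished_at": "2024-03-12T11:30:00"}',
]
for raw in events:
    record_deployment(counts, raw)

for (week, region), total in sorted(counts.items()):
    print(week, region, total)
```

Keying the counter by week and region keeps the metric granular enough for complex environments while still allowing a simple sum for the overall trend.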
2. Change Lead Time
In a successful DevOps culture, the fast feedback flow between operations and development should enable any changes to happen quickly, correctly, and efficiently. New features, patches, bug fixes, and security remediations should all be able to proceed from creation to delivery with minimal lead time. Longer lead times are again indicative of a bottleneck.
To effectively measure change lead time, you’ll need to successfully implement at least two other methodologies from DevOps and Agile:
- A unified work backlog for dev and ops, which provides a single source of truth on the status of all work items for a particular application or service.
- Single-ticket/feature deploys to make sure that deployments are isolated so that they can be rolled back if needed.
Once these are implemented, any change is tied directly to a uniquely identified ticket or work item. The deployment system (such as a CI/CD pipeline) reads the ticket identifier, and using webhooks, it can automatically close any work item that it successfully deploys. Then it's simply a matter of measuring the total time elapsed between the "new/open" and "closed" status of all work items.
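The calculation itself can be sketched as follows, assuming the backlog can export each work item with an opened and a closed timestamp. The `opened_at` and `closed_at` field names and the ticket identifiers are illustrative assumptions:

```python
from datetime import datetime, timedelta
from statistics import median


def lead_times(work_items):
    """Return the open-to-close duration for every closed work item."""
    return [
        item["closed_at"] - item["opened_at"]
        for item in work_items
        if item.get("closed_at") is not None  # still-open items are skipped
    ]


# Hypothetical export from a unified backlog (made-up data).
work_items = [
    {"id": "OPS-101", "opened_at": datetime(2024, 3, 1, 9), "closed_at": datetime(2024, 3, 2, 9)},
    {"id": "DEV-202", "opened_at": datetime(2024, 3, 1, 9), "closed_at": datetime(2024, 3, 4, 9)},
    {"id": "DEV-203", "opened_at": datetime(2024, 3, 2, 9), "closed_at": datetime(2024, 3, 4, 9)},
    {"id": "DEV-204", "opened_at": datetime(2024, 3, 3, 9), "closed_at": None},
]

durations = lead_times(work_items)
median_lead_time = median(durations)
mean_lead_time = sum(durations, timedelta()) / len(durations)
print("median:", median_lead_time, "mean:", mean_lead_time)
```

Reporting the median alongside the mean is a design choice: a single long-running ticket can inflate the mean considerably, while the median stays closer to the typical experience.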
3. Defect Volume and Escape Rate (Error Budget)
Defect volume is the raw number of defects. The defect escape rate is the ratio of defects found by customers or users in production to defects found by QA during pre-production testing. Together, these metrics give insight into the quality of your software deployments and the effectiveness of your QA team's test coverage.
Note that aiming for 100% defect-free operation can actually be counterproductive and may result in DevOps anti-patterns emerging, such as hesitation to make changes or lagging on the delivery of significant feature upgrades. For sane error budgets, bake some flexibility into your SLAs, which we discuss in the next section.
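A small sketch of the arithmetic, using the ratio as described here. The second function shows a commonly used variant, escaped defects as a share of all known defects, which is an addition for comparison and not something this article prescribes:

```python
def defect_escape_ratio(production_defects: int, preproduction_defects: int) -> float:
    """Ratio of defects found in production to defects caught before release."""
    if preproduction_defects == 0:
        raise ValueError("no pre-production defects recorded; ratio is undefined")
    return production_defects / preproduction_defects


def escaped_share(production_defects: int, preproduction_defects: int) -> float:
    """Variant: fraction of all known defects that reached production."""
    total = production_defects + preproduction_defects
    return production_defects / total if total else 0.0


# Made-up numbers: 5 defects reported from production, 20 caught by QA.
ratio = defect_escape_ratio(5, 20)
share = escaped_share(5, 20)
print(f"escape ratio: {ratio:.2f}, escaped share: {share:.0%}")
```

Whichever form you pick, use it consistently so that the trend over time remains comparable.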
4. Service Level Agreements (SLAs)
A service level agreement (SLA) can be soft, like an idealized goal, or conversely, a legal, contractual obligation to maintain some level of availability over a given time window. All stakeholders should be engaged in determining what exactly constitutes uptime versus downtime, and then come to an agreement on an effective way to measure it.
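To make the link between an availability target and an error budget concrete, here is a hedged sketch that converts a target over a time window into the downtime it permits. The 99.9% target and 30-day window are example values only, not recommendations:

```python
from datetime import timedelta


def allowed_downtime(availability_target: float, window: timedelta) -> timedelta:
    """Downtime that an availability target permits over the given window."""
    if not 0.0 < availability_target <= 1.0:
        raise ValueError("availability_target must be in (0, 1]")
    return window * (1.0 - availability_target)


def budget_remaining(availability_target: float, window: timedelta,
                     downtime_so_far: timedelta) -> timedelta:
    """Unspent part of the budget; negative once the target has been breached."""
    return allowed_downtime(availability_target, window) - downtime_so_far


# Example values only: a 99.9% target over a 30-day window.
budget = allowed_downtime(0.999, timedelta(days=30))
left = budget_remaining(0.999, timedelta(days=30), timedelta(minutes=10))
print("budget:", budget, "remaining:", left)
```

Expressing the target as a budget of minutes gives stakeholders a tangible figure to agree on, and gives teams room to ship changes without treating every incident as an SLA failure.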
A rudimentary method might include just pinging a service endpoint or page to see if it responds with HTTP status code 200. Large-scale distributed systems might require more elaborate monitoring, with a variety of hosts and load balancers consistently reporting health/availability metrics to a centralized aggregator.
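The rudimentary approach could look roughly like the sketch below, which uses only Python's standard library. The health-check URL is a placeholder, and the `opener` parameter is an assumption added here so the probe can be exercised without network access:

```python
import urllib.error
import urllib.request


def is_up(url: str, timeout: float = 5.0, opener=urllib.request.urlopen) -> bool:
    """Return True only if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Covers HTTP 4xx/5xx, DNS failures, refused connections, and timeouts.
        return False


def availability(probe_results) -> float:
    """Fraction of probes that succeeded, e.g. 0.999 for 99.9%."""
    results = list(probe_results)
    if not results:
        raise ValueError("no probe results to aggregate")
    return sum(results) / len(results)


# Usage (placeholder URL): run on a schedule and store each result.
# up = is_up("https://service.example.com/health")
print(availability([True, True, True, False]))
```

A single prober is itself a single point of failure, which is one reason larger systems report health from several hosts to a central aggregator.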
5. Application Performance
Application performance is something that might be included in the SLA from the previous section. For most web-facing applications, performance is measured with metrics such as time to first byte (TTFB), error rates, and response time. These are all easily quantifiable and are direct indicators of the user experience that your service or application is offering.
Measuring application performance is tricky, as it’s nearly impossible to simulate all the potentially unique combinations of network paths and hardware a particular client might be using for a given session. However, blackbox monitoring can be an effective tool to help get a good measurement. Unlike the whitebox monitoring that occurs inside the architecture, blackbox monitoring has no knowledge of the interior metrics or design. It typically acts as a simulated user agent, sampling metrics from outside the application and measuring the response time a user might experience.
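As an illustration of sampling from the outside, the sketch below times an arbitrary probe callable and summarizes the samples with nearest-rank percentiles. The sample values are made up, and the choice of the 50th and 95th percentiles is an assumption, not something prescribed here:

```python
import math
import time


def timed_call(request) -> float:
    """Run one probe and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    request()
    return (time.perf_counter() - start) * 1000.0


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up response times in milliseconds from an external prober.
samples_ms = [120, 95, 110, 480, 105, 100, 130, 115, 98, 102]
p50 = percentile(samples_ms, 50)
p95 = percentile(samples_ms, 95)
print("p50:", p50, "ms, p95:", p95, "ms")
```

Percentiles tend to be more informative than an average for this purpose, because a handful of slow responses can dominate what some users actually experience.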
Reporting and Visualizing KPIs
With a refined list of KPIs, a solid SLA, and the monitoring and collection infrastructure in place, you've got the data to prove whether DevOps has successfully delivered on its promises. However, technical and non-technical stakeholders across the business are likely to have an interest in some or all of the metrics. The question remains: How do you effectively present this data?
Fortunately, there is a rich ecosystem of tools to help report and visualize KPI status. An important goal in DevOps is to make the work "visible." While it may be obvious that everyone is doing work, the contextual importance and overall status of any given work item will likely be difficult to ascertain without an easily accessible system of visualization and reporting.
Kanban boards are a tool from the Agile world, with roots in lean manufacturing. They provide an excellent way to break down a seemingly complex project into easily digestible pieces of data. A Kanban board available to everyone in your organization allows stakeholders to get a visual understanding of the status of work in a short amount of time. Ticketing software such as Jira comes with Kanban functionality built in, while tools such as Trello or Asana could be integrated with existing workflows.
Most of these tools also provide functionality to automatically email status reports at chosen time intervals, keeping everyone up to date. For more ad hoc status queries, tools like Slack offer a variety of integrations enabling "ChatOps," providing interactive status engagement via bots and reporting tools.
For more quantifiable KPI metrics, a variety of dashboarding solutions are available. Tools like Datadog, Grafana, and Tableau offer comprehensive solutions for visualizing a broad variety of business and technical performance metrics. Most of the solutions have a number of integrations with other SaaS platforms, tools, and APIs.
For KPIs that involve SLAs and application performance, you should sample the data directly from production systems. Visualizations can also be built around lead time and deployment frequency. This way, you can package all of these figures into easy-to-use dashboards and distribute them to stakeholders.
DevOps Delivers Business Value
By now, it should be obvious that DevOps has the potential to deliver an immense amount of business value. In addition, organizations with a strong, effective DevOps culture will be able to attract highly capable engineering talent, reinforcing the feedback loop of driving increased value. The key to realizing the value of DevOps is to make it measurable and visualize it, to ensure all stakeholders can easily and effectively see what’s driving it.
However, for DevOps to have a chance at success, both technical and non-technical teams need to be in alignment. Non-technical stakeholders need to buy in, collaborating with engineering teams to help craft meaningful, achievable KPIs and SLAs. Technical teams need to design systems and processes that streamline software delivery and provide solid tools for communicating the value that DevOps is delivering to their organization.