By Seema Sheth-Voss

Here at Devo’s Cambridge, Mass., office, we’ve been steeped in news of national sports league playoffs for several weeks. The games are great, even with the stress and uncertainty of overtime, but they’ve gotten me thinking about professional hockey and basketball players and how they’ve become as successful as they are. Sure, there’s an element of natural skill, but what we don’t see on the surface is that each of Boston’s professional athletes took on rigorous training and painstaking attention to detail to ascend to the next level of achievement. Behind the scenes are countless practices, drills, and skill repetitions, along with mental and team preparation. There are no shortcuts or skipped days. The basics matter; they have a checklist and they stick to it.

Can we say the same about modern IT teams sticking to the basics? Perhaps. Often, the underlying challenge they face is that organizations aren’t focusing on a given business question or problem at hand, usually one related to digital transformation, and speed or raw genius alone doesn’t guarantee success. Businesses need to adhere to a clear checklist.

For instance, in the rush to dramatically improve customer experiences, companies have broadly implemented new applications, development styles, and technologies like AI to solve their problems without first defining what the specific problem is. Consider this: if a call center experiences high dissatisfaction, the problem could lie in many areas, but instead of looking within the data to surface a single problem, we see numerous new applications, technologies, or processes thrown at the call center to see what lowers customer dissatisfaction.

This multitude of problems and solutions means it’s time to start thinking about Gartner’s criteria for AIOps as a checklist. AIOps is the application of machine learning and data science to IT operations, and its primary goal is to reduce mean time to resolution.
Of course, the ultimate goal is to prevent problems from affecting end users in the first place! The challenge for many is that AIOps is a pipe dream; the realistic deployment of AI to solve all problems in any given organization hasn’t panned out. But when we take a data-first approach and start with four simple questions, we can derive value and deliver insight that the business cares about.

What is happening?
What does it really mean?
How quickly can we respond?
How can we reduce the effort and cost of operations?

Follow the checklist to determine the steps you can take to stand up an AIOps platform.

The AIOps Checklist

🡪 Gather logs, metrics, and events from multiple sources. Machine data (events, metrics, text strings, wire data) is scattered everywhere: in the cloud, on-premises, and in between. Full-stack visibility is key for analysts to understand what’s happening in these complex environments. Pick a vendor-agnostic platform that plays well with everyone else in your data ecosystem.

🡪 Enable both real-time and historical data analysis. AIOps needs the best of both data worlds to work in concert. Picking just one isn’t an option. Next-best-action recommendations are only impactful when you have both real-time data and the full history of the user or entity at hand. Don’t settle for performance penalties from warm or cold buckets. Look for architectures that parallelize analytics across both real-time and historical data.

🡪 Provide access to the data. Data transformation and abstraction slow everyone down. In today’s world of cloud, hybrid, and on-premises operations, data is everywhere, and getting from point A to point B shouldn’t be a challenge. Data should be immediately accessible, regardless of location and type. A cloud-scale platform that keeps the data raw and tagged in the simplest format is the fastest data access mode out there.

🡪 Store the acquired data. Retention rates matter. Check your vendor’s SLA and price-kickers.
Don’t settle for horizons such as 90 or 180 days. Today’s AIOps solutions are built on cloud-scale data platforms where unlimited storage and compute are the norm.

🡪 Use machine learning to detect problems and isolate root cause. Anomaly detection is the process of identifying and alerting on abnormal behavior. When it comes to large-scale time series, machine learning algorithms are great at identifying anomalies and at continuously filtering and prioritizing the most relevant alerts. Use ML to help operations sift through a barrage of false positives without losing sight of what’s important. The goal? Achieve a faster mean time to investigation in order to lower the mean time to resolution.

🡪 Initiate action or next step based on meaning of analysis. Goodbye, high-stress war rooms. Organizations need to be able to incorporate streamlined remediation to more easily turn insights into action. AIOps solutions need to pave the way for proactive detection, automated triggers for known issues, and workflows for fast remediation.

AIOps: Transforming Customer Experiences

It’s interesting that the first four of the six criteria are about getting a handle on the data before addressing analytics or remediation. According to Gartner, large enterprises’ exclusive use of AIOps and digital experience monitoring tools to monitor applications and infrastructure will rise from 5% in 2018 to 30% in 2023. This scale of shift will be transformative.

Much like Boston’s title town status, AIOps takes strategy and work. It requires going back to basics, but it is always an iterative process with achievements along the way. Within your own organization, work to achieve each item on the checklist to create predictable, positive customer experiences every time.
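
For readers who want something concrete to go with the checklist’s anomaly detection item, here is a minimal Python sketch of flagging unusual points in a metric time series. It uses a rolling z-score, which is a simple statistical baseline rather than a trained machine learning model, and it is not drawn from Gartner’s criteria or any vendor’s implementation. The function name, window size, threshold, and sample latency values are all illustrative assumptions.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return the indices of points that sit more than `threshold`
    standard deviations away from the mean of the preceding `window` points.

    Hypothetical helper for illustration only: a rolling z-score baseline.
    """
    history = deque(maxlen=window)  # sliding window of the most recent points
    anomalies = []
    for index, value in enumerate(values):
        # Only score a point once a full window of history is available.
        if len(history) == window:
            baseline = mean(history)
            spread = stdev(history)
            # A zero spread means the window is flat; skip to avoid dividing by zero.
            if spread > 0 and abs(value - baseline) / spread > threshold:
                anomalies.append(index)
        # Every point, anomalous or not, joins the history for later scoring.
        history.append(value)
    return anomalies


# Made-up response times in milliseconds, with one obvious spike at index 7.
latencies_ms = [100, 102, 98, 101, 99, 100, 103, 250, 101, 99]
print(detect_anomalies(latencies_ms))  # prints [7]
```

Note that in this sketch a flagged point still enters the rolling window, so a large spike temporarily widens the baseline and can mask smaller anomalies right after it. A production system would likely handle that differently, and would feed flagged points into the alert prioritization and remediation steps the checklist describes.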