MLOps and AIOps are two similar-sounding terms that are used to refer to vastly different disciplines within the industry today. Ever since the introduction of these terms a few years ago, zeitgeist interest in them has surged, as this Google Trends chart shows.
And yet, apart from the handful of practitioners who are actively working on projects in these areas, most casual readers, and even enthusiasts looking to explore the space, find the meaning and benefits of MLOps and AIOps ambiguous, overlapping, and hard to tell apart.
In my experience, there are two reasons for this.
The first is the implicit reference in the words MLOps and AIOps to the more widely understood practice of DevOps. It makes one wonder: are MLOps and AIOps related to DevOps? Do they derive from it? If so, how are they different from it?
The second is the obvious ambiguity regarding how ML is different from AI since they're often used interchangeably. Are they the same? Are they on a continuum? If so, where does one end and the other begin?
Let's formulate these as questions, which we must be able to answer in order to understand MLOps and AIOps.
Are MLOps and AIOps related to DevOps? If so, how?
Since ML and AI tend to be used interchangeably, what does their inclusion in the words MLOps and AIOps imply?
Make a note of these! We will come back to them later in this post.
It's also important to keep in mind the relative infancy of both these disciplines. The terms MLOps and AIOps were coined not more than 6-7 years ago, which means their hype / buzzwordiness factor is currently high, relative to the comprehensibility at large of their semantics, applications, and benefits. This will likely continue for a little while longer until the technology matures, and the use cases become more prevalent and widely understood.
O'Reilly's AI Adoption in the Enterprise 2021 report illustrates this point using this compelling pie chart, which shows that just a quarter of the surveyed respondents said they have mature deployments of AI technology.
Reported hindrances to mature adoption were a lack of skilled people, data quality issues, difficulties in identifying relevant business use cases, lack of company culture, and technical infrastructure issues. The report also found that there is a distinct lack of standardization among tools today for the deployment, monitoring, versioning, and tracking of models and training data.
Given these challenges, it's not surprising that non-practitioners today run into comprehensibility barriers regarding MLOps and AIOps technology, toolsets, and practices.
In this post, I'll clarify what MLOps and AIOps mean, what problems they are meant to solve, and what tools exist for teams that are looking to adopt them into their product and service building strategies.
Before we get into it, though, we must take a quick detour through DevOps to build context around what it means and what problems it solves. This will help us both better understand the rationale for MLOps and AIOps, and draw clear lines of distinction between them, later on!
What is DevOps?
DevOps started to become mainstream around 2007 in response to a common organizational problem that affected product teams in their ability to ship software at a brisk pace. Despite following the Agile methodology, it took weeks, if not months, to release software versions and deploy them in production.
The reason for this was that teams that built the software (developers), and the ones that deployed and supported it in production (IT / operations), worked in their own siloes. They reported to different executive leaders within the organization, and worked independently of each other – sometimes even physically on different floors of a building, or in separate buildings.
DevOps was a way to get the developer and the operations teams to collaborate through every stage of the software development life cycle (SDLC), and to share common objectives and KPIs, so that high-quality software could be shipped much more frequently (often, many times in a single day) using Agile.
At its core, DevOps is about three things:
- Multi-disciplinary skills: A DevOps team collectively has the ability to write, test, deploy, monitor, and manage components of the product stack, including the core code, persistent stores, databases, and any 3rd party libraries and services in use. In the process, siloes are eliminated.
- Tools: Tools assist and accelerate software versioning, automation, and monitoring, so that software can be developed and deployed in a continuous fashion. This is known as Continuous Integration and Continuous Deployment (CI/CD).
- Processes: DevOps teams follow the Agile methodology to break down roadmap items into smaller milestones and tasks. They use epics and stories as part of sprint planning to allocate work to team members. The tight interlock between the Dev and Ops disciplines ensures everyone is on the same page regarding upcoming releases. It eliminates surprises and speeds up delivery of high-quality products and services.
The DevOps lifecycle has six phases, shown here using the well-known Infinity Loop.
- Plan: In this phase, teams use Agile for defining upcoming sprints and milestones, and finalizing which epics and stories will make the cut.
- Build: This is the phase where the software is actually built. Version control tools such as Git play a central role here to help with branch management, code merging, and distributed development.
- Continuous integration and deployment: Known as CI/CD, this allows teams to make frequent releases through automation of unit and integration tests, and deployments to staging and production environments.
- Monitor: Involves instrumentation for monitoring, tracing, and alerting in order to quickly identify functionality and performance issues, and notify team members.
- Operate: Involves running the product / service in production and includes service configuration, configuration management, and infrastructure monitoring to ensure smooth operations.
- Continuous feedback: Teams gather to evaluate each release and to reflect upon improvements for future releases.
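To make the CI/CD phase a little more tangible, here is a minimal Python sketch of its core idea: stages run in order, and a failing stage stops everything after it, so a broken build never reaches production. The stage names and the lambdas standing in for real test and deploy steps are hypothetical placeholders; an actual pipeline would be defined in whatever CI tool the team uses.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a callable that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, step in stages:
        if not step():
            print(f"{name}: FAILED, stopping pipeline")
            break
        print(f"{name}: ok")
        completed.append(name)
    return completed


# Hypothetical stages; each lambda stands in for a real test or deploy step.
stages = [
    ("unit tests", lambda: True),
    ("integration tests", lambda: True),
    ("deploy to staging", lambda: True),
    ("deploy to production", lambda: True),
]
completed = run_pipeline(stages)
```

If the integration tests returned False here, neither deployment stage would run, which is the safety property that lets teams release many times a day.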
With this context, let's dive into MLOps!
What is MLOps?
MLOps rose to prominence around 2015 with the promise of solving critical operational problems pertaining to the end-to-end delivery of machine learning pipelines, similar to the ones that DevOps had solved almost a decade earlier.
You must be wondering – what are these problems with machine learning pipelines? To make it more tangible, think about what a typical ML pipeline (source: Gartner) looks like.
Three distinct skillsets are necessary to operate this pipeline. First, there's the data pipeline itself, where data is sourced, cleansed, and transformed; this is owned by Data Engineers. Then there's the curation of training datasets, followed by model creation and verification, which is owned by Data Scientists. Lastly, deployment, monitoring, and ongoing maintenance are owned by Operations.
So we have three teams with specialized skillsets, and they need to coordinate with one another in owning and operating the entire pipeline end-to-end. If these teams operate in siloes and cannot collaborate using Agile practices, the result is shipping delays and quality issues for the overall product.
Recall that these problems are similar to the ones DevOps aims to solve, and they arise when siloed teams with specialized skillsets do not have a tight interlock among them. So in that respect, you can think of MLOps as the application of DevOps principles to machine learning pipelines. Whereas DevOps comprised a multi-disciplinary team of Developers and IT / Operations, MLOps adds Data Engineers and Data Scientists to the mix, and eliminates siloes among them.
The MLOps lifecycle has nine phases, shown here using a modified version of the DevOps Infinity Loop.
Quick disclaimer: I've squashed some of the phases below together in the interest of simplicity.
- Plan: Similar to DevOps, in this phase teams use Agile for defining upcoming sprints and milestones, and finalizing which epics and stories will make the cut.
- Create & Data: Involves data engineering tasks such as data sourcing, extraction, cleansing, and transformation. Also includes creating the training datasets to be used in the Model phase later.
- Model & Verify: Involves creating machine learning models, training them using the training datasets, and verifying and fine-tuning model parameters.
- Package & Release: Comprises versioning, tracking, packaging and release of the model, model parameters, and the training datasets into pre-prod, and subsequently prod environments.
- Configure & Monitor: Similar to DevOps, involves configuration, and instrumentation for monitoring, tracing, and alerting in order to quickly identify functionality and performance issues, and notify team members.
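The phases above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular MLOps tool works: the "model" simply predicts the mean of its training data, and every function name here (prepare_data, train_model, verify_model, fingerprint, package_release) is hypothetical. The point is the shape of the hand-offs, and the fact that the release records which dataset produced which model.

```python
import hashlib
import json
import statistics


def prepare_data(raw):
    """Create & Data: drop missing readings and convert the rest to floats."""
    return [float(x) for x in raw if x is not None]


def train_model(train):
    """Model: a deliberately trivial 'model' that predicts the training mean."""
    return {"mean": statistics.fmean(train)}


def verify_model(model, holdout, max_mae):
    """Verify: accept the model only if its mean absolute error is within budget."""
    errors = [abs(x - model["mean"]) for x in holdout]
    return statistics.fmean(errors) <= max_mae


def fingerprint(obj):
    """Short, stable hash used here as a version label for data and models."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def package_release(model, train):
    """Package & Release: bundle the model with versions of itself and its data."""
    return {
        "model": model,
        "model_version": fingerprint(model),
        "data_version": fingerprint(train),
    }


raw = [10, None, 12, 11, 13, None, 12]       # made-up sensor readings
data = prepare_data(raw)
train, holdout = data[:3], data[3:]
model = train_model(train)

# Only a verified model is packaged for release.
if verify_model(model, holdout, max_mae=2.0):
    release = package_release(model, train)
    print(release)
```

Because the version labels are derived from the content itself, retraining on the same data yields the same data_version, which is one simple way to make deployments reproducible and traceable.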
Hopefully, this clarifies what MLOps means, and how it relates to DevOps. Onto AIOps, next!
What is AIOps?
The term AIOps (Artificial Intelligence for IT Operations) was coined by Gartner in 2016, but unlike MLOps it has almost nothing to do with DevOps! Rather, it refers to the usage of AI / ML techniques and algorithms to automate common, sometimes repetitive, IT tasks.
Before we rabbit-hole further into AIOps, this is a perfect time to revisit and answer the two questions we had asked at the beginning of the post. Doing so will make the rest of this content more comprehensible!
Are MLOps and AIOps related to DevOps? If so, how?
Since ML / AI tend to be used interchangeably, what does their inclusion in the words MLOps and AIOps imply?
With the nagging questions out of the way, let's resume our focus on AIOps. We should ask:
What business outcomes does AIOps target? What are some examples of common / repetitive IT tasks that could be automated using AIOps?
Gartner uses the following framework to define the applicability and benefits of AIOps. Let's drill down!
At its core, just like we concluded, AIOps is about applying machine learning to big data in order to get to the following business outcomes:
- Monitoring: Detecting unusual behaviors from a security, availability, performance, or customer experience perspective, in order to proactively respond to potential issues.
- Service Desk: Automating ticketing tasks, resolving customer issues using intelligent automated chat agents, or answering questions from the knowledge base, for quick and efficient resolution of help desk issues.
- Automation: AI-driven root cause analysis (say, to identify an incompatible library version as the reason for malfunctioning laptops), or predictive insights to be alerted on potential traffic spikes, in order to make quick infrastructure resizing decisions.
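As a small illustration of the Monitoring outcome, here is a minimal Python sketch that flags unusual values in a stream of metrics. It uses a rolling z-score, a simple statistical stand-in for the far richer models that commercial AIOps tools apply. The function name find_anomalies, the window and threshold defaults, and the latency numbers are all assumptions made for this example.

```python
import statistics


def find_anomalies(values, window=5, threshold=3.0):
    """Return indices of values that deviate sharply from recent history.

    A value is flagged when it lies more than `threshold` standard
    deviations from the mean of the `window` values that precede it.
    """
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Perfectly flat history: treat any change at all as unusual.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Made-up response times in milliseconds, with one obvious spike.
latency_ms = [101, 99, 100, 102, 98, 100, 101, 350, 99, 100]
print(find_anomalies(latency_ms))  # [7]
```

Only the spike at index 7 is flagged. The readings right after it are not, because the spike itself inflates the rolling standard deviation; that is one reason production systems tend to use more robust baselines than this sketch does.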
Hopefully, this clarifies the need for AIOps and how it makes IT and Operations teams more efficient with their common / repetitive tasks!
MLOps vs AIOps Summary
To conclude and summarize this MLOps vs AIOps discussion, here's a handy TLDR version of the main points from this post.
You may have noticed that we didn't talk about any of the MLOps or AIOps toolsets. I intentionally omitted them from the discussion in order to focus on the task of qualitatively evaluating MLOps and AIOps. However, I've included some of the commonly used tools in the table below, so you can check them out for more details if you're interested.
MLOps vs AIOps comparison chart
| |MLOps|AIOps|
|---|---|---|
|Definition|An extension of DevOps principles and practices for operationalizing end-to-end machine learning pipelines.|An application of AI / ML techniques and algorithms to automate common and repetitive IT tasks.|
|Audience|Multi-disciplinary Data Engineering, Data Science, and Operations teams that need to collaborate on managing machine learning pipelines.|IT teams responsible for infrastructure monitoring, security, service availability, and help desks.|
|Benefits|Makes multi-disciplinary teams efficient, helps track and version training datasets and models, and makes deployments predictable and reproducible.|Makes IT efficient in responding to security threats, making infrastructure sizing decisions, and improving customer experience for help desk engagements.|
|Toolsets|MLflow, Kubeflow, Amazon SageMaker, Azure ML.|Dynatrace, Datadog, AppDynamics, New Relic, ServiceNow, Splunk.|
I hope this post helped clarify what MLOps and AIOps mean, where each one applies, and how they differ!
Hope you enjoyed reading this!