In November 2023, Bill Gates wrote an article about how AI would change the way we interact with computers and personal devices through so-called Agents:
“Agents are not only going to change how everyone interacts with computers. They’re also going to upend the software industry, bringing about the biggest revolution in computing since we went from typing commands to tapping on icons.”
There are a lot of blogs and conversations about Copilot-like tools and how they boost developer productivity. In this article, I want to focus instead on the potential impact of Agents on platform and infrastructure engineering, where it is still not clear how AI fits in.
As usual for such articles, we need to take a trip down memory lane and look at how infrastructure was managed in the past and where we are now.
Before the cloud
Long ago, before cloud and platform engineering, infrastructure management was sacred knowledge. There was a need to have multiple T-shaped engineers with deep knowledge of storage, operating systems, networking, databases, and more.
Knowledge was sacred mostly because the tools for building proper infrastructure were immature or absent, let alone the tools for automating parts of it. From the late 1990s onward, we started to see tools like VMware, KVM, Proxmox, and later OpenStack. They were a viable way to standardize infrastructure and create the first platforms that developers could interact with.
Cloud computing
The 2000s gave us the big three—Amazon Web Services (2006), Google Cloud Platform (April 2008), and Windows Azure (October 2008, later renamed Microsoft Azure). The promise was that there was no need to build your own infrastructure; just use compute resources. Below is a famous picture from one of the AWS presentations, where they compared cloud computing with electricity services.
What interests us here is not the promise itself but the rise of APIs. Developers and engineers no longer needed to write complex scripts; they could provision infrastructure with a few API calls. OpenStack and VMware evolved to support similar concepts on-premises.
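To make "a few API calls" concrete, here is a minimal sketch of what such a call can look like. The endpoint `https://cloud.example.com/v1/instances`, the payload fields, and the function name are all hypothetical and do not belong to any real provider. The request is only built, not sent.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only (not a real provider API).
API_BASE = "https://cloud.example.com/v1"


def build_create_instance_request(name, instance_type, region, token):
    """Build (but do not send) an HTTP request asking a provider for one VM."""
    # The payload field names are assumptions made for this sketch.
    payload = {"name": name, "instance_type": instance_type, "region": region}
    return urllib.request.Request(
        url=f"{API_BASE}/instances",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_instance_request(
    "web-1", "small", "eu-west-1", token="dummy-token"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a real provider, sending the request would be a single additional call such as `urllib.request.urlopen(request)`, which is the whole point: one HTTP request replaces a pile of provisioning scripts.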
Post-cloud or current-era
This slowly takes us to the present day, where the big three own approximately 70% of the cloud services market, followed by 100+ other service providers. There is also a wide variety of tools used in platform engineering that expose their own APIs. We mentioned VMware and OpenStack, but another such tool is your favorite container orchestrator, Kubernetes.
With API-driven development and infrastructure management, it was inevitable that middleware tools would appear to standardize the interaction with these APIs. Yes, I'm talking about infrastructure as code (IaC). IaC was meant to open the door to standardized infrastructure management and automation, removing toil and avoiding vendor lock-in. On the vendor lock-in front it did not work out that well, as moving workloads across clouds is still a complex task. Kubernetes simplifies this a lot, but that is another story.
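The core idea behind most IaC tools can be sketched in a few lines: you declare the desired state, the tool compares it with what actually exists, and it derives the API calls needed to converge. The resource names and fields below are made up for illustration, and real tools add dependency ordering, state storage, and provider plugins on top of this.

```python
def plan(desired, actual):
    """Compare desired and actual resources; return the actions needed to converge."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in actual:
            actions.append(("create", name, spec))
        elif actual[name] != spec:
            actions.append(("update", name, spec))
    # Anything that exists but is no longer declared gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(("delete", name, actual[name]))
    return actions


# Hypothetical resources: what the code declares vs. what the cloud reports.
desired = {
    "vm/web-1": {"size": "small", "region": "eu-west-1"},
    "db/main": {"engine": "postgres", "storage_gb": 100},
}
actual = {
    "vm/web-1": {"size": "medium", "region": "eu-west-1"},
    "vm/old-batch": {"size": "large", "region": "eu-west-1"},
}

for action, name, spec in plan(desired, actual):
    print(action, name, spec)
```

Each resulting action maps to one or more provider API calls, which is why IaC tools are, in effect, middleware on top of those APIs.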
Adding AI
Today, AI can help engineers boost performance and quality. It does so through Copilot-like tools, by helping engineers learn, and by migrating code from one IaC tool to another. As large language models rapidly develop and improve, these tools will provide an increasingly better user experience over time.
Agents
There are various tools for developers that turn words or even pictures into code without the need to write a single line. There are also attempts to cover the infrastructure side: instead of code, such tools produce container images that can be deployed somewhere.
But there are still bits and pieces that need to be solved for infrastructure. For example, databases are still required for any production-grade application. No-code applications try to solve this in their own ways: creating hacky storage solutions or integrating with cloud providers. This might work for early-stage startups or quick prototyping, but not for enterprise-grade services.
The next level of simplification for platform engineers lies in AI Agents that integrate directly with the APIs of cloud providers and other infrastructure tools, like Kubernetes. IaC code will be either replaced by or mixed with human language: instead of using declarative tooling, engineers will describe infrastructure in natural language.
The expectation, though, is that the Agent would be smart enough to operate within various constraints: cost, compliance, stack requirements (hybrid, on-prem), SLAs, etc.
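One way to picture "operating within constraints" is a guardrail that checks whatever the Agent proposes before anything is applied. The sketch below is an assumption about how such a check could look, not a description of any existing Agent framework. The types `ProposedResource` and `Constraints` and all values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedResource:
    """One resource an Agent wants to create (hypothetical shape)."""
    name: str
    region: str
    monthly_cost_usd: float
    encrypted: bool


@dataclass(frozen=True)
class Constraints:
    """Limits the Agent must stay within (hypothetical shape)."""
    max_monthly_cost_usd: float
    allowed_regions: frozenset
    require_encryption: bool = True


def violations(proposal, constraints):
    """Return human-readable reasons why a proposal must not be applied."""
    problems = []
    total = sum(r.monthly_cost_usd for r in proposal)
    if total > constraints.max_monthly_cost_usd:
        problems.append(
            f"total cost {total:.2f} exceeds budget "
            f"{constraints.max_monthly_cost_usd:.2f}"
        )
    for r in proposal:
        if r.region not in constraints.allowed_regions:
            problems.append(f"{r.name}: region {r.region} is not allowed")
        if constraints.require_encryption and not r.encrypted:
            problems.append(f"{r.name}: encryption at rest is required")
    return problems


proposal = [
    ProposedResource("db-main", "eu-west-1", 420.0, True),
    ProposedResource("cache", "us-east-1", 150.0, False),
]
constraints = Constraints(500.0, frozenset({"eu-west-1", "eu-central-1"}))

for problem in violations(proposal, constraints):
    print(problem)
```

Keeping the constraints in explicit, machine-checkable form means the Agent's natural-language reasoning never has to be trusted on its own: a proposal with a non-empty violation list is simply rejected or sent back for revision.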
APIs become even more essential for every tool. Without an API, an AI Agent might not be able to interact with the tool at all, and hence the tool's market share will shrink.
Day 2 operations
Maintenance is what bothers me with today’s fancy no-code tools. You can create an application using human language, but what is next? How do you maintain the code going forward? The same applies to platform engineering—how do you maintain the infrastructure? For database-specific aspects, how do I know that my data is safe, secure, and consistent?
Agents will become significantly smarter. They will address reliability, security, and performance issues proactively and provide instant insights to the user.
Agents might evolve to the stage where they craft their own solutions using APIs and surprise us. For example, instead of using a Kubernetes Operator (as in the example above), the Agent will craft its own solution relying on basic Kubernetes primitives. In other words, it will come up with its own ways of using building blocks instead of relying on frameworks.
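As a rough illustration of "basic Kubernetes primitives", the sketch below assembles a headless Service and a StatefulSet for a single database instance as plain dictionaries, without any Operator or custom resource. The function name and the values passed to it are hypothetical, and the port and data path assume PostgreSQL. It is a bare skeleton: credentials, backups, replication, and failover, which are exactly what an Operator normally handles, are left out.

```python
import json


def database_manifests(name, image, replicas, storage):
    """Return a headless Service and a StatefulSet as plain dicts."""
    labels = {"app": name}
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "clusterIP": "None",  # headless: gives each pod a stable DNS name
            "selector": labels,
            "ports": [{"name": "db", "port": 5432}],
        },
    }
    stateful_set = {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "serviceName": name,
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "db",
                        "image": image,
                        "ports": [{"containerPort": 5432}],
                        "volumeMounts": [{
                            "name": "data",
                            "mountPath": "/var/lib/postgresql/data",
                        }],
                    }],
                },
            },
            # One persistent volume claim is created per replica.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": storage}},
                },
            }],
        },
    }
    return [service, stateful_set]


for manifest in database_manifests("orders-db", "postgres:16", 1, "10Gi"):
    print(json.dumps(manifest, indent=2))
```

Everything an Operator adds on top of these two objects is day-2 logic, which is where an Agent building from primitives would have to prove itself.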
Private clouds
With AI Agents in mind, crafting in-house tooling does not make sense anymore. You might be able to train a model to learn your own solution and its APIs, but I would not expect a great user experience from it. This is mostly because of day-2 operations and limitations that have yet to be seen.
There is no doubt that AI Agents in this form will drive further adoption of public clouds. For private clouds, there will be a strong need for a standardized API so that Agents do not need to go down to the operating system level. Kubernetes is already pretty close to that. However, many tools in the cloud native ecosystem still have subpar APIs that would drive Agents crazy.
Assist and Replace
As we see with AI now, Agents will operate in Assist mode at the start. They will give a performance boost to infrastructure engineers. But the smarter they become, the closer they get to Replace mode. It is inevitable that developers will be able to provision infrastructure themselves, or that it will even be done by higher-level Agents.
This will impact the job market. Cloud and platform engineers, the ones who rely on various APIs and glue them together, will have to adjust. Don’t get me wrong; the future is not that grim, though. Someone needs to create the services that expose these APIs for Agents and power them up. This requires deep technical expertise and fundamental skills. At the beginning of this article, we talked about T-shaped engineers with such skills. They will be needed more than ever.
Conclusion
As we stand on the brink of a new era in platform and infrastructure engineering, the advent of AI Agents promises to revolutionize the field. From the early days of sacred, manual infrastructure management to the modern era of cloud computing and API-driven development, the journey has been marked by constant evolution and innovation. However, the introduction of AI Agents marks the beginning of a transformative phase that could redefine how we interact with and manage our infrastructure.
AI Agents have the potential to integrate seamlessly with cloud providers and infrastructure tools, leveraging natural language processing to simplify and automate complex tasks. This shift from declarative tooling to human language-based interactions will not only streamline operations but also democratize access to sophisticated infrastructure management, making it more accessible to a broader range of engineers and developers.