How to do AI engineering right

April 9, 2026

I was invited to speak at this year’s AI Engineer Europe conference on the topic of how to do AI-native engineering in big tech companies. Sadly, my extremely peculiar employer refused me permission to speak. So you’re reading this instead of watching a talk.

If you’re trying to do meaningful, novel AI work inside a big company (not just Big Tech, with capital letters), it’s difficult because you’re being squeezed from two sides simultaneously. I’ve seen teams respond in a way that makes things worse. Here’s what I think actually works.

The squeeze of doom

From below, the models are eating your value added. Every capability you painstakingly assemble on top of a base model risks being rendered obsolete weeks or months later by new models. At the start of 2025, I built a cool RAG system using knowledge graphs. Agents were pretty primitive and spent ages meandering around large codebases trying to find the right context. The knowledge graphs did all the exploration ahead of time so agents could query where to look and get the correct locations within a few seconds.

That was super useful, until agentic search got better and the agents stopped needing the knowledge graph tool.

And that’s a good thing! It means less infrastructure to build and maintain. But it shows that you don’t want to be competing against the models and the AI labs’ agent harnesses. Just as the Bitter Lesson showed that the GPUs-go-brrrr approach outperforms fancy techniques and architectures, so too do we see that better models and agent harnesses will replace bespoke infrastructure. The AI labs and wider open source community have tens of thousands of people contributing to these tools. If you’re competing against them, you don’t stand a chance.

“Yes, of course, Tom, everyone doing AI engineering has to deal with this,” you say.

That is true. The extra squeeze dimension you get working at a large company is the central platform teams building out all of the company-wide agentic infrastructure. Maybe you even have platform teams for every level of your organisation’s hierarchy, each owning their org’s remit. Things like remote agent environments, authentication systems, and internal tooling integrations are going to be owned by these teams. They have the advantage of scale, funding, and organisational mandate.

They’re probably slower than you, so it’s tempting to build your own alternatives while you wait. But they’re going to do a better job and their solutions are going to become the blessed approach. Whatever you build will not be durable. Sure, you can ship a proof-of-concept while the central team is still writing planning documents. But then the central team ships, and now you maintain an orphaned system.

Data not tools

The squeeze leaves you trying to think of stuff to build in the narrow space not served by either public tooling or internal platform teams. The mistake here is to think of AI engineering as “what cool tools or capabilities can I build?” This leaves you struggling to carve out a bit of space where you have the competitive advantage.

The danger is that you end up competing against the models.

But reframe the problem as “how can I get the most out of our data?” and you have the natural advantage. Only you have access to your data (hopefully!) so no-one else can outcompete you. Tools and capabilities fade to their proper, secondary position as mere means to ends.

What developers always forget is this: the goal is not to build cool shit. The goal is to create business value. If you can’t draw a line from your work to something the business cares about — engineering productivity, incident frequency, release quality, user retention — you’ll struggle to sustain support for it.

Assume that the models will get better and the open source tooling will improve. But nobody else has your repositories, your application logs, analytics, or your team’s commit history. That’s your moat. AI engineering is not finding some excuse to build fancy agent setups. Your job is to use whatever tools and techniques are available to extract the maximum business value from the data you control.

This reframes the relationship with everything external. When the models improve, that’s now good news. Your workflows will perform better! When the central platform team ships a better agent execution environment, great, that’s another thing you can incorporate. Instead of competing with these developments, you’re multiplying on top of them. Whenever a new model or tool drops, you should feel excited by the possibilities it opens, not worried that they’ll make your work obsolete.

Here’s a simple rule of thumb. Ask yourself: does this project require access to our specific data? If no, someone else will build a better version. If yes, it might be worth building.

Things you can actually do

Here are the data sources every engineering team has and almost certainly isn’t fully exploiting.

Repository history and code

Your codebase’s git history is a record of every mistake, every fix, every pattern your team has developed over time. You know things about your codebase that no model does: which abstractions are shaky, which files people don’t like to touch, which section is like that just because one person got really into functional programming for a month.

If your team is in any way competent at writing commit messages, your repo’s full commit history is a deep source of insight. Mine it for fix commits to identify common bugs that keep recurring and extract rules from how they were fixed. Use this to improve code review automation to be not just generic style linting, but rules derived from your team’s actual failure patterns.
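As a minimal sketch of what that mining looks like: assume fix commits mention words like “fix” or “bug” in their subjects, and that you’ve captured `git log --format='@@%s' --name-only` output (the `@@` prefix and the keyword list are my own conventions here, not anything standard). Tallying which files fix commits keep touching surfaces your recurring trouble spots:

```python
import re
from collections import Counter

# Heuristic, not exhaustive: tune the keywords to your team's habits.
FIX_PATTERN = re.compile(r"\b(fix|fixes|fixed|bug|regression)\b", re.IGNORECASE)

def fix_hotspots(log_text):
    """Tally which files are touched most often by fix commits.

    Expects the output of `git log --format='@@%s' --name-only`,
    where each commit is an '@@subject' line followed by the paths
    it changed.
    """
    counts = Counter()
    is_fix = False
    for line in log_text.splitlines():
        if line.startswith("@@"):
            is_fix = bool(FIX_PATTERN.search(line[2:]))
        elif line.strip() and is_fix:
            counts[line.strip()] += 1
    return counts

sample = """@@Fix null check in parser
src/parser.py
@@Add new feature
src/feature.py
@@fix parser again
src/parser.py
tests/test_parser.py
"""
print(fix_hotspots(sample).most_common(2))
# [('src/parser.py', 2), ('tests/test_parser.py', 1)]
```

The interesting step comes after the tally: feed the diffs of those recurring fix commits to a model and ask it to extract review rules from them.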

Reducing the time engineers spend on review is an obvious productivity win that compounds. Better review catches more issues earlier, before they become incidents. New contributors have a way to learn all of the conventions and patterns that you almost certainly haven’t bothered to document clearly.

Even if your commit history is a wall of “make changes and fix” messages, you can still use git logging commands to uncover statistics about the codebase: which files change the most, who has their commits reverted the most, and how conventions have changed over time. A simple way to start is to ask an agent to plough through the commit history and see what it can uncover. Get it to teach itself the evolution of your codebase.
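The revert statistic, for instance, falls out of a few lines of parsing. This sketch assumes git’s default ‘Revert "<subject>"’ message format and oldest-first input (e.g. from `git log --reverse --format='%an%x09%s'`), so an original commit is seen before any revert of it:

```python
from collections import Counter

def revert_stats(log_lines):
    """Count how often each author's commits were later reverted.

    `log_lines`: lines of `git log --reverse --format='%an%x09%s'`
    (author, tab, subject; oldest first). Matches git's default
    'Revert "<subject>"' message convention.
    """
    by_subject, reverted = {}, Counter()
    for line in log_lines:
        author, _, subject = line.partition("\t")
        if subject.startswith('Revert "') and subject.endswith('"'):
            # Look up who wrote the commit being reverted.
            reverted[by_subject.get(subject[8:-1], "unknown")] += 1
        else:
            by_subject[subject] = author
    return reverted

sample = [
    "alice\tAdd caching layer",
    "bob\tTweak config",
    'carol\tRevert "Add caching layer"',
]
print(revert_stats(sample))  # Counter({'alice': 1})
```

It breaks on rewritten revert messages and duplicate subjects, but as a first pass over a real history it’s plenty to get an agent started.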

Application logs

Most teams look at logs reactively: something breaks, you look for the error. AI opens up the capability to continuously analyse vast reams of logs to catch crashes and errors before anyone files a ticket. Even better, cross-reference the log patterns against the codebase and recent commit history. There is nothing stopping you from building a system that surfaces “the user entered this flow, which caused them to hit this new code path and emit a stack trace which maps to this function, changed in this commit, by this author, who also changed these other files” before the on-call engineer has even opened their laptop.
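A sketch of the first link in that chain: pulling the failing location out of a Python traceback, which you can then hand to `git log -L <line>,<line>:<path>` to find the commit and author that last touched the failing code. (The frame format here is CPython’s; other runtimes need their own regex.)

```python
import re

# Matches CPython traceback frames: File "path", line N, in func
FRAME = re.compile(r'File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\w+)')

def last_frame(traceback_text):
    """Return the innermost (file, line, function) of a Python traceback.

    The innermost frame is where the exception was raised, so it's the
    right input for `git log -L` or `git blame` when mapping a crash
    back to a commit.
    """
    frames = FRAME.findall(traceback_text)
    if not frames:
        return None
    path, line, func = frames[-1]
    return path, int(line), func

tb = '''Traceback (most recent call last):
  File "app/handlers.py", line 42, in handle_request
    result = parse_payload(body)
  File "app/parsing.py", line 17, in parse_payload
    return json.loads(body)["items"]
'''
print(last_frame(tb))  # ('app/parsing.py', 17, 'parse_payload')
```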

Metrics and product analytics

Alarms and dashboards are table stakes these days, but tend to be pretty high-level. Which is understandable, because you don’t want to get paged just because five users in a particular customer segment experienced a brief issue. But AI doesn’t care about being paged! The opportunity here is to do much more granular tracking of metrics and user behaviour with analytics to identify regressions and other interesting behaviour. This doesn’t even really need “AI” per se so much as good data collection and statistical analysis, but AI is good at digging into the data dumps and finding useful things. Computer use agents open up new possibilities to really explore your application rather than rely on defined integration testing flows.

As a test scenario, imagine that you release a small UI change that causes text to overflow in the middle of a conversion workflow, but only when the user has the language set to German. There are no errors logged, just this user segment becomes slightly less likely to convert. Would you be able to detect this problem right now? If not, how many more little niggles and annoyances do you think might be lurking in your codebase?
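Detecting that kind of drop is classic statistics rather than anything exotic. A sketch using a one-sided two-proportion z-test, with made-up numbers for the German-locale segment (the normal approximation is fine at real analytics volumes, illustrative otherwise):

```python
from math import sqrt, erf

def conversion_drop_pvalue(before_conv, before_n, after_conv, after_n):
    """One-sided two-proportion z-test: did the conversion rate drop?

    Returns P(observing a drop at least this large) under the null
    hypothesis that the true conversion rate is unchanged.
    """
    p1, p2 = before_conv / before_n, after_conv / after_n
    pooled = (before_conv + after_conv) / (before_n + after_n)
    se = sqrt(pooled * (1 - pooled) * (1 / before_n + 1 / after_n))
    z = (p1 - p2) / se
    # Normal tail probability P(Z > z), via the error function.
    return 0.5 * (1 - erf(z / sqrt(2)))

# Hypothetical: German-locale users converted at 12% before the
# release and 10% after, out of 10,000 sessions each.
p = conversion_drop_pvalue(1200, 10000, 1000, 10000)
print(f"{p:.6f}")  # tiny p-value: the drop is very unlikely to be noise
```

Run per segment per release and you have exactly the kind of tripwire that catches the German-overflow bug, with AI left to do what it’s good at: explaining *why* the flagged segment regressed.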

Durable engineering

None of this is particularly fast or flashy. You won’t hear people talking on AI engineering podcasts about digging through commit histories. Lots of the work is connecting data pipes together so that agents can access the data. Building reliable log analysis pipelines isn’t the kind of thing that makes a compelling conference demo. (Which may partly explain why I ended up writing this as a blog post.)

But it’s durable. The data keeps accumulating and the models keep improving. Perhaps someone builds an open source version of one of these ideas that integrates cleanly with your data sources. All of this is great! Because your goal is not to build shiny AI tools, but to use AI to generate business value. And that is what AI engineering is all about.