
Edge vs Cloud: Where Should AI Live?

Simon Wistow

VP Strategic Initiatives, Fastly

The AI world stands at a bit of a crossroads.

On one hand, companies like DeepSeek and 01.AI claim to have trained genuinely impressive models on what amounts to pocket change in AI terms (around $5 million, give or take, versus the roughly $78 million it reportedly took to train GPT-4). On the other hand, data centres are scaling up to meet AI’s compute demands, gobbling up electricity at rates that would make a small country raise its eyebrows.

So where does that leave us in terms of sustainability? Maybe the answer is pushing AI closer to the edge.

Why Edge?

There’s this misconception that the main benefit of the edge is just lower latency, and sure, that’s part of it. But here’s the thing: large language models are still quite slow compared to standard database queries. So the benefits of reducing latency get lost in the noise of ‘well, okay, we’ve reduced your latency by 200 milliseconds, but it still takes three seconds to run a query’. That doesn’t sound particularly impressive, does it?

But here’s where it gets interesting. If we cache the responses at the edge, where a lookup takes less than a millisecond, that latency reduction suddenly becomes really, really useful.
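To make that concrete, here’s a minimal sketch of the idea in Python. The `llm_complete` function is a hypothetical stand-in for whatever model API you actually call, and a plain dict stands in for a real edge key-value store:

```python
import hashlib

def llm_complete(prompt: str) -> str:
    # Hypothetical placeholder for a real model API call (seconds of latency).
    return f"(model response to: {prompt})"

# A dict stands in for a real edge key-value store in this sketch.
edge_cache: dict[str, str] = {}

def answer(prompt: str) -> str:
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in edge_cache:
        return edge_cache[key]           # sub-millisecond path: no model call
    response = llm_complete(prompt)      # multi-second path: full inference
    edge_cache[key] = response
    return response
```

The catch, of course, is that this only matches byte-identical prompts. We’ll come back to that.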

And I’m not alone in seeing this potential. Our Energy Pulse Check 2025 shows that many organisations already embrace hybrid approaches. 56% of respondents across regions split their AI workloads between edge and cloud deployments, while about a quarter remain primarily cloud-based.

Scaling Smarter, Not Harder

The edge offers another advantage that doesn’t get nearly enough attention: it scales automatically, both horizontally and geographically. You don’t need to frantically spin up machines or processes in a central cloud when traffic surges. That elasticity becomes particularly valuable for organisations with complex architectures.

The edge lets you smooth out and hide multi-region, multi-cloud deployments. Whether you have some workloads running in your own data centre, some in a third-party service, and some spread across various cloud providers, it’s all hidden behind the edge and cached the same way. This gives you tremendous flexibility: you run your code and queries in the right places while presenting a unified experience to your users.
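As a rough illustration of that routing idea (none of this is Fastly’s actual API; the hostnames and the `route` helper are invented for the sketch), the edge picks an origin per request while the user only ever sees one hostname:

```python
# Illustrative only: every origin below is made up.
ORIGINS = {
    "/api/inference": "https://ai.cloud-provider-a.example",  # GPU cloud
    "/api/search":    "https://search.third-party.example",   # SaaS backend
    "/":              "https://www.your-dc.example",          # own data centre
}

def route(path: str) -> str:
    # Longest-prefix match, so specific routes win over the catch-all.
    for prefix in sorted(ORIGINS, key=len, reverse=True):
        if path.startswith(prefix):
            return ORIGINS[prefix]
    return ORIGINS["/"]
```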

Caching as an Efficiency Lever

And then there’s the benefit of energy efficiency.

When we ask companies how much AI energy usage they could cut by reducing redundant queries, over two-thirds estimate savings between 10% and 50%. That’s an enormous lever for sustainability, and yet many still don’t pull it.
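To see why that range is plausible, a quick back-of-envelope calculation helps. Every figure below is an assumption chosen for illustration, not survey data:

```python
# All figures are illustrative assumptions, not measured data.
queries_per_day  = 1_000_000
redundant_share  = 0.30   # fraction of queries that repeat earlier ones
cache_hit_rate   = 0.80   # share of redundant queries the cache catches
wh_per_inference = 3.0    # assumed energy cost of one LLM inference, in Wh

saved_kwh = queries_per_day * redundant_share * cache_hit_rate * wh_per_inference / 1000
print(f"~{saved_kwh:.0f} kWh saved per day")   # ~720 kWh
```

At those made-up numbers, caching trims roughly 24% of inference energy, comfortably inside the range respondents estimated.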

Why? If you don’t understand how large language models work under the hood, caching queries might seem too difficult. Even for those who understand the concept of caching AI queries, the complexity involved and the skill required to build these caches and optimise the thresholds for the best balance between cache hit rate and fresh responses creates significant barriers. Many organisations just don't have the time, resources, or specialised expertise to build it themselves. And that's exactly the kind of problem we love solving at Fastly.

That is why we built a semantic cache called AI Accelerator. Instead of caching exact strings, we convert queries into vector space, the same way large language models turn text into vectors. So when people ask questions like “Where’s the nearest coffee shop?” or “Tell me a coffee shop near me,” our systems detect that those are semantically equivalent and serve up the same answer. And we handle the heavy lifting for you: you don’t need deep technical know-how to take advantage of this tool.
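For the curious, the core mechanism looks roughly like this sketch. The toy `embed` function stands in for a real embedding model, and the similarity threshold is exactly the knob mentioned earlier that trades cache hit rate against fresh responses:

```python
import math

def embed(text: str) -> list[float]:
    # Toy placeholder: a real system would call an embedding model here.
    vec = [0.0] * 64
    for token in text.lower().split():
        vec[hash(token) % 64] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# The tuning knob: higher means fewer cache hits but fresher answers.
SIMILARITY_THRESHOLD = 0.92

cache: list[tuple[list[float], str]] = []   # (query vector, cached response)

def lookup(query: str) -> str | None:
    qv = embed(query)
    scored = [(cosine(qv, vec), resp) for vec, resp in cache]
    if scored:
        score, resp = max(scored)
        if score >= SIMILARITY_THRESHOLD:
            return resp   # a semantically equivalent query was seen before
    return None

def store(query: str, response: str) -> None:
    cache.append((embed(query), response))
```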

The potential energy savings are enormous: every cache hit is a model inference that never needs to run.

So… Where Should AI Live?

There’s no one-size-fits-all answer. It depends on what you’re trying to do. Do you need the lowest possible latency for certain operations? Are you concerned about scaling during unpredictable traffic spikes? Do you have regional compliance requirements? Are you trying to reduce your environmental footprint? 

For most organisations, the best approach is to lean on the edge for caching, rapid responses, and global scaling, while using the cloud for intensive workloads that benefit from centralisation.

But the bigger point is this: we’ve got to make AI query caching and hybrid AI deployments accessible to all teams, not just those with the deepest pockets. Simplifying tech opens up its benefits to everyone, whether you’re a Fortune 500, a local charity, or one person building something brilliant at their kitchen table.

Check out our latest interview with Fastly Co-Founder Simon Wistow on how to make AI more sustainable.