
AI Agents on Fastly Compute: How it Works and What Makes it Secure

Kay Sawada

Senior Enterprise Serverless Architect, Fastly

Handling AI workloads on Fastly Compute is not only possible; it is highly effective because it inherently leverages the low latency of the edge and the strict isolation of our environment. In this post, we will break down several patterns to add agent features to your Fastly Compute services using LLM APIs, and explain how our Compute platform, running code in WebAssembly sandboxes, ensures that your agent runs with optimal speed and enterprise-grade security.

Designing feedback loops for an AI agent

This animated GIF shows the recently launched AI Assistant running in the Fastly Control Panel. When you throw a prompt at an AI-based chat like this, you're probably used to waiting while the AI Agent is "thinking". Rather than relying on test-time compute (TTC) of the model itself, where the model "thinks" more deeply before providing an answer, you can implement this sort of thinking process by building an agent that uses the results of LLM API calls as feedback to achieve a goal. This method is often called an Agent Loop: the agent repeats a sequence of steps, such as reasoning, planning, acting with tools, and observing the results, until the objective is completed.

The execution flow of this feedback loop is illustrated below. A single exchange, in which the LLM immediately generates a response to the prompt, often cannot yield the desired answer. This mechanism instead lets the agent autonomously repeat and evaluate steps (1) through (3) in a loop, improving accuracy and delivering the final result.

This loop can be deployed in various environments, from local infrastructure to remote servers. Let's look at the processing flow within a remote environment using Fastly Compute through a minimal implementation. For example, the Fastly Compute JavaScript SDK enables this implementation as shown below. Note that you can try this code by initializing a TypeScript Starter Kit with $ npx fastly compute init and replacing src/index.ts with the following code:

addEventListener("fetch", (event) => event.respondWith(handleRequest(event)));

async function handleRequest(event: FetchEvent) {
 let payload = {
   model: 'gpt-5.2',
   messages: [{role: "user", content: "what's the weather today?"}],
   tools: [{ "type": "function", "function": {
               "name": "get_weather",
               "description": "retrieve weather forecast",
               "parameters": {}}}]}
 while(true){ // Agent loop begins
   let res = await (await fetch("https://<LLM_API_ENDPOINT>/chat/completion", {
     method: "POST",
     headers: new Headers({
       "Content-Type": "application/json",
       "Authorization" : "Bearer <YOUR_ACCESS_TOKEN>",
     }),
     body: JSON.stringify(payload)
   })).json();
   if (res.choices[0].finish_reason != "tool_calls") {
     return new Response(JSON.parse(res.choices[0].message.content))
   }
   if (res.choices[0].message.tool_calls != undefined) {
     for (const toolCall of res.choices[0].message.tool_calls ) {
       if (toolCall.type === "function") {
         payload.messages.push({
           role: "user",
           content: "{\"weather\":\"windy and rainy\"}" // pseudo result
         });
       }
     }
   }
 }
}

Execution of this code follows this high-level flow:

  1. Submit the prompt "what's the weather today?" to the LLM API.

  2. The LLM API returns an instruction to execute the get_weather Tool.

  3. Append a dummy value "windy and rainy" as a provisional result to the prompt and re-invoke the LLM API (Agent Loop).

  4. Receive a completion signal from the LLM API (i.e., stop is returned as finish_reason) and display the final result. (Abridged examples of the responses from steps 2 and 4 are sketched below.)
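For reference, the two response shapes that steps 2 and 4 branch on look roughly like the following. These are abridged, illustrative payloads following the OpenAI-style chat completions schema assumed by the code above; the exact fields returned by your LLM API may differ:

// Step 2: the model asks the agent to run the get_weather tool
{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_abc123", // illustrative ID
        "type": "function",
        "function": { "name": "get_weather", "arguments": "{}" }
      }]
    }
  }]
}

// Step 4: the model has its answer and signals completion
{
  "choices": [{
    "finish_reason": "stop",
    "message": { "role": "assistant", "content": "It's windy and rainy today. ..." }
  }]
}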

Running this program in my lab environment using the gpt-5.2 model yielded the following result for step 4: 

"It’s windy and rainy today. If you’re heading out, grab a waterproof jacket and umbrella, and watch for slick roads and strong gusts. Want an hourly breakdown or tips for commuting or outdoor plans?"

The outcome derived from this iterative agent loop exemplifies the feedback loop discussed earlier - a processing technique frequently utilized behind the "Thinking..." indicator in modern chat interfaces.

The Ever-Evolving Agent Loop

Agent Loops have been attracting attention for a while, but this year, the emergence of interfaces to external tools, such as MCP (e.g., the Fastly MCP server) and tool calls, has supercharged them. This evolution is simplified in the diagram below.

A concrete example of Compute code that handles processing involving remote tool calls is provided here. In this example, we use OpenAI's Responses API (announced in May), which allows remote tool calls, though similar operations are possible with Anthropic's Messages API and others. You can run this code by following the steps below. In about 60 lines of JavaScript, it summarizes best practices for migrating Fastly VCL services to Compute based on document search results, ultimately outputting an editable PowerPoint file (not a PDF).

$ mkdir compute-agent-demo && cd compute-agent-demo
$ npx fastly compute init -l javascript -i
$ npm install pptxgenjs openai hono @fastly/hono-fastly-compute abortcontroller-polyfill
$ curl -s "https://gist.githubusercontent.com/remore/25a1638a3a2183daa609044cfa1ce6f9/raw/818322d634d59c10950878932517c4173b746dd3/index.js" > src/index.js
$ vi src/index.js # Put your API token and remote MCP server address
$ fastly compute serve

An example of the actual PowerPoint output is shown below. Leveraging the LLM's ability to summarize information and generate code, we can now produce an editable PowerPoint binary file (.pptx) with persuasive content that meets the objective.

A major difference between this code example and the previous one is that there is no while() statement in the program. Because the AI agent loop is executed on the LLM API side this time, the Compute code (acting as the Agent) does not need to implement a while() loop itself. This improves program readability and demonstrates how recent advancements have made AI workflows significantly easier to implement.

const bestPractices = await callLLM(
  'What are the best practices to migrate a fastly vcl service to compute? Outline ten practices and give each a summary of at least 300 characters.',
  [{
    "type": "mcp",
    "server_label": "fastly-doc-search",
    "server_url": "https://xxxxxxx.edgecompute.app/mcp",
    "require_approval": "never",
  }]
)
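For context, a callLLM() helper along these lines could sit behind the snippet above. This is a simplified sketch rather than the exact implementation in the gist (which may add Compute-specific setup such as backends or the AbortController polyfill installed earlier); it assumes the official openai npm package from the install step and reuses the placeholder token and model name from this post:

import OpenAI from "openai";

// Simplified sketch of callLLM(): a single Responses API call in which the
// tool-call loop (including queries to the remote MCP server) runs on the
// LLM API side, so no while() loop is needed here.
async function callLLM(prompt, tools) {
  const client = new OpenAI({ apiKey: "<YOUR_ACCESS_TOKEN>" });
  const response = await client.responses.create({
    model: "gpt-5.2", // same model name used earlier in this post
    input: prompt,
    tools: tools,
  });
  return response.output_text; // aggregated text of the final answer
}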

As a side note, we implemented this demo by integrating with a remote MCP server running on Compute under the edgecompute.app domain when invoking the callLLM() function, as shown in the code above. While any remote MCP server is compatible, I used an MCP server implemented on Fastly Compute to take advantage of the edge-serverless platform, which is always invoked from the data center nearest to the LLM API. This minimizes latency for tool calls, contributing to the optimization of Time to First Token (TTFT). Fastly Compute's capabilities, including low-latency edge execution and streaming response support, provide powerful backing for your AI Agent development. For details on implementing an MCP server using Compute, please refer to my previous blog.
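Coming back to the PowerPoint output shown earlier, the final step of turning the LLM's summary into an editable .pptx binary might look roughly like this with pptxgenjs. This is a hypothetical sketch with made-up helper and field names; the gist's actual slide layout and parsing logic differ:

import PptxGenJS from "pptxgenjs";

// Hypothetical helper: turn an array of { title, summary } objects
// (e.g. parsed from the LLM's answer) into a downloadable .pptx response.
async function buildDeckResponse(practices) {
  const pptx = new PptxGenJS();
  for (const practice of practices) {
    const slide = pptx.addSlide();
    slide.addText(practice.title, { x: 0.5, y: 0.4, fontSize: 24, bold: true });
    slide.addText(practice.summary, { x: 0.5, y: 1.2, w: 9, h: 4, fontSize: 14 });
  }
  // Serialize the deck in memory; no file system access is needed in the sandbox
  const buffer = await pptx.write({ outputType: "arraybuffer" });
  return new Response(buffer, {
    headers: {
      "Content-Type": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
      "Content-Disposition": "attachment; filename=vcl-to-compute-best-practices.pptx",
    },
  });
}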

How Fastly’s Platform and Wasm Sandboxing secure your AI workload

Finally, let's touch on security, which is key to trustworthy AI. Determining what permissions to grant AI agents, and how to manage them, has been a major concern for developers. While discussions often focus on designing permissive models to accelerate development when using coding agents like Claude Code, Gemini CLI, and Codex, designing enterprise AI workflows calls for the opposite perspective: restrictive permission design.

By utilizing Fastly Compute, programs easily benefit from the WebAssembly runtime's sandbox isolation and memory safety features, such as linear memory bounds checking, within clear security boundaries. For instance, dynamic code execution using the AsyncFunction() constructor, as implemented in the example above, is generally considered a vulnerability risk and an anti-pattern in many JavaScript runtimes. While it requires careful usage even on Fastly Compute, the code runs in an isolated WebAssembly environment that inherently lacks file system access, raw network I/O, and external command execution capabilities, allowing the Agent to perform autonomous processing with a minimized attack surface.
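To make that concrete, dynamic execution of LLM-generated code via the AsyncFunction constructor looks roughly like the following. This is a hypothetical sketch, not the demo's actual wiring; the point is that whatever the generated code does, it only ever runs inside the Wasm sandbox described above:

// AsyncFunction is not a global, so it is obtained from an async function's prototype
const AsyncFunction = Object.getPrototypeOf(async function () {}).constructor;

// Run code generated by the LLM (e.g. pptxgenjs calls), exposing only the
// objects we explicitly pass in; the sandbox itself offers no file system,
// raw sockets, or subprocess access for that code to reach.
async function runGeneratedCode(generatedSource, pptx) {
  const fn = new AsyncFunction("pptx", generatedSource);
  return await fn(pptx);
}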

That's not all—the rich Hostcalls provided by the Fastly Compute platform also include diverse security considerations. For example, limits on the number of backend fetch/send calls help prevent the issuance of excessive external requests. Additionally, our "Static Backend" mechanism can restrict traffic to pre-defined external servers, preventing the agent from making HTTP requests to unwanted/unknown external servers.

 // example of dynamic backend
 fetch("https://example.com/some-path") 

 // example of static backend, restricting traffic to pre-defined external servers
 fetch("https://example.com/some-path", {backend: "example-com"})

This mechanism not only safeguards AI workflows from malicious execution but also allows for the selective authorization of broader behaviors via a feature called “Dynamic Backends”. By enabling granular capability assignment to programs running as Wasm Modules within a secure infrastructure, Fastly Compute facilitates the seamless implementation of security in AI workflows.
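As a rough sketch of that selective authorization in the JavaScript SDK (assuming dynamic backends are enabled for the service; the backend name and target here are illustrative):

import { Backend } from "fastly:backend";

async function callTool() {
  // Explicitly construct the one origin the agent may reach at runtime,
  // then pin the fetch() to it instead of allowing arbitrary outbound hosts.
  const toolBackend = new Backend({ name: "example-com", target: "example.com" });
  return await fetch("https://example.com/some-path", { backend: toolBackend });
}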

Securing control by leveraging standardized technologies and specifications

In this post, we introduced several methods for effectively implementing AI Agents on Fastly Compute. AI Agents have become indispensable for driving significant improvements in productivity and operational efficiency. By utilizing Fastly Compute—an enterprise-grade environment that balances performance and security—you can develop Agents that fully leverage the power of AI. I hope this post helps you develop AI Agents with a better experience. Join the conversation in our forum and let us know what you are building.