Last week, I dropped an LLM agent into a plain HTML file. No Node.js. No Webpack. No Docker container running a Python backend that proxies to another Python backend. One <script type="module"> tag, a couple of imports, and an agent that reasons, calls tools, writes code, and delegates to sub-agents — all orchestrated entirely client-side, in the browser. The inference still happens on a remote model server. That's unavoidable. But everything between your user and that API — the reasoning loop, the tool dispatch, the sandboxed code execution, the multi-agent coordination — runs in the tab. No middleware. No backend of your own.
The entire framework behind it is 479 lines of JavaScript. I wrote it to prove a point.
The Framework Was Never the Product
Here's the state of agent frameworks in 2026: you pip install something that pulls in 400 transitive dependencies. You configure YAML files. You set up vector databases, message queues, tracing backends. You write "agent" code that's really just glue between seven different services, none of which run in the same process.
Then your agent calls a tool.
Behind the scenes, a request goes from your Python process to a local HTTP server that serializes a tool schema, sends it over REST to another service, which deserializes it, runs a function, and sends the result back the way it came. All of this to call get_weather("Berlin").
The problem isn't that these frameworks are bad. Some are excellent. The problem is that for the core abstraction — an LLM that reasons in a loop and takes actions — you don't need any of it.
What If the Agent Ran Where the User Already Is?
The browser is a runtime. A good one. It has fetch. It has postMessage. It has iframes with configurable security sandboxes. It has an event system. It has a module loader. Everything you need to build an agent framework already exists in the platform.
smolagents-js takes this seriously. Inspired by Hugging Face's Python smolagents library, it implements the full ReAct pattern — Think, Act, Observe — in pure browser JavaScript with zero npm dependencies.
Not "zero production dependencies." Zero dependencies. No package.json runtime section. No bundler. No build step. You do need a modern browser with ES module support and an OpenRouter API key for model access — but the framework itself ships nothing you didn't write.
```html
<script type="module">
  import { ToolCallingAgent, Model, tool } from './src/index.js';
</script>
```
That's the setup.
479 Lines. Let's Talk About What's in Them.
The architecture is absurdly legible. Ten files. Each one does exactly one thing:
| File | Lines | What It Does |
|---|---|---|
| events.js | 20 | An EventEmitter. The whole thing. |
| tool.js | 33 | Tool definition + OpenAI schema conversion |
| model.js | 36 | HTTP client for OpenRouter |
| prompts.js | 37 | System prompt templates |
| managed-agent.js | 36 | Wraps an agent as a tool |
| code-agent.js | 43 | Agent that writes and runs JavaScript |
| tool-calling-agent.js | 43 | Agent that picks and calls tools |
| agent.js | 105 | The ReAct loop |
| sandbox.js | 119 | Sandboxed code execution via iframe |
| index.js | 7 | Public exports |
You can read the entire thing during a coffee break. Not skim — read. Understand every control flow path. Trace every event. That's the point.
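For a sense of what 20 lines buys you, here's a sketch of a complete EventEmitter at that scale. This is illustrative, not the actual events.js from the repo, but it shows how little the abstraction needs:

```javascript
// A ~20-line EventEmitter sketch (illustrative, not the repo's events.js).
class EventEmitter {
  constructor() {
    this.listeners = new Map(); // event name -> array of handlers
  }
  on(event, handler) {
    if (!this.listeners.has(event)) this.listeners.set(event, []);
    this.listeners.get(event).push(handler);
    return this; // allow chaining: emitter.on(...).on(...)
  }
  off(event, handler) {
    const handlers = this.listeners.get(event) || [];
    this.listeners.set(event, handlers.filter((h) => h !== handler));
  }
  emit(event, payload) {
    for (const handler of this.listeners.get(event) || []) handler(payload);
  }
}
```

Every agent in the framework extends something like this, which is why the think/act/observe events later in this post come for free.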
The Loop
Every agent framework, no matter how many abstractions it stacks, eventually bottoms out at the same loop. smolagents-js puts it right where you can see it:
```js
while (this.currentStep < this.maxSteps) {
  this.currentStep++;

  // THINK — ask the LLM what to do next
  this.emit('think', { agent: this.name, step: this.currentStep });
  const response = await this.callModel();

  // Is it done? If no action, we have our answer.
  const action = this.extractAction(response);
  if (!action) {
    this.emit('done', { result: response.content });
    return response.content;
  }

  // ACT — execute the tool or code
  this.emit('act', { step: this.currentStep, ...action });
  const result = await this.executeAction(action);

  // OBSERVE — feed the result back into the conversation
  this.appendActionToHistory(response, action, result);
  this.emit('observe', { step: this.currentStep, result });
}
```
That's agent.js. The base class. Two subclasses override extractAction and executeAction to implement the two strategies:
ToolCallingAgent parses OpenAI-style tool_calls from the LLM response, finds the matching tool, calls it, and appends the result as a tool message.
CodeAgent extracts JavaScript from the LLM's markdown code fences, injects tools as async functions, and executes the code in a sandbox.
Same loop. Different interpretation of "action." The LLM doesn't know or care which one it's inside.
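To make the split concrete, here's roughly what the two extractAction strategies reduce to. These are sketches based on the description above, not the repo's exact code:

```javascript
// Sketches of the two extractAction strategies (illustrative, not the repo's code).

// ToolCallingAgent: read an OpenAI-style tool_call from the response message.
function extractToolCall(message) {
  const call = message.tool_calls?.[0];
  if (!call) return null; // no action means the agent is done
  return {
    id: call.id,
    name: call.function.name,
    args: JSON.parse(call.function.arguments),
  };
}

// CodeAgent: pull JavaScript out of a markdown code fence in the reply.
// The fence string is built with repeat() so this snippet can itself live in a fence.
const FENCE = '`'.repeat(3);
function extractCode(message) {
  const re = new RegExp(FENCE + '(?:js|javascript)?\\n([\\s\\S]*?)' + FENCE);
  const match = re.exec(message.content || '');
  return match ? { code: match[1] } : null;
}
```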
The Sandbox Trick
The CodeAgent needs to execute LLM-generated JavaScript safely. In a Node.js world, you'd reach for vm2 or a Docker container. In the browser, you already have something better: the iframe.
```js
this._iframe = document.createElement('iframe');
this._iframe.sandbox = 'allow-scripts';
this._iframe.style.display = 'none';
document.body.appendChild(this._iframe);
```
Four lines. The sandbox="allow-scripts" attribute creates an execution context with no access to the parent DOM, no cookies, no storage, no same-origin privileges of any kind (the iframe runs under an opaque origin). All that's left is the ability to run JavaScript and talk to its parent via postMessage.
Tools become message-passing stubs inside the iframe:
```js
// Inside the iframe: each tool is a stub that forwards the call to the parent
// and parks a Promise under a call id until the matching result comes back.
let __nextCallId = 0;
window.__pending = {};

async function get_weather(args) {
  const callId = ++__nextCallId;
  parent.postMessage({ type: 'tool_call', id: callId, tool: 'get_weather', args }, '*');
  return new Promise((resolve) => {
    window.__pending[callId] = { resolve };
  });
}
```
The agent writes code that calls get_weather(). The iframe sends a message to the parent. The parent executes the real tool. The result travels back through postMessage. The Promise resolves. The code continues.
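The parent's side of that round trip is a message listener that dispatches to the real tool and posts the result back. A sketch of what that can look like (the message shape is assumed from the stub above; the actual sandbox.js will differ in details):

```javascript
// Sketch of the parent-side bridge (assumed message shape, not the repo's exact code).
// `tools` maps tool names to async functions; `postToIframe` sends a message back.
function createBridge(tools, postToIframe) {
  return async function onMessage(event) {
    const msg = event.data;
    if (msg?.type !== 'tool_call') return; // ignore unrelated messages
    try {
      const result = await tools[msg.tool](msg.args);
      postToIframe({ type: 'tool_result', id: msg.id, result });
    } catch (err) {
      postToIframe({ type: 'tool_result', id: msg.id, error: String(err) });
    }
  };
}

// In a real page this would be wired up roughly as:
// window.addEventListener('message', createBridge(tools,
//   (m) => iframe.contentWindow.postMessage(m, '*')));
```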
No HTTP. No serialization format debates. No container orchestration. Just the platform doing what it was designed to do.
The Case for Reading the Source
There's a reason this post quotes so much code. Most agent framework documentation teaches you the API. It tells you which function to call, which config to set, which decorator to add. When something breaks, you're left searching GitHub issues.
smolagents-js is small enough that the source is the documentation. Every abstraction is one file. Every file fits on a screen. When you hit a bug, you don't google — you read agent.js and know exactly where the loop is, how the history accumulates, why the model saw what it saw.
This matters more than people think. Agent debugging is hard not because the bugs are subtle, but because the systems are opaque. Last month I spent two hours tracing a tool-call failure in a popular Python framework. The model was receiving a malformed tool result, but the serialization happened across three abstraction layers, two middleware hooks, and a callback registry. The actual bug was a missing str() call. In smolagents-js, tool results go from executeAction to appendActionToHistory to the message array. One file. One path. You'd find that bug in minutes.
Agents as Tools. Tools as Agents.
The multi-agent design has a kind of recursive elegance. A ManagedAgent wraps any agent and exposes it as a Tool with a single parameter: task.
From the manager's perspective, a sub-agent is just another tool. It takes a string, it returns a string. The fact that behind that interface there's an entire ReAct loop running — another LLM thinking, acting, observing — is invisible. The manager doesn't need to know.
Here's what that looks like with a real task — a manager coordinating a researcher and a calculator to answer "How many football stadiums would it take to seat the population of Tokyo?":
```js
const researcher = new ToolCallingAgent({
  model, tools: [webSearchTool], maxSteps: 5
});
const calculator = new ToolCallingAgent({
  model, tools: [calculateTool], maxSteps: 5
});

const manager = new ToolCallingAgent({
  model,
  managedAgents: [
    new ManagedAgent({
      agent: researcher, name: 'researcher',
      description: 'Searches the web for factual information'
    }),
    new ManagedAgent({
      agent: calculator, name: 'calculator',
      description: 'Evaluates math expressions'
    })
  ],
  maxSteps: 8
});

const answer = await manager.run(
  'How many football stadiums would you need to seat the population of Tokyo?'
);
```
The manager decides to call researcher({ task: "population of Tokyo" }). The researcher spins up its own ReAct loop, calls web_search, gets results, formulates an answer: "13.96 million." The manager takes that, calls calculator({ task: "13960000 / 80000" }). The calculator runs its loop, evaluates the expression, returns "174.5." The manager synthesizes: "About 175 stadiums."
Three agents. Six reasoning steps total. Each one fires events as it works, and they nest cleanly:
```
[manager] Step 1: thinking...
[manager] Step 1: calling researcher({"task": "population of Tokyo"})
  [researcher] Step 1: thinking...
  [researcher] Step 1: calling web_search({"query": "Tokyo population"})
  [researcher] Step 1: result = Tokyo metro population: ~13.96 million
  [researcher] Done in 2 steps
[manager] Step 2: thinking...
[manager] Step 2: calling calculator({"task": "13960000 / 80000"})
  [calculator] Step 1: thinking...
  [calculator] Step 1: calling calculate({"expression": "13960000 / 80000"})
  [calculator] Done in 2 steps
[manager] Done in 3 steps
```
The indentation comes from a depth counter that increments each time an event bubbles up through a ManagedAgent boundary. The UI hooks into this for free — each depth level is a CSS indent. The whole orchestration layer is 36 lines of code, and it enables arbitrarily deep agent hierarchies because the composition is just function calls all the way down.
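The forwarding itself can be sketched in a few lines. Hypothetical names, and the exact mechanism in managed-agent.js may differ, but the idea is just re-emitting each event with depth + 1 at every boundary:

```javascript
// Sketch of event forwarding across a ManagedAgent boundary (hypothetical names).
// Each sub-agent event is re-emitted on the manager with its depth bumped by one,
// so a UI can turn depth directly into indentation.
function forwardEvents(subAgent, manager) {
  for (const type of ['think', 'act', 'observe', 'done']) {
    subAgent.on(type, (payload) => {
      manager.emit(type, { ...payload, depth: (payload.depth ?? 0) + 1 });
    });
  }
}
```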
What It Doesn't Do
Honesty time. There are things this framework deliberately doesn't handle, and things it can't handle because of where it runs.
No persistent memory. The agent's history lives in a JavaScript array. Close the tab, it's gone. There's no built-in conversation persistence, no vector store, no retrieval-augmented generation. If you need memory across sessions, you'd wire it up yourself — localStorage, IndexedDB, an external API. The framework gives you the message array; what you do with it is your problem.
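Wiring up the simplest version yourself is a few lines. A hypothetical helper, assuming you can read and set the agent's message array:

```javascript
// Hypothetical persistence helper (not part of the framework): save the agent's
// message array to localStorage on the way out, restore it on the way back in.
const HISTORY_KEY = 'agent-history';

function saveHistory(messages, storage = localStorage) {
  storage.setItem(HISTORY_KEY, JSON.stringify(messages));
}

function loadHistory(storage = localStorage) {
  const raw = storage.getItem(HISTORY_KEY);
  return raw ? JSON.parse(raw) : []; // fresh session if nothing was saved
}
```

The storage parameter defaults to localStorage but accepts anything with getItem/setItem, so swapping in IndexedDB or a remote API is the same shape.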
No streaming. The model client does a single fetch and waits for the full response. For long-running steps, the user sees "thinking..." until the model finishes. Adding streaming would roughly double the complexity of model.js and ripple into the ReAct loop. It's the most obvious missing feature, and the most likely next addition.
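For a sense of what that addition involves: OpenRouter streams OpenAI-style server-sent events, so the core new piece is a parser for `data:` lines carrying token deltas. A sketch of just that half (the fetch/ReadableStream plumbing and the loop changes come on top of it):

```javascript
// Sketch of SSE delta parsing for OpenAI-style streaming chunks (hypothetical;
// model.js does not do this today). Each "data: {...}" line carries a token delta.
function parseSSEChunk(chunk) {
  const deltas = [];
  for (const line of chunk.split('\n')) {
    if (!line.startsWith('data: ')) continue; // skip comments, blank lines, events
    const payload = line.slice(6).trim();
    if (payload === '[DONE]') continue; // end-of-stream sentinel
    const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
    if (delta) deltas.push(delta);
  }
  return deltas.join('');
}
```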
No retry logic or rate limiting. If an API call fails, the agent throws. There are no exponential backoffs, no circuit breakers, no automatic retries. For a production deployment behind a real product, you'd want a wrapper around the Model class. For prototyping and demos, the simplicity is the right trade-off.
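Such a wrapper is small. A hypothetical sketch with exponential backoff, wrapping any async call you hand it:

```javascript
// Hypothetical retry wrapper (not part of the framework): retry a failing async
// call with exponential backoff, rethrowing the last error once retries run out.
async function withRetry(fn, { maxRetries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxRetries) break;
      // backoff: 500ms, 1000ms, 2000ms, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}

// Usage (assuming a model.call method): await withRetry(() => model.call(messages));
```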
No authentication flows. The API key is a string you pass in. There's no OAuth dance, no token refresh, no key rotation. For a browser-first library, this is a real limitation — you either ask the user to paste their key (which the examples do) or you proxy through your own backend (which defeats part of the point).
Browser-only. This won't run in Node.js without modification. The sandbox depends on document.createElement('iframe'). The model client uses the browser's fetch. If you need server-side agents, this isn't for you — and that's fine. The bet is that a lot of agent use cases belong in the client, not the server.
These are real constraints. Some are solvable within the current design. Some are fundamental trade-offs of the "zero dependencies, browser-only" approach. The question is whether the things it does do — tool calling, code execution, multi-agent orchestration, a legible ReAct loop — are enough for your use case. For prototyping, teaching, and lightweight client-side agents, the answer is yes. For production systems with thousands of concurrent users and regulatory requirements, obviously not.
But 479 lines that do the right things well is a better starting point than 50,000 lines you'll never fully understand.
The point I wanted to prove wasn't that small is always better. It's that the core of an agent — the loop, the tools, the composition — is a small idea. When a small idea gets buried under a large system, you lose the ability to reason about it. You can't debug what you can't read. You can't extend what you don't understand. The frameworks will keep growing. The idea doesn't have to.