The Terminal as Agent Interface: Introducing clive

There is a quiet assumption embedded in most agent infrastructure: that agents need structured interfaces. APIs, schemas, tool definitions, protocols like MCP or A2A. The assumption is that the gap between an LLM and a system needs to be bridged by something machine-readable.

clive starts from a different assumption. LLMs are excellent at reading text and reasoning about it. The terminal produces text. So we gave an agent a terminal and a keyboard.

The Core Loop

clive runs an LLM inside a tmux session. The loop is simple:

1. agent reads the current terminal screen
2. agent decides what to type
3. agent sends keystrokes to the pane
4. tool executes, produces output
5. repeat

No schemas. No tool definitions. No structured calls. The agent observes state, acts, observes the result, acts again — the same pattern a developer follows when SSH-ing into an unfamiliar machine and working through a problem.

What makes this work is that tmux separates observation from action. The tmux capture-pane command returns a rendered snapshot of the screen: the agent's perception. The tmux send-keys command types into the pane: the agent's motor output. The shell state that persists between turns (working directory, environment variables, running processes) serves as the agent's working memory.
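The loop above can be sketched in a few lines of Python. This is a minimal illustration, not clive's actual implementation: the names capture_argv, send_argvs, run_loop, and the decide callback (standing in for the LLM call) are hypothetical, and the fixed settle delay is a crude assumption about how long a command takes to produce output.

```python
import subprocess
import time
from typing import Any, Callable, List, Optional


def capture_argv(target: str) -> List[str]:
    """argv that prints the visible contents of a tmux pane to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", target]


def send_argvs(target: str, text: str) -> List[List[str]]:
    """argvs that type `text` literally into the pane, then press Enter."""
    return [
        # -l disables key-name lookup, so the text is typed as-is.
        ["tmux", "send-keys", "-t", target, "-l", "--", text],
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def run_loop(
    target: str,
    decide: Callable[[str], Optional[str]],  # hypothetical stand-in for the LLM
    run: Callable[..., Any] = subprocess.run,
    max_turns: int = 20,
    settle: float = 0.5,  # assumed wait for output; a real system may poll instead
) -> str:
    """Observe the screen, act, and repeat until `decide` returns None."""
    screen = ""
    for _ in range(max_turns):
        # Observation: read the rendered screen.
        result = run(capture_argv(target), capture_output=True, text=True, check=True)
        screen = result.stdout
        # Decision: the model picks the next line to type, or None to stop.
        text = decide(screen)
        if text is None:
            break
        # Action: send the keystrokes to the pane.
        for argv in send_argvs(target, text):
            run(argv, check=True)
        time.sleep(settle)
    return screen
```

With a running tmux session, a call such as run_loop("clive:0.0", decide) would drive the first pane of a session named clive. The session name here is only an example.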

The Dual Channel

One design decision matters enough to call out explicitly: large content never goes through the terminal.

The terminal is for interaction — commands, short outputs, navigation. Files are for content — anything larger than a screen. When the agent needs to process a large file, the session manager reads it directly from disk and injects it into the LLM context. The terminal never sees it. No scrollback issues, no ANSI noise, no token waste. Shell commands go to the pane. File reads bypass it entirely. The session manager routes based on command type.
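One way to picture the routing is the sketch below. It assumes the agent marks file reads with a read_file prefix. That directive, the route and load_for_context helpers, and the character budget are all hypothetical and chosen only for illustration; clive's real action format may differ.

```python
from pathlib import Path
from typing import Tuple

READ_PREFIX = "read_file "  # hypothetical directive, not clive's actual syntax


def route(action: str) -> Tuple[str, str]:
    """Return ("file", path) for a read directive, otherwise ("pane", command)."""
    if action.startswith(READ_PREFIX):
        return "file", action[len(READ_PREFIX):].strip()
    return "pane", action


def load_for_context(path: str, max_chars: int = 20_000) -> str:
    """Read a file straight from disk, bypassing the terminal entirely.

    The content is truncated to an assumed character budget so that one
    large file cannot fill the whole LLM context.
    """
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return text[:max_chars]
```

Under this scheme, route("ls -la") yields a pane action that is typed into the terminal, while route("read_file /var/log/app.log") yields a file action whose content goes directly into the model's context.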

CLI as Universal Interface

The deeper point is about interface design.

Every CLI tool that exists — lynx, mutt, curl, whisper, yt-dlp, taskwarrior, sqlite3 — becomes agent-accessible instantly. No MCP server to implement, no API to wrap, no schema to define. The agent reads the output, reasons about it, and responds — the same way a human would.

When the tool changes, the agent adapts. There is no brittle mapping between tool output and a structured schema that breaks when the format shifts. The agent reads what's there and figures it out.

This also applies across machines. From the agent's perspective, an SSH session is indistinguishable from a local shell. A restricted shell on a remote server, exposing only the commands you've explicitly permitted, becomes a service the agent can use. The SSH boundary provides authentication, authorization, and auditability through existing infrastructure. No new protocol needed.
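As a rough illustration of "only the commands you've explicitly permitted", the sketch below shows an allowlist check of the kind that could sit behind an OpenSSH forced command, where the client's requested command arrives in the SSH_ORIGINAL_COMMAND environment variable. The ALLOWED set and the permitted_argv helper are hypothetical, and this is not a hardened or reviewed security boundary.

```python
import shlex
from typing import List, Optional, Set

# Hypothetical allowlist: the only programs this endpoint exposes.
ALLOWED: Set[str] = {"sqlite3", "task", "curl"}


def permitted_argv(
    original_command: str, allowed: Set[str] = ALLOWED
) -> Optional[List[str]]:
    """Parse a requested command and return its argv only if it is allowed.

    The result is meant to be executed directly, without a shell, so that
    characters like ';' or '|' are passed as plain arguments rather than
    interpreted as command separators.
    """
    try:
        argv = shlex.split(original_command)
    except ValueError:
        # Unbalanced quotes or similar parse errors: reject outright.
        return None
    if not argv or argv[0] not in allowed:
        return None
    return argv
```

A wrapper script on the server could call permitted_argv(os.environ.get("SSH_ORIGINAL_COMMAND", "")) and run the returned argv, or refuse the request when the result is None.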

The same logic extends to agent-to-agent communication. Any agent that produces text output on a terminal can be driven by — or communicate with — another agent through the same loop. CLI becomes the wire between agents.

What This Replaces

clive does not replace MCP or REST APIs. For programmatic, deterministic, high-throughput tool calls, structured interfaces remain the right choice.

What clive replaces is the need to implement those interfaces for tools that already have a CLI. The category is large: most developer tools, most system utilities, most data processing tools, most communication clients. If it runs in a terminal, clive can drive it.

The terminal was how people operated computers interactively before GUIs existed. It turns out that the same interface, running over SSH and observable as a text stream, is also a natural fit for agents.

We built clive to test that hypothesis. The code is at github.com/ikangai/clive.