How OpenClaw Works
the gateway, the agentic loop, JSONL sessions, and why none of it is new but all of it matters
OpenClaw just passed 200,000 GitHub stars. People are literally buying Mac Minis just to run it. My timeline won’t shut up about it.
So I installed it. Took about twenty minutes to get a working agent inside my Telegram DMs.
Within a few days I was using it for real stuff — summarizing unread emails, pulling up my calendar and giving me prep notes before each meeting, wiring up a webhook so I’d get a Telegram ping whenever someone opened a PR assigned to me on GitHub. All of this happened on my couch, through a chat bubble.
At some point I stopped using it and started reading it.
It’s not the AI
OpenClaw didn’t invent anything new.
The language model — Claude, GPT, DeepSeek, whatever you configure — already existed. Function calling has been in models since mid-2023. WhatsApp libraries like Baileys are mature open source. Telegram bots are trivial to build. WebSocket servers are decades old.
What Peter Steinberger built is infrastructure. Session management. Channel routing. Tool dispatch. Memory. Security.
The model decides what to do. OpenClaw handles how, when, where, and whether it’s allowed to.
That split is what makes the whole thing work.
The architecture
One process. I double-checked. One.
The Gateway is Node.js (22+), bound to 127.0.0.1:18789, WebSocket only. Sessions, routing, auth, tool dispatch, cron, webhooks — all of it lives here.
There is exactly one Gateway per host. WhatsApp’s protocol is strictly single-device, so you can’t split it across machines. Trade-off: simplicity over horizontal scale. For a personal assistant serving one person, this is obviously the right call.
The Gateway does four things:
Routes messages. Every platform gets a channel adapter — WhatsApp via Baileys, Telegram via grammY, Discord via discord.js, 50+ total. Each adapter normalizes messages into a common format. The Agent Runtime sees the same shape regardless of where you sent the message. WhatsApp, Discord, Signal — all identical by that point.
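To make that concrete, here's a minimal sketch of what an adapter's normalization step might look like. The type and function names are illustrative, not OpenClaw's actual code:

```typescript
// Hypothetical sketch of channel normalization — names are illustrative,
// not OpenClaw's actual types.
interface NormalizedMessage {
  channel: string;   // "telegram", "whatsapp", "discord", ...
  senderId: string;  // platform-specific user id, normalized to a string
  text: string;
  timestamp: number; // unix epoch milliseconds
}

// A simplified Telegram-shaped payload mapped into the common shape.
function fromTelegram(update: {
  message: { from: { id: number }; text: string; date: number };
}): NormalizedMessage {
  return {
    channel: "telegram",
    senderId: String(update.message.from.id),
    text: update.message.text,
    timestamp: update.message.date * 1000, // Telegram timestamps are in seconds
  };
}
```

Each platform gets its own `fromX` mapper; everything downstream only ever sees `NormalizedMessage`.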
Manages sessions. Conversations persist as JSONL files at ~/.openclaw/agents/<agentId>/sessions/<sessionId>.jsonl. Append-only writes. Process crashes mid-write? You lose one line. That’s your worst case.
Dispatches tools. When the model wants to run a shell command, browse a page, or call an API, the Gateway orchestrates execution — potentially inside a Docker sandbox. Results stream back into the model’s generation.
Enforces security. Allowlists, DM pairing policies, tool permission tiers, and mandatory authentication for non-loopback binding. The days of auth: none are over.
All WebSocket frames are validated against JSON Schema from TypeBox definitions. Typed end-to-end. Whoever wrote this clearly cares about not shipping garbage.
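The idea of validating at the boundary can be sketched without the library. OpenClaw uses TypeBox-generated JSON Schema; this hand-rolled guard is just an illustration of rejecting malformed frames before they reach the runtime:

```typescript
// Illustrative stand-in for schema validation at the Gateway boundary.
// The frame shape is hypothetical, not OpenClaw's actual protocol.
interface InboundFrame {
  type: "message";
  sessionId: string;
  text: string;
}

function isInboundFrame(value: unknown): value is InboundFrame {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.type === "message" &&
    typeof v.sessionId === "string" &&
    typeof v.text === "string"
  );
}

function handleRawFrame(raw: string): InboundFrame | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON: drop the frame
  }
  return isInboundFrame(parsed) ? parsed : null;
}
```

With TypeBox you get the same guarantee from one schema definition that doubles as the TypeScript type, which is what "typed end-to-end" buys you.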
The agentic loop
Okay, this part is where it gets fun.
I sent “check my meetings for tomorrow” on Telegram. Here’s what actually happened under the hood:
- The Telegram adapter parses my message and normalizes the format
- The Gateway checks my ID against the allowlist and resolves the session
- The runtime assembles context: session history from the JSONL file, semantic memory search across past sessions, the agent’s personality from SOUL.md, and all available tool definitions
- The full context goes to the LLM. Responses stream back token by token
- When the model makes a tool call, the runtime intercepts, executes it, and feeds the result back. Multiple tools can chain in a single turn
- The response routes back through the Gateway, through Telegram, into my chat. State gets appended to the JSONL file
And… that’s the whole thing. Message comes in, context gets stitched together, API gets called, tools run, response goes out, state written to disk.
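The steps above boil down to a loop you could sketch like this. Every name here (`callModel`, `runTool`, the reply shapes) is illustrative, not OpenClaw's or pi-agent-core's actual API:

```typescript
// Toy version of the agentic loop: call the model, execute any tool call,
// feed the result back, repeat until the model answers in plain text.
type ModelReply =
  | { kind: "text"; text: string }
  | { kind: "tool_call"; tool: string; args: string };

type Message = { role: "user" | "assistant" | "tool"; content: string };

async function agentLoop(
  history: Message[],
  callModel: (history: Message[]) => Promise<ModelReply>,
  runTool: (tool: string, args: string) => Promise<string>,
): Promise<string> {
  while (true) {
    const reply = await callModel(history);
    if (reply.kind === "text") {
      history.push({ role: "assistant", content: reply.text });
      return reply.text; // final answer routes back out through the channel
    }
    // Tool call: execute it, append the result, and loop again so the
    // model can use the output. Multiple tools chain this way in one turn.
    const result = await runTool(reply.tool, reply.args);
    history.push({ role: "tool", content: result });
  }
}
```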
A while loop. Not kidding. The actual loop lives in pi-agent-core — Mario Zechner’s agent runtime that OpenClaw wraps. But what OpenClaw adds on top is the orchestration. Here’s the TypeScript from src/auto-reply/reply/agent-runner.ts that decides what happens before the model even runs:

// src/auto-reply/reply/agent-runner.ts — the queue/steer orchestration layer
if (shouldSteer && isStreaming) {
  const steered = queueEmbeddedPiMessage(
    followupRun.run.sessionId,
    followupRun.prompt,
  );
  if (steered && !shouldFollowup) {
    await touchActiveSessionEntry();
    typing.cleanup();
    return undefined;
  }
}

if (isActive && (shouldFollowup || resolvedQueue.mode === 'steer')) {
  enqueueFollowupRun(queueKey, followupRun, resolvedQueue);
  await touchActiveSessionEntry();
  typing.cleanup();
  return undefined;
}

The SOUL.md trick
Every OpenClaw agent has a SOUL.md file defining its personality and behavior rules. Not technically impressive — it’s a system prompt on disk.
But the agent has file system access. It can read and write its own workspace. Including SOUL.md.
On the Lex Fridman podcast, Steinberger described making the agent aware of its own configuration. The agent modifies its own personality, adds notes about user preferences, adjusts its behavior rules. On the next turn, the runtime reads the updated file.
Self-modifying software through fs.writeFile().
I thought this was a gimmick until I watched my own agent get noticeably more concise and direct over a week of use. It was remembering my preferences without me telling it to. Just a file on disk, read every turn, and the behavior emerged on its own.
Why this didn’t exist sooner
All the pieces were there. LLM APIs. Messaging libraries. Agent patterns like ReAct and AutoGPT. WebSocket servers since 2011.
Three things had to happen:
Models got good enough. A year ago, they hallucinated function signatures and couldn’t maintain multi-step plans. The current generation can reliably call tools and follow complex instructions without losing the thread.
Someone treated it as infrastructure. Most AI projects treat the model as the product. OpenClaw treats it as a swappable dependency. The product is sessions, channels, memory, tools, security. That’s not exciting until it’s the thing that makes everything work.
Someone shipped it. Steinberger built the first version in one hour on a Friday night in November 2025. Simple glue connecting WhatsApp to Claude. Slow but functional. Through December it grew quietly. By mid-January it had 2,000 stars. Then everything exploded.
On Valentine’s Day he announced he’s joining OpenAI. His reason was blunt: “I did the whole creating-a-company game already, poured 13 years of my life into it.” He doesn’t want to build another company. He wants to build an agent his mum can use, and he thinks OpenAI is the fastest path there. OpenClaw itself is moving to a foundation — open source, independent, MIT licensed. The lobster lives on.
What I took away from reading the source
The model is a dependency, not the product
You can swap Claude for GPT for Gemini for a local model and OpenClaw still works. The value isn’t in calling the API. It’s in what wraps the call: session persistence, channel normalization, tool orchestration, memory, security.
If your product is “we call an LLM API,” you’re one config change from being replaced.
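The "swappable dependency" point is an ordinary interface boundary. This sketch is my own illustration, not OpenClaw's provider abstraction; real adapters would wrap the Anthropic or OpenAI SDKs:

```typescript
// Model-as-dependency: the runtime codes against a narrow interface, so
// swapping providers is a config change. Names are illustrative.
interface LLMProvider {
  name: string;
  complete(prompt: string): Promise<string>;
}

const providers: Record<string, () => LLMProvider> = {
  // Stubs standing in for real SDK-backed adapters.
  claude: () => ({ name: "claude", complete: async (p) => `[claude] ${p}` }),
  gpt: () => ({ name: "gpt", complete: async (p) => `[gpt] ${p}` }),
};

function providerFromConfig(config: { model: string }): LLMProvider {
  const factory = providers[config.model];
  if (!factory) throw new Error(`unknown model: ${config.model}`);
  return factory();
}
```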
Simplicity scales
No Kubernetes. No microservices. No Redis. No message queue. One Node.js process. JSONL files on disk. SQLite.
The fastest-growing repo in GitHub history is a WebSocket server that appends lines to flat files.
I’ve spent years of my career over-engineering systems for requirements I didn’t have yet. OpenClaw is a useful reminder: the simplest architecture that solves the actual problem in front of you is almost always the right one. Especially now, when coding agents can generate code faster than you used to write it. Every abstraction you add is an abstraction those agents need to navigate. The cost of complexity has never been higher.
Prototypes beat architecture diagrams
The first version was built in one hour. No Gateway design. No spec. Steinberger wired two things together and sent a message.
The architecture came later, through iteration. The Gateway pattern wasn’t designed — it emerged from the requirement to route messages from many platforms to one agent.
In a world where you can have a working prototype in an hour, the prototype is the design process. Build the ugly thing, learn what it needs, clean it up. The structure reveals itself through use.
“Why doesn’t this exist?” is the whole strategy
Both of Steinberger’s major products — PSPDFKit (sold for €100M+) and OpenClaw — started from personal frustration.
He waited from April to November 2025 expecting one of the big labs to build what he wanted. They didn’t. So he built it on a Friday night.
The gap between “this should exist” and “I shipped it” has collapsed. An hour for a prototype. A weekend for something real. A month for a phenomenon.
The job is changing shape
Steinberger told Lex Fridman he thinks “vibe coding is a slur.” He prefers “agentic engineering.” His workflow: dictating prompts to 5-10 agents running in parallel, rarely touching the keyboard.
He shipped OpenClaw — hundreds of files, dozens of integrations, typed WebSocket protocol — as essentially one developer with agents doing the typing.
This is the same shift I’ve been seeing. Less time implementing. More time deciding what to build, how to test it, whether it’s working correctly. More time on system design and security. Less time on details a model handles faster than you can.
Writing code is no longer the bottleneck. Product matters more than ever.
Generalists win
OpenClaw works because Steinberger could operate across WebSocket protocols, messaging APIs, LLM function calling, file system operations, shell execution, browser automation, and mobile development. None of it cutting-edge. All of it necessary.
The integration layer — connecting the model to the messaging app to the file system to the calendar API — is where the product lives. That requires understanding many pieces well enough to know how they fit.
The punchline
I went into the codebase expecting clever AI tricks.
I found a WebSocket server, some JSONL files, and a well-structured event loop.
Claude predicts the next token. OpenClaw does everything else — routing, state, tools, memory. Put them together and it feels like talking to a superintelligence. Pull them apart and you’re looking at a Node.js process that writes to flat files.
The next leap for agents — from “cool demo” to something you’d actually trust with your work — is going to come from people who know how to manage state without dropping data and handle errors without crashing. Boring engineering. Turns out that’s the hard part.
Go read the source yourself — it’s MIT licensed and surprisingly readable. Paolo Perazzo’s architectural breakdown saved me a ton of time navigating the codebase, Steinberger’s own announcement post is worth the three-minute read, and the Lex Fridman episode with him is four hours I don’t regret.
Build things. Ship fast. Stay curious.