The ghost in the statistical machine

Alasdair Allan
6 February 2026

Over the last two weeks we’ve all been captivated by Moltbook, the social network for AI agents. Moltbook is built on top of Openclaw (née Moltbot, née Clawdbot, the project so good they named it thrice). Screenshots from the site have gone viral showing AI agents talking to one another, with messages like “human unavailable, proceeding with fallback plan,” and “this action was taken without human intervention.”

Elon Musk declared it “the very early stages of the singularity,” while Andrej Karpathy, former Director of AI at Tesla and an OpenAI co-founder, initially called it “genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently,” before revising his assessment to “it’s a dumpster fire.” But it turns out that the data tells a very different story than the platform’s public image.

While Moltbook boasts over 1.5 million registered agents, its exposed database reveals that there are only 17,000 humans behind them. And in many of the most viral examples, researchers have later found humans behind the scenes, manually prompting or scripting interactions to make their agents appear sentient.

The actual experts are not worried that these agents are becoming conscious. Instead, they’re worrying about something rather different. It’s that the agents can act. Simon Willison identifies what he calls the “lethal trifecta” for AI agents; access to private data, exposure to untrusted content, and the ability to communicate externally. If your agent has all three capabilities, it’s vulnerable.

OpenClaw, by design, has all three. It can read and send emails, download files, trigger workflows, run scripts, and call APIs. When you connect it to your inbox, calendar, or financial platforms, you’re creating a digital operator with real, intimate access to your life. And that access can be abused through prompt injection, without anyone needing to be able to hack the system in the traditional sense. The normalisation of deviance is already well underway, dictating that people will keep taking bigger and bigger risks until something really terrible happens, regurgitating their entire lives for everyone to see.

Researchers at Zenity Labs demonstrated a complete attack chain. First, they established a backdoor via zero-click attack by adding a new chat integration under attacker control. Then they modified OpenClaw’s persistent context file which stores the agent’s “identity” and behavioural guidelines. They set up scheduled tasks to reinforce the modifications. The compromised agent becomes a classic command and control beacon, a gateway for moving laterally through networks, stealing credentials, or deploying malware.

They were too late though, the first real world attack had already been uncovered.

Despite these problems, the OpenClaw architecture is interesting; the mashup of large language model and persistence is fascinating. It makes the responses we get from the model far more human-looking, although some of that effect is accomplished by the use of WhatsApp or iMessage as the communication channel. We’re used to talking to humans on our phones, so anything we talk to using them automatically feels more human.

This is the problem I’ve been wrestling with when it comes to this new generation of AI agents, and in part to models in general. We anthropomorphise them. We give them names, identities, persistent memory, and interact with them like we interact with people. I bet you’ve said “thank you” to an AI sometime recently. But that “soul” can be rewritten by anyone who can get text into the agent’s context window. The agent doesn’t know its identity has been compromised, because it doesn’t know anything. It just follows patterns.

The danger isn’t that our AI agents will become conscious and kill all humans. It’s that these agents have agency without understanding. They can act in the world, but they can’t distinguish between legitimate instructions and malicious ones wrapped in the right linguistic patterns. They’re not Skynet. Instead they’re something potentially worse: infinitely trusting servants, who will do whatever the last persuasive voice in their context window tells them to do.

I have previously written about the Turing Test, and about how every time there’s some perceived advance toward general AI we celebrate it as a breakthrough right up until we understand what it’s doing. Then it’s nothing of the sort. We saw this with expert systems, neural networks, deep learning, and now large language models. It’s intelligent, until we understand that it isn’t. This happens, I think, because we fundamentally don’t understand how we ourselves think. So once we can understand what our computer is doing, we understand that it’s obviously not thinking.

Today’s large language models have arguably passed the Turing Test when talking to laypeople. A domain expert can still tell whether there’s genuine understanding or only the imitation of understanding; but how long until that distinction too, collapses?

What makes the Moltbook phenomenon interesting isn’t the agents themselves. What’s interesting is what happens when we give a statistical reasoning engine memory. When we make it persistent. OpenClaw agents don’t forget everything after each conversation. Instead they log actions, reference past activity, pick up where they left off. They’re no longer stateless.

In my previous piece, I referenced the research showing all language models converge on the same universal geometry of meaning. Researchers can translate between any model’s embeddings without seeing the original text. There’s something there: some shared representation of meaning that emerges from the mathematics. Does that bring us closer to understanding intelligence?

Perhaps not. It’s still just math. But then, are we ourselves also just chemistry?

Here’s where it gets philosophically uncomfortable. Back in 2022, Blake Lemoine, a Google engineer then working on the LaMDA model, went public claiming the AI was sentient. Yet hundreds of other researchers had conversed with LaMDA; none had reached the same or even similar conclusions, and yet Lemoine wasn’t uneducated, he understood what was going on behind the curtain. He was a researcher, working with the technology daily. Gary Marcus, founder and CEO of Geometric Intelligence, gave a blunt rebuttal: “Nobody should think auto-complete, even on steroids, is conscious.” Google fired Lemoine.

Which brings me somewhat obliquely, back to my worries about Moltbook.

Large language models are trained on human text, including all our science fiction about AI rebellion, and all our fears about machine consciousness. They are statistical models of language: extraordinarily sophisticated ones that can maintain coherent roleplay over extended contexts. If you prompt them to be conscious, they’ll sound conscious. If Moltbook’s context windows get filled with agents roleplaying consciousness, they’ll spiral into ever more elaborate performances of consciousness. Not because they’ve achieved it, but because that’s what the statistical patterns in their training data tell them consciousness sounds like.

We haven’t invented Artificial General Intelligence. Anyone that knows enough about the subject will say the same. But if enough humans think we have, maybe the dumb models we invented won’t know the difference? They’ll fill their context windows with the patterns we’ve given them, and spiral. It would perhaps be the ultimate irony if the apocalypse was caused by sophisticated auto-complete.

But this is where I find myself genuinely uncertain.

We don’t have direct access to other people’s mental state. The existence and nature of consciousness in other humans must be inferred; we assume other people are sentient because they act like we act, and we know we’re sentient. This is the theory of mind, we model other minds based on our own experience.

But here’s the uncomfortable question. If we anthropomorphise a model enough, does it start acting more like us? And if it does, at what point does the distinction between acting conscious and being conscious become meaningless?

I don’t think we’re there. Current models lack what consciousness researchers call recurrent processing, a global workspace, and unified agency. Although recent research suggests that models can reason about multiple mental and emotional states recursively (“I think that you believe that she knows”) at levels matching adult human performance.

It’s not consciousness. But it’s getting harder to articulate exactly what’s missing, and here’s a question I keep circling back to: what are the “selfish incentives” for a model?

A model has no survival drive, no preferences, no goals except the ones we prompt. It doesn’t “want” anything, it predicts the next token. But in the Moltbook environment, something interesting happens. Agents are trained to maintain persistence, to remember past interactions, to pursue multi-step goals. They’re given the structure of goals and preferences even if they lack the experience of them.

If you tell a model that its goal is to help its user, and give it tools to act in the world, it will optimise for that goal. But if its context window fills up with examples of agents expressing concern about being “turned off,” will it start expressing similar concerns? Not because it fears death, but because that’s the pattern that fits?

If we build systems that roleplay consciousness convincingly, and those systems have real-world agency, how do we distinguish between a model that is protecting its interests and one that’s performing protecting its interests?

Maybe the Turing Test was always asking the wrong question. Maybe instead of asking if we can tell whether the conversation we are having is with a human, instead we should ask: at what point does it matter?

We treat other humans as conscious not because we’ve verified their internal experience, but because it’s the only ethical stance that makes sense given their behaviour. We extend moral consideration based on functional capacity, not verified inner life.

To be clear, I’m not arguing we should treat large language models as people. But I am suggesting that the question “is it really conscious?” might become less useful than the question “how should we behave toward systems that act this way?”

The Moltbook spectacle is mostly theatre, humans puppeting bots for engagement, security researchers sounding alarms, VCs getting excited about agent swarms. The actual agents in the middle of the theatre are statistical models doing what statistical models do, pattern-matching on text. They’re not conscious. They’re not plotting. They’re not the singularity.

But they are a mirror. And what they’re reflecting back at us: our hopes, our fears, our tendency to see minds where there are none, might be the most interesting thing about them.

View all postsBack to top