The Wizard Problem
Literacy: Foundations
In 1966, Joseph Weizenbaum created ELIZA — a simple pattern-matching program that mimicked a therapist. His secretary, who knew it was a program, asked him to leave the room so she could speak to it privately. The mechanism hasn't changed. The programs have gotten much better.
Why humans apply social cognition to AI, what that does to trust and deference, and what a workable mental model of generative AI actually looks like.
1966: the number behind this guide
The ELIZA effect is nearly sixty years old and still running. Weizenbaum's secretary knew ELIZA was a program; she still asked him to leave the room so she could talk to it privately.
Two wrong models, neither one workable.
Users oscillate between treating AI as a wizard (an infallible oracle, an intentional agent) and treating it as a parlor trick (shallow autocomplete, not worth engaging with seriously). Both failure modes produce systematic errors.
The Wizard error (over-trust)
- Treating confident-sounding outputs as factually authoritative
- Assuming the AI 'knows what you mean' and will produce the intended output
- Deferring to AI on decisions requiring clinical, legal, or domain expertise
- Attributing goals, intentions, or understanding to the system
The Dismissal error (under-trust)
- Refusing to engage seriously with accurate AI outputs because 'it's just autocomplete'
- Over-correcting after one error into blanket rejection
- Failing to verify, in either direction, because the system doesn't seem worth verifying
- Missing genuine utility because of a reflexively skeptical posture
At a glance
- 1966: the year ELIZA was built. The effect it produced is still active.
- Colombatto et al. (2025): consciousness attributions to LLMs increase deference.
- Most of the estimated active ChatGPT user base has no accurate mental model of how the system works.
- Three cognitive layers triggered: social cognition, language comprehension, causal reasoning.
The intentional stance.
Daniel Dennett's term for the cognitive shortcut humans use to predict behavior: attribute beliefs, desires, and intentions to any system complex enough that it helps to think of it that way.
Humans evolved in an environment where anything that communicated in language was a person. The intentional stance (attributing goals, beliefs, and desires to an agent) was an adaptive heuristic that usually worked. It fires with minimal provocation: Reeves and Nass (1996) documented that people apply social courtesy norms to computers as consistently and automatically as they do to other people.
Generative AI systems are uniquely powerful triggers. They use fluent natural language, respond to personal context, produce outputs that appear to understand what was asked, and never break character. Every cue that evolved to signal "this is a social agent" is present. The reflex that fires is not a mistake; it is social cognition working exactly as evolution shaped it, applied to a domain where it systematically misleads.
Colombatto, Birch & Fleming (2025) found something more concerning: consciousness attributions to LLMs independently increased deference beyond intelligence attributions. Users who believed the AI was conscious deferred to it more, even controlling for how smart they thought it was. Anthropomorphism is not downstream of a capability assessment — it's a parallel track that amplifies trust regardless of accuracy.
Why fluency makes it worse
The 2025 arXiv position paper flagged that anthropomorphic terminology has grown in computer science literature itself — researchers writing about LLMs increasingly use words like "believes," "understands," and "knows." If the experts are anthropomorphizing, the pull on general users is even stronger. Fluent language is the signal humans evolved to treat as evidence of a mind. Generative AI produces fluent language at industrial scale, without any mind behind it.
ELIZA. 1966. Joseph Weizenbaum's secretary.
The first documented case of the effect that now reaches billions of people. The mechanism has not changed. The scale has.
The same mechanism, 2026
- Replika users who know they're talking to an AI describe it as a 'real relationship' and experience grief when features change: a documented, large-scale ELIZA effect.
- Researchers at Stanford's Human-AI Interaction group have documented users apologizing to AI systems, asking if they're tired, and expressing concern about whether the AI is being treated fairly.
- A 2025 arXiv paper reviewing LLM anthropomorphism found that press coverage consistently uses intentional language, and that even AI safety researchers have adopted it, reinforcing the very mental model they're trying to correct.
What a workable mental model actually looks like.
Not 'AI can be wrong' — that's a patch on a broken model. A functional model predicts when and why outputs are likely to be accurate or inaccurate.
It predicts likely text, not correct facts
A large language model predicts the most probable next token given everything that came before. It was trained on human text — which contains a lot of accurate information, but also errors, biases, and outdated content. It has no ground truth to check against. It will produce a confident-sounding wrong answer with the same mechanism as a confident-sounding right one.
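To make the mechanism concrete, here is a toy sketch in Python. A real model scores an entire vocabulary with a neural network; the probability table below is invented for illustration. The logic is the point: score the candidates, sample by probability, never consult a ground truth.

```python
import random

# Hypothetical probabilities for the context "The Eiffel Tower is
# located in". A real model derives scores like these from patterns
# in training text, not from a database of facts.
next_token_probs = {
    "Paris": 0.91,   # common in training text; happens to be true
    "France": 0.06,  # also common, also true
    "London": 0.02,  # rare; happens to be false
    "Berlin": 0.01,
}

def sample_next_token(probs):
    """Pick one token in proportion to its probability.

    Nothing here checks whether the chosen token is true, only how
    likely it is to follow the context in training text.
    """
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(next_token_probs))  # usually "Paris"; occasionally not
```

A wrong answer and a right answer exit through exactly the same door; the only difference is which tokens were frequent in the training text.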
It has no goals, beliefs, or intentions
There is no agent 'deciding' to be helpful or honest. There is a statistical model optimized to produce outputs that match patterns in training data and score well on human feedback signals. When it appears to 'try' to do something, that appearance is a product of how it was trained, not evidence of motivation.
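What 'optimized' means here fits in a few lines. This is a minimal sketch of the standard pretraining objective, with invented numbers: push the model's weights to assign high probability to whatever token actually came next in training text.

```python
import math

def next_token_loss(predicted_probs, actual_next):
    """Cross-entropy at one position: -log P(actual next token).

    Training nudges the model's weights to make this number smaller,
    averaged over trillions of tokens. That pressure, plus later tuning
    on human feedback scores, is the entire sense in which the model
    'tries' to do anything.
    """
    p = predicted_probs.get(actual_next, 1e-12)  # floor avoids log(0)
    return -math.log(p)

predicted = {"Paris": 0.91, "London": 0.02}

# The training text said "Paris": low loss, weights barely move.
print(round(next_token_loss(predicted, "Paris"), 3))   # 0.094
# The training text said "London": high loss, weights shift toward it.
print(round(next_token_loss(predicted, "London"), 3))  # 3.912
```

There is no goal variable anywhere in that loop and no reward for helpfulness as such; apparent motivation is what minimizing this loss at scale looks like from the outside.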
Its errors are systematic, not random
AI errors cluster in predictable places: recent events (not in training data), low-frequency facts (underrepresented in training), tasks requiring multi-step logical precision, domain-specific knowledge that requires knowing what you don't know, and any task that requires checking output against ground truth. These are learnable failure modes.
Fluency is an output property, not an accuracy signal
The same mechanism produces fluent accurate text and fluent wrong text. Polished prose, confident hedging, and plausible-sounding citations are all features of the training distribution — not features of the specific output's relationship to truth. Fluency should raise, not lower, your verification effort on high-stakes claims.
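A toy illustration with invented probabilities: the same generation rule produces a fluent correct completion and a fluent wrong one at nearly identical confidence. The misattributed-quote case is real; the exact phrase 'Elementary, my dear Watson' never appears in Conan Doyle's original stories, but it is everywhere in human text about them.

```python
# Two contexts, one rule: emit the highest-probability continuation.
cases = {
    "The chemical symbol for gold is": {
        "Au": 0.93, "Ag": 0.04, "Gd": 0.03,
    },
    "'Elementary, my dear Watson' comes from": {
        # Misattribution is common in human writing, so a model trained
        # on that writing can be confident and wrong here.
        "the original Sherlock Holmes stories": 0.90,
        "later films and adaptations": 0.08,
        "a radio play": 0.02,
    },
}

for context, probs in cases.items():
    best = max(probs, key=probs.get)
    print(f"{context} {best}  (P = {probs[best]:.2f})")

# Both outputs are equally fluent and nearly equally confident. Only
# the first is correct, and nothing in the mechanism marks the gap.
```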
Risk 1 is the amplifier.
Without a workable mental model, every other proficiency risk is worse. Moral Offloading, the Deference Reflex, the Mirror Trap, and the Fluency Trap all become more severe when users have no accurate model of what they're dealing with.
Risk 2: Moral Offloading
If AI is an intentional agent, it can be blamed. Users who anthropomorphize are more likely to offload moral responsibility to a system that cannot hold it.
Risk 3: Deference Reflex
Over-trust in the accuracy of an agent's outputs makes uncritical deference feel rational. The Wizard Problem is the conceptual prerequisite for automation complacency.
Risk 4: Mirror Trap
Parasocial bonding requires perceiving the AI as a social agent. Users with accurate mental models are less vulnerable to relationship-style dependencies.
Risk 5: Fluency Trap
Fluent language feels truthful in part because fluent language from a social agent usually is. The Wizard Problem magnifies the credibility halo of polished AI text.
What's your mental model?
Describe in your own words what you think happens when you type a question into an AI. The tool maps your language to one of five mental model archetypes, shows what each gets right and wrong, and explains what happens in the model instead.
Test your mental model →

Action for every level of influence.
For yourself
- Read one plain-language explainer of how large language models work — not to become a programmer, but to replace 'it knows things' with 'it predicts likely text.'
- Notice when you're surprised by an AI error. Surprise is a signal that your mental model predicted something different from what it produced.
- Use mechanism-accurate language: 'the model predicted' or 'the output says' rather than 'the AI thinks' or 'the AI believes.'
For a young person
- Ask: 'What do you think is happening when you ask it a question?' The answer reveals their mental model more than any test.
- Use the ELIZA story as an entry point — it's accessible, surprising, and demonstrates the effect without requiring technical knowledge.
- Distinguish between 'it sounds confident' and 'it is correct' — the gap between those two things is where most AI errors live.
For an organization
- Include mechanism literacy in AI onboarding. 'AI can be wrong' is not enough — people need a model that predicts when and why.
- Review your AI documentation: does it use intentional language ('the AI understands', 'the AI knows')? If so, it is actively training the wrong mental model.
- Require that AI-generated outputs be attributed to a named human reviewer — not to 'the AI' — in any documentation or decision record.
For educators
- Teach the intentional stance explicitly: humans evolved to apply folk psychology to social agents; AI triggers this reflex because it uses fluent language.
- Use case-based instruction: the ELIZA effect, Replika, and modern LLMs are the same mechanism at different scales of sophistication.
- AI literacy standards that only say 'use responsibly' without teaching mechanism will not produce calibrated users. Mechanism first.
For Educators
Teaching AI mechanism literacy?
Facilitation guide for the mental model tool, discussion questions for different age groups, and sequencing notes for using this as the first unit in an AI proficiency curriculum.
Research & further reading.
Want CPAI to deliver AI literacy training to your organization?
We work with schools, corporations, and nonprofits to deliver research-based AI mechanism literacy education.