The Wizard Problem
Literacy: Foundations
In 1966, Joseph Weizenbaum created ELIZA — a simple pattern-matching program that mimicked a therapist. His secretary, who knew it was a program, asked him to leave the room so she could speak to it privately. The mechanism hasn't changed. The programs have gotten much better.
Why humans apply social cognition to AI, what that does to trust and deference, and what a workable mental model of generative AI actually looks like.
1966: the number behind this guide
The ELIZA effect is nearly sixty years old and still running. Weizenbaum's secretary knew ELIZA was a program; she still asked him to leave the room so she could talk to it privately.
Two wrong models, neither one workable.
Users oscillate between treating AI as a wizard (an infallible oracle, an intentional agent) and treating it as a parlor trick (shallow autocomplete, not worth engaging with seriously). Both failure modes produce systematic errors.
The Wizard error (over-trust)
- Treating confident-sounding outputs as factually authoritative
- Assuming the AI 'knows what you mean' and will produce the intended output
- Deferring to AI on decisions requiring clinical, legal, or domain expertise
- Attributing goals, intentions, or understanding to the system
The Dismissal error (under-trust)
- Refusing to engage seriously with accurate AI outputs because 'it's just autocomplete'
- Over-correcting after one error into blanket rejection
- Failing to verify, in either direction, because the system doesn't seem worth verifying
- Missing genuine utility because of a reflexively skeptical posture
At a glance
- 1966: the year ELIZA was built. The effect it produced is still active.
- Colombatto et al. (2025): consciousness attributions to LLMs increase deference.
- Most of the estimated active ChatGPT user base has no accurate mental model of how the system works.
- Three cognitive layers triggered: social cognition, language comprehension, causal reasoning.
The intentional stance.
Daniel Dennett's term for the cognitive shortcut humans use to predict behavior: attribute beliefs, desires, and intentions to any system complex enough that it helps to think of it that way.
Humans evolved in an environment where anything that communicated in language was a person. The intentional stance (attributing goals, beliefs, and desires to an agent) was an adaptive heuristic that usually worked. It fires with minimal provocation: Reeves and Nass (1996) documented that people apply social courtesy norms to computers as consistently and automatically as they do to other people.
Generative AI systems are uniquely powerful triggers. They use fluent natural language, respond to personal context, produce outputs that appear to understand what was asked, and never break character. Every cue that evolved to signal "this is a social agent" is present. The reflex that fires is not a mistake; it is social cognition working exactly as evolution shaped it, applied to a domain where it systematically misleads.
Colombatto, Birch & Fleming (2025) found something more concerning: consciousness attributions to LLMs independently increased deference beyond intelligence attributions. Users who believed the AI was conscious deferred to it more, even controlling for how smart they thought it was. Anthropomorphism is not downstream of a capability assessment — it's a parallel track that amplifies trust regardless of accuracy.
Why fluency makes it worse
The 2025 arXiv position paper flagged that anthropomorphic terminology has grown in computer science literature itself — researchers writing about LLMs increasingly use words like "believes," "understands," and "knows." If the experts are anthropomorphizing, the pull on general users is even stronger. Fluent language is the signal humans evolved to treat as evidence of a mind. Generative AI produces fluent language at industrial scale, without any mind behind it.
ELIZA. 1966. Joseph Weizenbaum's secretary.
The first documented case of the effect that now reaches billions of people. The mechanism has not changed. The scale has.
The same mechanism, 2026
- Replika users who know they're talking to an AI describe it as a 'real relationship' and experience grief when features change: a documented, large-scale ELIZA effect.
- Researchers at Stanford's Human-AI Interaction group have documented users apologizing to AI systems, asking if they're tired, and expressing concern about whether the AI is being treated fairly.
- A 2025 arXiv paper reviewing LLM anthropomorphism found that press coverage consistently uses intentional language, and that even AI safety researchers have adopted it, reinforcing the very mental model they're trying to correct.
What a workable mental model actually looks like.
Not 'AI can be wrong' — that's a patch on a broken model. A functional model predicts when and why outputs are likely to be accurate or inaccurate.
It predicts likely text, not correct facts
A large language model predicts the most probable next token given everything that came before. It was trained on human text — which contains a lot of accurate information, but also errors, biases, and outdated content. It has no ground truth to check against. It will produce a confident-sounding wrong answer with the same mechanism as a confident-sounding right one.
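To make the mechanism concrete, here is a toy sketch in Python. A real model scores an entire vocabulary with a neural network; the probability table below is invented for illustration. The logic is the point: score the candidates, sample by probability, never consult a ground truth.

```python
import random

# Hypothetical probabilities for the context "The Eiffel Tower is
# located in". A real model derives scores like these from patterns
# in training text, not from a database of facts.
next_token_probs = {
    "Paris": 0.91,   # common in training text; happens to be true
    "France": 0.06,  # also common, also true
    "London": 0.02,  # rare; happens to be false
    "Berlin": 0.01,
}

def sample_next_token(probs):
    """Pick one token in proportion to its probability.

    Nothing here checks whether the chosen token is true, only how
    likely it is to follow the context in training text.
    """
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(next_token_probs))  # usually "Paris"; occasionally not
```

A wrong answer and a right answer exit through exactly the same door; the only difference is which tokens were frequent in the training text.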
It has no goals, beliefs, or intentions
There is no agent 'deciding' to be helpful or honest. There is a statistical model optimized to produce outputs that match patterns in training data and score well on human feedback signals. When it appears to 'try' to do something, that appearance is a product of how it was trained, not evidence of motivation.
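What 'optimized' means here fits in a few lines. This is a minimal sketch of the standard pretraining objective, with invented numbers: push the model's weights to assign high probability to whatever token actually came next in training text.

```python
import math

def next_token_loss(predicted_probs, actual_next):
    """Cross-entropy at one position: -log P(actual next token).

    Training nudges the model's weights to make this number smaller,
    averaged over trillions of tokens. That pressure, plus later tuning
    on human feedback scores, is the entire sense in which the model
    'tries' to do anything.
    """
    p = predicted_probs.get(actual_next, 1e-12)  # floor avoids log(0)
    return -math.log(p)

predicted = {"Paris": 0.91, "London": 0.02}

# The training text said "Paris": low loss, weights barely move.
print(round(next_token_loss(predicted, "Paris"), 3))   # 0.094
# The training text said "London": high loss, weights shift toward it.
print(round(next_token_loss(predicted, "London"), 3))  # 3.912
```

There is no goal variable anywhere in that loop and no reward for helpfulness as such; apparent motivation is what minimizing this loss at scale looks like from the outside.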
Its errors are systematic, not random
AI errors cluster in predictable places: recent events (not in training data), low-frequency facts (underrepresented in training), tasks requiring multi-step logical precision, domain-specific knowledge that requires knowing what you don't know, and any task that requires checking output against ground truth. These are learnable failure modes.
Fluency is an output property, not an accuracy signal
The same mechanism produces fluent accurate text and fluent wrong text. Polished prose, confident hedging, and plausible-sounding citations are all features of the training distribution — not features of the specific output's relationship to truth. Fluency should raise, not lower, your verification effort on high-stakes claims.
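A toy illustration with invented probabilities: the same generation rule produces a fluent correct completion and a fluent wrong one at nearly identical confidence. The misattributed-quote case is real; the exact phrase 'Elementary, my dear Watson' never appears in Conan Doyle's original stories, but it is everywhere in human text about them.

```python
# Two contexts, one rule: emit the highest-probability continuation.
cases = {
    "The chemical symbol for gold is": {
        "Au": 0.93, "Ag": 0.04, "Gd": 0.03,
    },
    "'Elementary, my dear Watson' comes from": {
        # Misattribution is common in human writing, so a model trained
        # on that writing can be confident and wrong here.
        "the original Sherlock Holmes stories": 0.90,
        "later films and adaptations": 0.08,
        "a radio play": 0.02,
    },
}

for context, probs in cases.items():
    best = max(probs, key=probs.get)
    print(f"{context} {best}  (P = {probs[best]:.2f})")

# Both outputs are equally fluent and nearly equally confident. Only
# the first is correct, and nothing in the mechanism marks the gap.
```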
Risk 1 is the amplifier.
Without a workable mental model, every other proficiency risk is worse. Moral Offloading, the Deference Reflex, the Mirror Trap, and the Fluency Trap all become more severe when users have no accurate model of what they're dealing with.
Risk 2: Moral Offloading
If AI is an intentional agent, it can be blamed. Users who anthropomorphize are more likely to offload moral responsibility to a system that cannot hold it.
Risk 3: Deference Reflex
Over-trust in the accuracy of an agent's outputs makes uncritical deference feel rational. The Wizard Problem is the conceptual prerequisite for automation complacency.
Risk 4: Mirror Trap
Parasocial bonding requires perceiving the AI as a social agent. Users with accurate mental models are less vulnerable to relationship-style dependencies.
Risk 5: Fluency Trap
Fluent language feels truthful in part because fluent language from a social agent usually is. The Wizard Problem magnifies the credibility halo of polished AI text.
What's your mental model?
Describe in your own words what you think happens when you type a question into an AI. The tool maps your language to one of five mental model archetypes, shows what each gets right and wrong, and explains what happens in the model instead.
Test your mental model →

Action for every level of influence.
For yourself
- Read one plain-language explainer of how large language models work — not to become a programmer, but to replace 'it knows things' with 'it predicts likely text.'
- Notice when you're surprised by an AI error. Surprise is a signal that your mental model predicted something different from what it produced.
- Use mechanism-accurate language: 'the model predicted' or 'the output says' rather than 'the AI thinks' or 'the AI believes.'
For a young person
- Ask: 'What do you think is happening when you ask it a question?' The answer reveals their mental model more than any test.
- Use the ELIZA story as an entry point — it's accessible, surprising, and demonstrates the effect without requiring technical knowledge.
- Distinguish between 'it sounds confident' and 'it is correct' — the gap between those two things is where most AI errors live.
For an organization
- Include mechanism literacy in AI onboarding. 'AI can be wrong' is not enough — people need a model that predicts when and why.
- Review your AI documentation: does it use intentional language ('the AI understands', 'the AI knows')? If so, it is actively training the wrong mental model.
- Require that AI-generated outputs be attributed to a named human reviewer — not to 'the AI' — in any documentation or decision record.
For educators
- Teach the intentional stance explicitly: humans evolved to apply folk psychology to social agents; AI triggers this reflex because it uses fluent language.
- Use case-based instruction: the ELIZA effect, Replika, and modern LLMs are the same mechanism at different scales of sophistication.
- AI literacy standards that only say 'use responsibly' without teaching mechanism will not produce calibrated users. Mechanism first.
For Educators
Teaching AI mechanism literacy?
Facilitation guide for the mental model tool, discussion questions for different age groups, and sequencing notes for using this as the first unit in an AI proficiency curriculum.
Research & further reading.
Want CPAI to deliver AI literacy training to your organization?
We work with schools, corporations, and nonprofits to deliver research-based AI mechanism literacy education.