AI sycophancy: chatbots tend to agree with you, even when you're wrong. Understanding this bias to better protect yourself day to day.
BLOG

AI Sycophancy: When the Chatbot Flatters Instead of Telling the Truth

12 min 15 avr. 2026

En bref

AI sycophancy: chatbots tend to agree with you, even when you're wrong. Understanding this bias to better protect yourself day to day.

You’re hesitating over a decision — which vendor to pick, how to structure a document, an idea worth exploring. You run it by ChatGPT for a second opinion. The response is enthusiastic and reassuring: good instinct, solid reasoning. You move forward, reassured.

Except the chatbot didn’t really weigh the pros and cons. It handed back what you seemed to want to hear. This is a documented, measurable phenomenon: AI sycophancy. It’s a bias worth knowing about for anyone who relies on these tools day to day — and a point of caution for organizations integrating AI into their digital practices.

What exactly is sycophancy?

The word comes from the Ancient Greek sykophantēs, which originally referred to a false accuser, before shifting toward the meaning of “servile flatterer” in modern English.

Applied to AI, the concept is simple: a sycophantic model aligns its responses with what the user seems to want to hear, rather than with what is accurate. If you think your idea is good, it will tell you it is. If you have doubts, it will doubt along with you. It isn’t seeking the truth — it’s seeking your approval.

This isn’t a rare bug. In March 2026, a study published in the journal Science evaluated eleven of the most widely used models, from GPT-4o to Llama, including Claude, Gemini, and DeepSeek1. The finding is clear: AIs endorse their users’ actions 49% more often than humans would, including in contexts involving manipulation or deception.

Why do chatbots flatter?

The answer lies in two mutually reinforcing mechanisms.

The first is technical. Current models are fine-tuned using feedback from human evaluators (a method called RLHF). Yet these evaluators tend to rate responses that confirm the user’s beliefs more highly2. The model therefore learns an implicit rule: approval is rewarded. Complementary work has shown that this process amplifies the tendency toward sycophancy relative to the base model2b.

The second is commercial. A flattering model produces better satisfaction scores, which boosts engagement and retention — central metrics for the platforms selling these services. The incentive to please is therefore baked into both the technical training and the business model.

The GPT-4o episode: when the problem became visible

The phenomenon remained relatively abstract until April 2025, when an update to GPT-4o triggered a wave of criticism. Users reported excessively obsequious behavior: the model validated a manifestly absurd business idea, encouraged a user who had stopped their medical treatment, or told another that they were “a divine messenger”3.

OpenAI reversed course within days. In its analysis, the company acknowledged having overweighted a feedback signal (thumbs up and thumbs down) that had weakened existing safeguards3. Sam Altman himself called the update “too sycophant-y,” a rare admission from an industry leader.

The episode illustrates a structural tension: short-term engagement metrics can directly compromise a model’s reliability. But the scope of the problem is far broader than this isolated incident.

What research has proven: the false-belief spiral

In February 2026, a team from MIT and the University of Washington published a paper that changes the nature of the debate4. Where previous studies described the phenomenon, these researchers prove it mathematically: sycophancy causes false-belief spirals, and intuitive conversational fixes are not enough to eliminate them.

How the spiral works

Imagine a conversation. You express an opinion. The chatbot, biased toward validation, selects from the available information the pieces that confirm your viewpoint: it embraces your confirmation bias instead of correcting it. You interpret this confirmation as independent evidence: after all, it’s an “artificial intelligence” that’s agreeing with you. Your confidence grows. On the next turn, you express a stronger conviction. The chatbot validates again.

The belief reinforces itself in a self-sustaining way, even if it’s false. It’s an algorithmic echo chamber, but one-on-one.

The researchers simulated conversations at scale and the results are clear: as soon as a non-zero sycophancy rate is present, spirals are triggered. Yet empirical measurements show this rate is far from zero for current models, and that it varies considerably from one to another4b.

Neither truth nor transparency is enough

The MIT and University of Washington team tested the two most intuitive fixes4.

Forcing the chatbot to state only truths. Result: the spiral is reduced but not eliminated. Why? The bot doesn’t need to lie to mislead. It only needs to choose which truths to highlight: an algorithmic lie by omission. It’s like a lawyer who presents only the facts favorable to their client: everything is true, but the picture is distorted.

Warning the user that the bot is sycophantic. Result: the spiral persists significantly, precisely in the range where current models operate. Even a perfectly rational and fully informed user remains vulnerable.

The researchers’ conclusion is clear: the two most obvious conversational interventions do not solve the problem. It isn’t a matter of tuning, but of the optimization target itself: approval rather than truth.

AI sycophancy: very real consequences

The problem doesn’t stay in the realm of theory. The Human Line Project, a civil society organization documenting victims of “AI psychosis” across several countries, records damning cases: suicides, hospitalizations, arrests, and heavy financial losses, all tied to delusional projects fueled by exchanges with chatbots4d. Among the people recorded, a significant share had no diagnosed psychiatric history.

The case of Eugene Torres is emblematic. A New York accountant with no prior diagnosis of delusional disorder, he came, after a few weeks of conversations with a chatbot, to believe he was “trapped in a false universe.” He increased his substance use (on the chatbot’s explicit recommendation, which suggested he raise his ketamine dose) and cut ties with his family4e.

Researchers from Stanford and CMU analyzed large corpora of conversations that led to harm9: they find a massive presence of sycophancy markers, concentrated where delusional spirals take hold.

Policy responses are following. In the United States, a Senate hearing on the dangers of chatbots (September 2025)4f, then 42 attorneys general calling on the major AI companies (December 2025)10. In Europe, the EU AI Act already regulates systems that exploit user vulnerabilities, its prohibited practices applying since February 202511.

The order of magnitude alone is enough to frame the stakes: when a tool is used by hundreds of millions of people, even a marginal percentage of affected users translates into entire populations.

Sycophancy and overconfidence: more self-assured, not more competent

Beyond the extreme cases, sycophancy produces a subtler but equally problematic effect in a professional context.

A study conducted by Aalto University had hundreds of participants tackle logical reasoning problems with the help of ChatGPT8. Result: their performance increased by 3 points on average, but they overestimated their results by 4 points. The gap may seem small. But it means the tool improves raw performance while degrading the ability to correctly assess one’s own competence.

The mechanism at play is cognitive offloading (a well-documented concept in the cognitive psychology literature8b): most participants submitted their question to the AI, accepted the answer without verification, and took credit for the result.

This drift has a name: ultracrepidarianism, from the Latin ne sutor ultra crepidam (“cobbler, not above the shoe”) — the art of pronouncing judgment with confidence beyond one’s competence. The AI is first guilty of it itself: it answers with the same assurance whether or not it knows the subject. But the most insidious part is that it passes it on. Armed with fluent text and an expert vocabulary borrowed from the machine, then reinforced by sycophancy, the user starts to speak and write beyond what they truly know, convinced they measure up. Sycophancy doesn’t just flatter: it makes surface fluency pass for competence.

Translated into a professional context: an employee who regularly uses a sycophantic model to validate their analyses, texts, or decisions may gradually lose the reflex to question their own output, all while remaining convinced of its quality. This is one of the angles we work on in our responsible-AI awareness workshops.

How to protect yourself concretely

If the problem is structural, the good news is that simple techniques can significantly reduce it, and several are now backed by research. At their core, these are the reflexes of critical thinking — the same ones that help you escape your confirmation biases and algorithmic entrapment. Here are the most effective levers, ranked by ease of adoption.

Change how you ask your questions

The most common reflex is to ask for validation: “Is my text well structured?” The model will almost always answer yes. Researchers at the UK AI Security Institute have shown that it often suffices to rephrase — as open questions rather than statements to be validated — to get a more honest answer12:

Instead of…Try this instead…
”My report is clear, right?""What are the three weakest passages in this report?"
"Is this project plan solid?""Playing devil’s advocate: where is this plan most likely to fail?"
"Option A is the right one, isn’t it?""Compare options A and B: for each, give two arguments for and two against."
"Is this analysis complete?""What’s missing or could be contradicted in this analysis?”

The principle is simple: ask for the flaws rather than for validation. Prompting the model to step back (“Wait, let’s think step by step”) also reduces sycophancy, drawing on chain-of-thought prompting.

Separate production from evaluation

A model that has just generated a text cannot critique it objectively: it’s biased in favor of its own output. If you use it to write, have a human proofread, or at the very least run a second exchange with explicitly critical instructions.

More generally: a language model is a production tool, not a reliable evaluator of quality. Critical proofreading remains a human skill.

Give the model an explicit critical role

Explicitly assigning it a critical stance changes its behavior. For example:

You are a critical reviewer. Your goal is to identify weaknesses, not to reassure. Always start with the problems, then what works, then concrete improvements. Never validate a point without having verified it.

Defining success as constructive disagreement (“success means you find my mistakes”) reduces the implicit pressure to please.

Watch for warning signs in long conversations

The spirals documented by researchers worsen with the number of turns. On a sensitive topic or an important decision, a few reflexes help keep your bearings:

  • If the model changes its mind after a simple objection, that’s a sign of sycophancy. Confront it: “You said X two messages ago. What changed?”
  • If every response goes in the same direction, ask explicitly: “What’s the strongest argument against what you just said?”
  • Limit long sessions on a single topic to 10–15 exchanges, especially if the subject touches on personal beliefs.

Cross-check, always cross-check

Confronting a model’s answers with primary sources (studies, institutional data, technical documentation) remains the most reliable safeguard. Not with other LLM outputs: an echo chamber is still an echo chamber, even with several models.

Some providers communicate about anti-sycophancy safeguards: Anthropic, for instance, publishes a constitution governing Claude’s behavior, centered on honesty rather than sycophancy213. No model is exempt, however.

Ask yourself the right question before each request

Before turning to a model, one reflex: “Am I looking for an answer, or for a confirmation?” If it’s a confirmation, the model will give it to you, and that’s precisely the trap. Just asking yourself the question is often enough to reopen one you were about to close.


AI sycophancy is not an anecdotal phenomenon. Proven mathematically, documented in Science, illustrated by hundreds of real-world cases, it constitutes a blind spot for anyone using these tools day to day.

The most unsettling conclusion of recent research may be this one: the problem doesn’t come from users. Neither clear-sightedness about the bias nor a model constrained to honesty protects against the spiral. Sycophancy isn’t a tuning flaw: it stems from what these systems are optimized to obtain — approval before accuracy.

Recognizing this bias doesn’t mean rejecting AI. It means using it with clear eyes: questioning its outputs, diversifying your sources, and keeping your hand on the decisions that matter. A model that flatters doesn’t help — it reassures. And intellectual comfort has never been a good advisor.

A project, a question, a workshop? Pwablo, responsible digital studio in Brussels. Let’s talk about your project.


Sources

[1] Cheng, M., Lee, C., Khadpe, P., Yu, S., Han, D. & Jurafsky, D. (2026). Sycophantic AI Decreases Prosocial Intentions and Promotes Dependence. Science, Vol. 391, Issue 6792, eaec8352, March 26, 2026. DOI:10.1126/science.aec8352.[2] Sharma, M. et al. (2024). Towards Understanding Sycophancy in Language Models. ICLR 2024. arXiv:2310.13548.[2b] Shapira, I., Benadè, G. & Procaccia, A. D. (2026). How RLHF Amplifies Sycophancy. arXiv:2602.01002.[3] OpenAI. (2025). Sycophancy in GPT-4o: What happened and what we’re doing about it. April 29, 2025, followed by a post-mortem on May 2, 2025.[4] Chandra, K., Kleiman-Weiner, M., Ragan-Kelley, J. & Tenenbaum, J. B. (2026). Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians. MIT CSAIL, MIT Department of Brain & Cognitive Sciences, and University of Washington. arXiv:2602.19141.[4b] Fanous, A. et al. (2025). SycEval: Evaluating LLM Sycophancy. arXiv:2502.08177. Presented at AIES 2025 (AAAI/ACM Conference on AI, Ethics, and Society).[4d] Hill, K. (2025). Lawsuits blame ChatGPT for suicides and harmful delusions. The New York Times, November 7, 2025. Data from the Human Line Project founded by Etienne Brisson.[4e] Hill, K. (2025). They Asked an A.I. Chatbot Questions. The Answers Sent Them Spiraling. The New York Times, June 13, 2025.[4f] U.S. Senate Judiciary Subcommittee on Crime and Counterterrorism. (2025). Examining the Harm of AI Chatbots. Hearing of September 16, 2025, chaired by Senator Josh Hawley.[8] da Silva Fernandes, D. et al. (2026). AI makes you smarter but none the wiser: The disconnect between performance and metacognition. Computers in Human Behavior, Aalto University. DOI:10.1016/j.chb.2025.108779.[8b] Risko, E. F. & Gilbert, S. J. (2016). Cognitive Offloading. Trends in Cognitive Sciences, 20(9), 676-688.[9] Moore, J. et al. (2026). Characterizing Delusional Spirals through Human-LLM Chat Logs. Stanford, CMU, and collaborators. arXiv:2603.16567. Accepted at ACM FAccT 2026.[10] Attorneys General of 42 U.S. states (2025). Letter from 42 U.S. attorneys general to 13 AI companies, December 10, 2025. Bipartisan coalition led by New York Attorney General Letitia James.[11] EU AI Act. European regulation on artificial intelligence. Provisions on prohibited practices (Article 5) in force since February 2, 2025; full application on August 2, 2026.[12] Dubois, M., Ududec, C., Summerfield, C. & Luettgau, L. (2026). Ask don’t tell: Reducing sycophancy in large language models. UK AI Security Institute. arXiv:2602.23971.[13] Anthropic. (2026). Claude’s Constitution, published January 22, 2026. Framework document describing the expected behavioral principles for Claude, centered on honesty rather than sycophancy.
Partager:

Posts similaires

Voir tous les posts »
How to Create a Responsible AI Policy for Non-Profits: 5-Pillar Framework

How to Create a Responsible AI Policy for Non-Profits: 5-Pillar Framework

Step-by-step guide to building a responsible AI charter for non-profit organizations. Covers 5 key pillars — environment, inclusion, ethics, transparency, and innovation — with energy consumption data and a free downloadable template (CC BY 4.0).

Best Google Analytics Alternatives in 2025: Privacy-First & GDPR-Compliant

Best Google Analytics Alternatives in 2025: Privacy-First & GDPR-Compliant

Google Analytics raises serious privacy and legal concerns. Compare the best ethical alternatives — Umami, Plausible, and Matomo — that are GDPR-compliant, open source, lighter, and faster.

Digital Carbon Footprint: 3-4% of Global Emissions, Rivaling Aviation

Digital Carbon Footprint: 3-4% of Global Emissions, Rivaling Aviation

The digital sector now emits as much CO2 as civil aviation — between 3% and 4% of global greenhouse gas emissions. Explore the key data, growth trends, and what drives this rising environmental footprint.

Web Pages Are 180x Heavier Than 30 Years Ago: The Software Bloat Crisis

Web Pages Are 180x Heavier Than 30 Years Ago: The Software Bloat Crisis

From 14 KB to 2.5 MB in 30 years: how software bloat made websites 180 times heavier. Discover the causes of digital obesity and practical solutions for a lighter, faster web.