Fighting Fake Content With Fact-Based Generation

A few years ago I was on a team that built, during a hackathon, a Chrome extension aimed at fighting Holocaust denial and antisemitism online. We called it Savee. We won first place with a demo built on a single Wikipedia page, and then spent the following weeks learning why that demo was the easy part.

The premise is worth stating plainly, because most “misinformation detection” tools work differently: they classify. A post goes in and a label comes out (true, false, misleading, 62% confidence). The trouble is that a label persuades almost nobody. We weren’t trying to change the mind of the person who posted the lie. We were trying to give the audience reading the replies something better than a red banner: a fact-based answer, in the thread, where the lie lives.

That instinct turns out to be backed by the debunking literature. The old “backfire effect” folk wisdom — that correcting someone entrenches them — mostly failed to replicate; large studies found belief backfires are rare, and corrections generally do nudge people toward the facts. But the same field is clear that how you correct matters enormously. Chan, Albarracín, Jamieson and Jones, in a meta-analysis of debunking, put it directly:

A detailed counter-message is better at persuading people to change their minds than merely labeling misinformation as wrong.

Bare labels are the weakest tool in the box. A detailed, reasoned rebuttal is the strong one. Savee was a bet that you could generate that rebuttal on demand.

The exact mechanism

The build is a textbook case of what the field now calls RAG (retrieval-augmented generation), the pattern Lewis et al. formalized in 2020: pair a generative model with a non-parametric memory it must retrieve from, so the output is anchored to fetched documents rather than to the model’s own weights alone. We were doing this in early 2023, before “RAG” was a household acronym, but that’s exactly the shape.
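To make the “non-parametric memory” half concrete, here is a minimal Python sketch of a fact store with top-k nearest-vector retrieval. It is an illustration, not Savee’s code: `embed` is a hypothetical stand-in (a hashed bag of words) for a real embedding model, and `FactStore`, `top_k` and `DIM` are illustrative names. Because every vector is L2-normalised, the dot product equals cosine similarity.

```python
import math
import re
import zlib
from collections import Counter

DIM = 256  # hypothetical vector size for the toy embedding


def embed(text: str) -> list[float]:
    """Toy stand-in for a real embedding model: a hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word, count in Counter(re.findall(r"\w+", text.lower())).items():
        # crc32 is deterministic across runs, unlike Python's salted hash()
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class FactStore:
    """Non-parametric memory: each fact is stored next to its vector."""

    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, fact: str) -> None:
        self.items.append((fact, embed(fact)))

    def top_k(self, query: str, k: int = 3) -> list[tuple[float, str]]:
        """Return the k facts most similar to the query, best first."""
        q = embed(query)
        # Both vectors are unit length, so the dot product is cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, v)), fact) for fact, v in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


store = FactStore()
store.add("The Wannsee Conference took place on 20 January 1942.")
store.add("Auschwitz was liberated by Soviet troops on 27 January 1945.")
store.add("The Nuremberg trials began in November 1945.")
for score, fact in store.top_k("When was Auschwitz liberated?", k=2):
    print(f"{score:.2f}  {fact}")
```

Swapping the toy `embed` for a real model changes retrieval quality, not the shape: the facts the generator gets to see are whatever `top_k` returns.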

Savee v0.1 architecture: facts managers upload historical documents which are compressed into a fact list, chunked, embedded and stored; a selected social post is embedded, matched by vector similarity to top-k facts, composed into a prompt, and answered by the bot.

The pipeline, concretely:

Ingestion: facts managers (historians and volunteers) upload unstructured, multilingual historical documents. Each blob is compressed into a list of discrete facts, chunked, embedded, and stored with its vector.
Query: an ambassador selects a post in the extension. Its text is embedded and matched against the fact store by nearest-vector similarity.
Generation: the top-k facts are composed into a prompt whose instruction is the whole point of the project:

Answer ONLY using the facts below. Do not use any outside knowledge.
If the facts do not cover the claim, say so — do not guess.

Facts:
{{top_k_retrieved_facts}}

Post to respond to:
{{selected_post_text}}

That last constraint is where I want to be honest about what it buys you — and what it doesn’t.
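The generation step is mostly string assembly. Below is a minimal Python sketch under stated assumptions: `compose_prompt` and `PROMPT_TEMPLATE` are hypothetical names; `{facts}` and `{post}` stand in for the double-brace placeholders in the prompt above; numbering the facts `[1]`, `[2]` is an assumption that lets a reviewer point at the fact behind each sentence; and returning `None` on an empty retrieval is an assumed design choice, so the caller can skip the model call entirely instead of handing it nothing to ground on.

```python
from typing import List, Optional

# Mirrors the grounding prompt above; {facts} and {post} replace the
# {{top_k_retrieved_facts}} and {{selected_post_text}} placeholders.
PROMPT_TEMPLATE = (
    "Answer ONLY using the facts below. Do not use any outside knowledge.\n"
    "If the facts do not cover the claim, say so; do not guess.\n"
    "\n"
    "Facts:\n"
    "{facts}\n"
    "\n"
    "Post to respond to:\n"
    "{post}"
)


def compose_prompt(facts: List[str], post: str, max_facts: int = 5) -> Optional[str]:
    """Fill the grounding template with the top-k facts, numbered for citation.

    Returns None when no usable facts were retrieved, so the caller can
    reply "not covered" without calling the model at all.
    """
    usable = [f.strip() for f in facts if f.strip()][:max_facts]
    if not usable:
        return None
    numbered = "\n".join(f"[{i}] {fact}" for i, fact in enumerate(usable, start=1))
    # str.format does not re-parse substituted values, so braces inside a
    # post or a fact are inserted literally.
    return PROMPT_TEMPLATE.format(facts=numbered, post=post.strip())


prompt = compose_prompt(
    ["Auschwitz was liberated by Soviet troops on 27 January 1945."],
    "Auschwitz was never a death camp.",
)
print(prompt)
```

The `max_facts` cap (a hypothetical parameter) is a second guard on top of retrieval’s own k, so an oversized retrieval cannot quietly crowd the instruction out of the prompt.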

Where grounding stops helping

Grounding a generation in retrieved facts does not make it true. It makes it traceable. The model can still misread a fact, blend two chunks into a claim neither one supports, or answer confidently where the retrieved facts only partly cover the question. “Only use these documents” shrinks the surface area for invention; it doesn’t eliminate it. You’ve swapped “the model can say anything” for “the model can misuse a smaller, curated set of things.” People who hear “grounded” tend to hear “guaranteed,” and grounding is no guarantee.
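One cheap way to hold the model to the “traceable” half of that bargain is a post-hoc check. This is a minimal Python sketch resting on two assumptions that the prompt above does not state: the facts are numbered `[1]`, `[2]` and so on, and the model is asked to cite them inline before each sentence’s final punctuation. `check_traceability` is a hypothetical name. It flags sentences with no citation and citations that point at facts that were never retrieved.

```python
import re
from typing import List

CITATION = re.compile(r"\[(\d+)\]")
# Split after ., ! or ? followed by whitespace; assumes citations sit before
# the sentence's final punctuation, e.g. "... in 1945 [2]."
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def check_traceability(answer: str, num_facts: int) -> List[str]:
    """Return a list of problems; an empty list means every sentence cites a retrieved fact.

    This checks traceability only. It cannot tell whether the cited fact
    actually supports the sentence: the gap between traceable and true.
    """
    problems: List[str] = []
    sentences = [s.strip() for s in SENTENCE_END.split(answer) if s.strip()]
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        if not cited:
            problems.append(f"uncited: {sentence!r}")
        for n in cited:
            if not 1 <= n <= num_facts:
                problems.append(f"cites unknown fact [{n}]: {sentence!r}")
    return problems


answer = (
    "Auschwitz was liberated by Soviet troops on 27 January 1945 [1]. "
    "Survivors testified at later trials."
)
for problem in check_traceability(answer, num_facts=1):
    print(problem)
```

A sentence that cites `[1]` for a claim `[1]` doesn’t contain passes this check, which is the point: it narrows where a human reviewer has to look, it doesn’t replace one.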

The harder problem was never the model; it was the source material. Our first version looked great because Wikipedia is already curated, deduplicated, and roughly neutral. The moment we scaled to primary testimony, archives, and multilingual sources of varying reliability, answer quality became entirely a function of who curated the fact base and how carefully. A fact-grounded bot is only as good as its facts manager. We hadn’t built an “AI that knows history.” We’d built a fast, confident interface over whatever a small volunteer team had time to digitize and vet that month. No amount of prompt engineering moves that bottleneck.
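None of that is fixable in the retriever, but an ingestion pipeline can at least refuse to index what nobody vetted. A minimal Python sketch under stated assumptions: the `Fact` record, its `source` and `reviewed_by` fields, and `gate_for_indexing` are hypothetical, not Savee’s schema; deduplicating on normalised text is a deliberately crude stand-in for the curation Wikipedia already provides; and the identifiers in the usage lines are illustrative. It does not make the fact base good; it only makes gaps in curation visible instead of silent.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    text: str
    source: str                        # hypothetical: archive ID, testimony ID, or URL
    reviewed_by: Optional[str] = None  # hypothetical: the facts manager who vetted it


def gate_for_indexing(facts: Iterable[Fact]) -> Tuple[List[Fact], List[Fact]]:
    """Split facts into (indexable, held_back).

    A fact is indexable only if it names a source, has a reviewer, and is
    not a duplicate of one already accepted (compared case- and
    whitespace-insensitively).
    """
    accepted: List[Fact] = []
    held_back: List[Fact] = []
    seen = set()
    for fact in facts:
        key = " ".join(fact.text.lower().split())
        if not fact.source or not fact.reviewed_by or key in seen:
            held_back.append(fact)
            continue
        seen.add(key)
        accepted.append(fact)
    return accepted, held_back


facts = [
    Fact("Auschwitz was liberated on 27 January 1945.", "archive-001", "historian-a"),
    Fact("auschwitz was liberated on  27 January 1945.", "archive-002", "historian-b"),
    Fact("An unvetted claim.", "upload-17"),
]
ok, held = gate_for_indexing(facts)
print(len(ok), "indexed,", len(held), "held back for review")
```

Returning the held-back list, rather than dropping it, is the design choice that matters here: it is the facts manager’s work queue.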

And there’s an arms-race shape that’s easy to miss when you’re heads-down. LLM-generated fake content is cheap and getting cheaper. An LLM-generated grounded rebuttal is also cheap, which is exactly the temptation. But cheap on both sides doesn’t net out to a wash: volume wins unless the grounding step has teeth. A thin or stale fact base just automates a second stream of confident-sounding text into a feed that already has too much of it.

We built it because the diagnosis felt right: a classifier yelling “false” moves few people, and volunteers hand-replying to hate don’t scale against generative content about to flood in from every direction. Meeting generated content with generated, sourced counter-content is a genuinely better shape for the problem than binary flagging. I just won’t oversell “sourced” once an LLM is doing the writing — it’s a constraint on the model, not a guarantee on the output, and it’s only as strong as the archive behind it.

Code’s open source, contributions and skepticism both welcome: github.com/feedox/savee.

Here’s the question I still can’t answer cleanly: if a grounded rebuttal is only as trustworthy as its fact base, who is accountable for curating that base at scale — platforms, historians, an open commons — and how would you keep it honest? Where do you think that responsibility should sit?

References

Patrick Lewis et al. — Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, NeurIPS 2020: arxiv.org/abs/2005.11401
Chan, Albarracín, Jamieson & Jones — Counterarguments Are Critical to Debunking Misinformation: psychologicalscience.org
Poynter — Fact-checking doesn’t ‘backfire,’ new study suggests: poynter.org
Savee — source code: github.com/feedox/savee