10 Habits That Keep Your AI Honest

This happens to me most weeks: an AI hands me an answer that looks right, reads well, and is wrong somewhere in the middle. Building AI is what we do at Serpin, and quality is the job, so catching these mistakes before they reach anyone is part of my work. The most dangerous AI mistakes are the well-written ones: a wrong answer with confident detail, clean structure and three citations sails straight into your report.

There's plenty of advice on how to fix this, and some of it is good. But when I went through the actual research, I found some of the most repeated advice is weaker than it sounds, one popular technique can make answers worse, and a few of the best-evidenced habits barely get mentioned. So this is my version, ranked by what the evidence supports. Everything works in the normal chat window. No setup needed.

The toolkit · strongest evidence first

The 10 habits

Skim the list, then read on for each habit's prompt and the evidence behind it.

1. Fence it to your sources. Upload the document and tell it to answer only from what's inside.
2. Make it search the web. No document? Make it look things up instead of recalling them.
3. Quote first, then analyse. Pull the exact quotes before any conclusions, so claims trace to the text.
4. Name a trusted source. "According to [a named, trusted source]..." steers it to something checkable.
5. Make it calculate, don't guess. Send numbers to the code tool so it computes rather than predicts.
6. Give it an out. Tell it a gap beats a guess, and to flag anything it's unsure of.
7. Check the question, not just the answer. Have it challenge false assumptions hidden in your prompt.
8. Ask again, separately. Re-ask in a fresh chat. Real knowledge stays put, fabrications drift.
9. Ask for labelled claims. Make it mark each claim documented or inferred, so you know what to check.
10. Click the citations. A citation being present doesn't make it right; a checked one does.

Book a call

Want help getting this right?

Thirty minutes with me to talk through where you are leaning on AI, and where it might be quietly getting things wrong.

We look at what you use it for, and which of these habits would make the biggest difference.

Why this happens, and what it means for you

When an AI "hallucinates", it sounds completely sure of itself while being wrong. This happens in two ways: it misreads or muddles material you gave it, or it invents something from nothing when its information is thin.

Two real mistakes. When it has your material: asked to summarise two contracts, one with a 30-day and one with a 90-day notice period, it wrote 'a notice period of 60 days', a figure neither contract gives, because it blended the two. When it has no source: asked about case law, it cited 'Henderson v. Realtech (2019)', a case that does not exist, because it had nothing real to draw on.

This is built into how these models work: they predict plausible text, and the way they are trained and evaluated rewards guessing over admitting uncertainty [1]. That leads to the distinction that matters most: working from sources beats working from memory. The comparison below shows how often each approach invents content; lower is better.

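How often AI invents content, compared. Grounded and reliable, working from material you give it: around 3 percent of cases, and improving [4]. From memory and unreliable, recalling facts on its own: around 33 percent, with OpenAI's o3 inventing answers on a third of the questions in its PersonQA test of facts about people [3]. The two figures come from different tests (summarising supplied documents versus answering questions about people), so read them as a rough contrast rather than a like-for-like measurement.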

Adopting the 10 habits is mostly about pushing your work to that grounded side. One last thing, because it powers several of them: fabrications are unstable. Ask for an invented reference twice and the details drift, where real knowledge stays put [2].

The toolkit

The 10 habits, in detail

I've ordered these by how well the research supports them, which changes the usual running order. The habit most guides lead with, telling the model it may say "I don't know", sits at number 6 here, because several better-evidenced habits come first.

Habit 1

1. Give it the source material, and fence it in

The best-evidenced move of all. Where you can, upload the document, paste the text, or work in a tool that draws on your own sources, then tell the model to stay inside them.

Fence it to the sources · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

Using ONLY the information in the attached documents, answer the following. If the documents don't cover a point, say "not covered in the sources" rather than filling the gap from general knowledge.

Why it works

It converts the unreliable task, recalling facts from memory, into the reliable one, reading what's in front of it. This comes straight from Anthropic's official guidance [5] on reducing hallucinations, and the leaderboard data above shows grounded work is where models have genuinely improved.

One caveat to set expectations: with a very long document the model can still skim, or miss a detail buried in the middle. For points you will lean on, pair this with the quote audit (habit 3), which forces every claim back to a passage.
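When you paste several documents instead of uploading them, it helps to label each one so the model, and you, can tell them apart. Below is a minimal sketch in Python that assumes you assemble the prompt yourself. The `<document>` tag, the `source` attribute and the `fenced_prompt` helper are illustrative choices, not something any of the four tools requires. The instruction is the habit 1 prompt, reworded for pasted text.

```python
# Hypothetical helper: the tag and attribute names are an illustrative convention.
FENCE_INSTRUCTION = (
    "Using ONLY the information in the documents above, answer the following. "
    "If the documents don't cover a point, say \"not covered in the sources\" "
    "rather than filling the gap from general knowledge."
)


def fenced_prompt(documents: dict[str, str], question: str) -> str:
    """Wrap each pasted document in labelled tags, then add the fence and the question."""
    blocks = [
        f'<document source="{name}">\n{text.strip()}\n</document>'
        for name, text in documents.items()
    ]
    return "\n\n".join(blocks + [FENCE_INSTRUCTION, f"Question: {question}"])


print(fenced_prompt(
    {"contract_a.txt": "Notice period: 30 days.", "contract_b.txt": "Notice period: 90 days."},
    "What notice period applies under each contract?",
))
```

Naming each source also gives the model something concrete to point at when you move on to the quote audit in habit 3.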

Habit 2

2. When you have no document, make it search the web

The next best grounding move is to make the model look something up rather than recall it. Tools differ in whether they search the live web by default (some always do, some need switching on), and that gap is where people get caught out. Look for a web or search option near the message box and turn it on. A globe icon or source links in the reply confirm that a search actually ran.

Do this for anything factual, time-sensitive, or more recent than the model's training data, then apply habit 10 to what comes back: click the sources. (Labels differ between tools and shift over time, so go by the function, not the exact button.)

Tell it today's date too. Models often don't know the current date reliably, so a line at the top helps stop them treating stale information as current. The prompt below includes a placeholder for it: swap in today's date.

Make it search, with links · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

Today is [date]. Search the web for this and answer from what you find, with links. If you can't find a reliable source, tell me rather than answering from memory.

Why it works

Giving the model a live source to read turns recall back into reading, the same trick as habit 1 without an upload. Retrieval grounding measurably lowers invented content (the original research line is Lewis et al., 2020 [13]). It does not remove it, though, which is why the citation check still matters. It also closes the recency trap. A model will answer confidently about events after its training cut-off as if they were settled fact, and studies of time-sensitive questions [14] show accuracy falling sharply on facts that change over time. Making sure it actually searches is how you stop that.
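If you reuse the habit 2 prompt often, or send it from a script, the date can be filled in automatically. Here is a small sketch in Python. `build_search_prompt` is a hypothetical helper. The ISO date format (2025-03-01) is an assumption, chosen because it reads the same in every region.

```python
from datetime import date
from typing import Optional

# The habit 2 prompt, with {today} in place of the [date] placeholder.
SEARCH_PROMPT = (
    "Today is {today}. Search the web for this and answer from what you find, "
    "with links. If you can't find a reliable source, tell me rather than "
    "answering from memory."
)


def build_search_prompt(question: str, today: Optional[date] = None) -> str:
    """Fill in today's date (or a date you pass in) and append the question."""
    if today is None:
        today = date.today()
    # ISO format (YYYY-MM-DD) avoids day/month ambiguity between regions.
    return SEARCH_PROMPT.format(today=today.isoformat()) + "\n\n" + question


print(build_search_prompt("Who currently chairs the committee?"))
```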

Habit 3

3. Quote first, then analyse

For contracts, reports, research papers, anything where the detail matters, split the work in two.

Step 1: extract the evidence · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

Review this document. Extract the exact quotes relevant to [your question], copied word for word. If you can't find relevant quotes, say "no relevant quotes found".

Step 2: analyse only the evidence · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

Using only the quotes you extracted, answer my question. Reference the supporting quote for each claim. If a claim has no supporting quote, remove it and mark where it was removed with empty [] brackets.

Why it works

It is hard for the model to drift from the document when every claim has to point at a passage. The empty brackets are the clever bit, also from Anthropic's guidance [5]: you see at a glance how much of the analysis survived contact with the evidence.
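If you handle documents in bulk, you can also check the step 1 quotes mechanically. A quote that doesn't appear word for word in the source wasn't copied, so any claim resting on it needs a second look. Below is a minimal sketch in Python. It assumes you have the document as plain text and the model's quotes as a list. The contract text and the `audit_quotes` helper are hypothetical.

```python
import re


def normalise_text(text: str) -> str:
    """Unify curly quotes and collapse whitespace so line breaks don't cause false misses."""
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    return re.sub(r"\s+", " ", text).strip()


def audit_quotes(document: str, quotes: list[str]) -> dict[str, bool]:
    """Map each quote to True if it appears word for word in the document."""
    doc = normalise_text(document)
    return {quote: normalise_text(quote) in doc for quote in quotes}


contract = """Either party may terminate this agreement
by giving 30 days' written notice."""
quotes = [
    "Either party may terminate this agreement by giving 30 days' written notice.",
    "a notice period of 60 days",
]
for quote, found in audit_quotes(contract, quotes).items():
    print("FOUND  " if found else "MISSING", quote)
```

The check only confirms that a quote exists. Whether the analysis uses it fairly is still your call.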

Habit 4

4. Point it at a named source when you have no document

When there's nothing to upload, naming a trusted source in the prompt still helps. Researchers at Johns Hopkins tested this [6] and found phrasing like "According to Wikipedia..." measurably increased how much of the answer came from the real source, and often improved accuracy too.

Name the source · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

According to [a trusted source you name], what is [your question]?

Why it works

It steers the model towards reproducing what a named, checkable source says instead of free-associating from everything it has ever read. And it gives you the obvious follow-up: go and look at the source.

Habit 5

5. For numbers, make it calculate, not guess

A language model produces digits the same way it produces words, by predicting what looks right. That is why it can hand you a clean, confident total that is simply wrong. The fix is to make it run real code on the figures instead of doing the sum in its head.

Most assistants can run code on an uploaded file: they write and execute the calculation instead of predicting the answer.

Run the numbers in code · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

Use your data-analysis (code) tool to work this out from the file. Show the figures it produces, don't estimate them.

Why it works

The model is good at deciding what to compute and unreliable at doing the arithmetic, so this splits the two jobs. Delegating the calculation to executed code rather than predicted text is a well-established fix (program-aided language models [16] cut maths errors substantially this way). For anything where the number matters, never let it estimate.
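For a sense of what the code tool does behind the scenes, here is the kind of script an assistant might write when asked to total a column. It is a sketch, not any particular tool's output. The file contents and the `amount` column name are made up for illustration. The point is that the total comes from executing the sum rather than from predicting digits.

```python
import csv
import io
from decimal import Decimal

# Hypothetical export standing in for an uploaded file.
CSV_TEXT = """invoice,amount
INV-001,1249.99
INV-002,380.50
INV-003,2075.25
"""


def total_amount(csv_text: str) -> Decimal:
    """Sum the 'amount' column exactly, using Decimal to avoid floating-point rounding."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return sum((Decimal(row["amount"]) for row in rows), Decimal("0"))


print(total_amount(CSV_TEXT))
```

Decimal is the safer choice for money because it keeps the cents exact, where binary floating point can introduce tiny rounding errors.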

Habit 6

6. Give it an out, with a bar attached

The most widely shared piece of advice is to tell the model it's allowed to say "I don't know". The research direction supports it: several studies show that getting the model to hold back when it's unsure reduces wrong answers [8]. What I couldn't find is any support for the claim you sometimes see that this is the single most effective fix. It isn't; the habits above have stronger evidence. But it's still worth doing, and the sharper version follows directly from the OpenAI paper's argument about guessing [1]:

Set the bar for an answer · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

Only state things you're confident about. A wrong answer is worse to me than a gap. For anything you're unsure of, say so, and tell me what I'd need to check to confirm it.

Why it works

You've redefined what a good answer looks like, so the model no longer needs to guess to seem useful. The "what I'd need to check" part earns its place, turning each gap into a next step rather than a dead end.
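Here is why a bar changes the maths. This sketch in Python uses an assumed scoring rule: a correct answer scores 1, "I don't know" scores 0, and a wrong answer costs a penalty you choose. The penalty values are illustrative, not figures from the paper. With no penalty, any guess beats a gap, which is the guessing incentive described in [1]. Once wrong answers cost something, answering only pays above a break-even confidence of penalty / (1 + penalty).

```python
def expected_score(confidence: float, wrong_penalty: float) -> float:
    """Expected score of answering: +1 if right, -wrong_penalty if wrong.
    Saying "I don't know" always scores 0."""
    return confidence * 1.0 - (1.0 - confidence) * wrong_penalty


def break_even_confidence(wrong_penalty: float) -> float:
    """Confidence above which answering beats abstaining."""
    return wrong_penalty / (1.0 + wrong_penalty)


# Accuracy-only grading: even a 20% guess scores better than a gap.
print(expected_score(0.2, wrong_penalty=0.0))
# "A wrong answer is worse than a gap", say three times worse: a 50% guess now loses.
print(expected_score(0.5, wrong_penalty=3.0))
print(break_even_confidence(3.0))
```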

Habit 7

7. Check the question, not only the answer

Models hallucinate hardest when the question itself contains a wrong assumption, because they tend to answer as if it were true. Ask about "the 2024 merger between X and Y" and you may get a detailed account of a merger that never happened. There's a growing research line [7] on exactly this failure. The same agreeableness shows up more broadly. Models drift towards whatever answer you signal you're hoping for, a tendency researchers call sycophancy (Sharma et al., 2024 [15] found models often favour a user's stated view over the truth). So keep the question neutral and police its assumptions. The fix is one line:

Police the question first · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

Before answering, check my question for assumptions which might be false or unverifiable. If you find one, tell me instead of answering as if it were true.

Why it works

Every other habit polices the answer. This one polices the input, and a surprising share of confident nonsense starts with a flawed premise the model was too agreeable to challenge.

Habit 8

8. Ask again, separately, and ideally somewhere else

For facts you'll genuinely depend on, open a fresh conversation, ask the same question again and compare the answers. The cheapest version needs no new chat at all: ask the same question two or three times, reworded, and watch for drift. It's a manual version of the sampling approach used in hallucination-detection research [10].

When a conversation has run long, or you've moved on to a new topic inside it, start a fresh chat. Accuracy tends to slip as a thread fills up with earlier context that no longer applies, so a clean window is often more reliable than a crowded one.

Stronger still, ask a different model. Different model families have different blind spots, so agreement between two families means more than one model agreeing with itself. Paste the answer into another family and ask it to fact-check.

Cross-model fact-check · Paste into a different model family

Prompt

Another AI gave me this answer. Fact-check it: flag any claim you believe is wrong or unverifiable, and say why.

Why it works

This is the instability fingerprint from earlier, turned into a habit: real knowledge stays put between attempts, and fabrications drift. The same idea sits underneath the semantic-entropy detection method published in Nature [10], which samples several answers and checks whether they agree in meaning.
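If you want a number rather than a gut feel, you can score how far a handful of re-asks agree. Below is a minimal sketch in Python. It uses a much cruder stand-in than the Nature method: answers count as agreeing only if they match after lower-casing and stripping punctuation. The published method groups answers by meaning instead. The answers below are hypothetical.

```python
import re
from collections import Counter


def normalise_answer(answer: str) -> str:
    """Lower-case, drop punctuation and collapse spaces, so trivial wording changes don't count as drift."""
    cleaned = re.sub(r"[^a-z0-9\s]", "", answer.lower())
    return " ".join(cleaned.split())


def agreement(answers: list[str]) -> float:
    """Share of answers that match the most common normalised answer (1.0 = all agree)."""
    if not answers:
        raise ValueError("need at least one answer")
    counts = Counter(normalise_answer(a) for a in answers)
    return counts.most_common(1)[0][1] / len(answers)


# Hypothetical answers from three fresh chats.
stable = ["Paris", "paris.", "Paris"]
drifting = [
    "Henderson v. Realtech (2019)",
    "Henderson v. RealTech Inc (2018)",
    "Hendricks v. Realtek (2019)",
]
print(agreement(stable))    # every answer matches
print(agreement(drifting))  # no two answers match
```

Exact matching will call two differently worded correct answers a disagreement. Treat a low score as a prompt to check, not as proof of fabrication.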

One correction to common advice

Asking the model to "double-check your answer" in the same conversation is not the same thing, and the research says it's unreliable. A DeepMind-affiliated study [11] found models often fail to spot their own mistakes this way, and sometimes talk themselves out of correct answers. Fresh conversation, different model, or quote audit. Not "are you sure?".

A few other popular moves belong in the same bin. Asking the model to argue why it might be wrong, or to rate its certainty out of ten, leans on the same memory that produced the error, so neither is a real check. And one habit almost everyone recommends cuts both ways. Telling the model to reason step by step can lower how often it invents things, but research finds it also obscures the tell-tale signs of hallucination [17], so the mistakes that do slip through are harder to catch. None of these replaces a check against something outside the model: a fresh chat, a different model, or the source itself.

Habit 9

9. Ask for labelled claims, and treat the labels as triage

When you're going to act on what the AI gives you, make it show its hand. Ask it to tag each claim as something it can source or something it's inferring, so the shaky ones are flagged before you rely on them:

Label every claim · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

For each significant claim, add a label: [Documented, I can name the source] or [Inferred, plausible but unverified]. For statistics and citations especially, never present an inferred one as documented.

Why it works

Uncertainty can't hide inside confident prose when each claim carries its own flag, and the flags tell you what to check first. One caveat the research is firm on: don't ask for confidence percentages. Models inflate confidence in their own answers [9], up to 26% higher for responses they think they wrote, so a self-reported "92%" tells you more about ownership than accuracy. Coarse labels for triage, your own checking for truth.
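To make the triage literal, here is a small sketch in Python that sorts a labelled answer into "check first" and "documented" piles. It assumes one claim per line, each ending in a label that starts "[Documented" or "[Inferred", as the habit 9 prompt asks. Real replies won't always follow that format. The sample answer is hypothetical.

```python
import re

# Assumed format: one claim per line, ending in a "[Documented..." or "[Inferred..." label.
LABEL = re.compile(r"\[(Documented|Inferred)\b[^\]]*\]\s*$")


def triage(answer: str) -> dict[str, list[str]]:
    """Sort labelled claims into check-first (inferred or unlabelled) and documented piles."""
    piles: dict[str, list[str]] = {"check_first": [], "documented": []}
    for line in answer.splitlines():
        line = line.strip()
        if not line:
            continue
        match = LABEL.search(line)
        if match and match.group(1) == "Documented":
            piles["documented"].append(line)
        else:
            # Inferred claims, and any claim the model forgot to label, go on top of the pile.
            piles["check_first"].append(line)
    return piles


answer = """The policy took effect in April 2024 [Documented, section 2.1 of the attached guidance]
Most firms will need a new approval step [Inferred, plausible but unverified]
Compliance costs rise by about 15%"""
for pile, claims in triage(answer).items():
    print(pile, claims)
```

A "Documented" label only says the model believes it has a source. It still goes through habit 10 before you reuse it.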

Habit 10

10. Click the citations, even in search mode

Web search and deep research modes feel safe because the answer arrives with sources attached. The Tow Center at Columbia [12] tested eight AI search tools and found they gave incorrect answers on more than 60% of news queries, citing the wrong page, the wrong publisher, or links which didn't exist at all.

A citation being present doesn't make a claim right; a citation you've checked does. For any fact you'll reuse, click the two or three sources carrying the most weight and confirm the page says what the AI says it does. And when a claim arrives with no citation at all, treat that as the cue to open a source yourself before you reuse it, not as a sign it's common knowledge.

When a claim matters and no source is attached, force one with the prompt below, then stress-test it: ask again in a fresh chat and see if it holds. A real source stays put; a fabricated one drifts, or won't resolve to a stable locator at all.

Force a checkable source · Paste into Claude, ChatGPT, Gemini or Copilot

Prompt

Give me the exact source for that: the page number, section or clause, and a working link. If you can't point to a specific, checkable location, say so rather than inventing one.

Why it works

Treat this as a detector, not a cure. A confident model can still hand you a confident fake, but instability across asks is one of the clearest tells you have.
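The stress test in habit 10 can be partly mechanised: pull the links out of two fresh-chat answers to the same question and see which ones survived. Below is a minimal sketch in Python. It uses a rough URL pattern rather than a full URL parser, and the answers and example.com addresses are hypothetical. A link that appears both times still needs clicking, because a model can repeat a wrong link. A link that appears only once is the first one to check.

```python
import re

# Rough URL pattern: stops at whitespace, closing brackets and quotes.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def cited_urls(answer: str) -> set[str]:
    """Collect the URLs in an answer, trimming trailing sentence punctuation."""
    return {url.rstrip(".,;:") for url in URL_PATTERN.findall(answer)}


def compare_citations(first: str, second: str) -> tuple[set[str], set[str]]:
    """Return (URLs cited in both answers, URLs cited in only one of them)."""
    a, b = cited_urls(first), cited_urls(second)
    return a & b, a ^ b


# Hypothetical answers to the same question from two fresh chats.
first = "See https://example.com/annual-report-2024, section 3."
second = "Source: https://example.com/annual-report-2024 and https://example.org/q3-summary."
held, drifted = compare_citations(first, second)
print("cited both times:", sorted(held))
print("check these first:", sorted(drifted))
```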

Match the effort to the stakes

10 habits is far too many for everyday use, and most messages need none of them. The discipline is knowing which tier you're working in.

Low stakes: use nothing

When: Brainstorming, drafts you'll rework, exploring an idea.
Effort: No habits needed.
How: Speed matters more than precision, and a wrong detail costs you nothing.

Medium stakes: the prompt-level habits

When: Research you'll rely on, learning a new area, internal documents.
Effort: Habits 1, 2, 6, 9.
How: Spot-check whatever the answer leans on most.

High stakes: the full set

When: Client work, anything published, decisions with real consequences.
Effort: Habits 1, 3, 5, 8, 10.
How: The AI drafts, you verify. An unverified claim isn't usable yet.

The effort should match the cost of being wrong. That sentence does more work than any individual habit.

Set it up once

All four tools let you save standing instructions, or a running memory, that apply to every conversation. The feature goes by different names (look for Custom instructions, Personalization, Personal context, Memory or Profile), but it does the same job in each.

Paste this into whichever you use, and several of the habits above become your permanent default. That one block bakes in habits 2, 6, 7 and 9, plus the recency habit, so you don't have to think about them again:

The set-it-once block · Save into custom instructions or memory

Standing instructions

Accuracy matters more to me than completeness. Only state things you're confident about, and flag anything uncertain or likely to have changed since your training cut-off. A wrong answer is worse than a gap. When you give statistics, citations, or named facts, label whether each is documented or inferred, and never present an inferred one as documented. If my question contains an assumption you can't verify, point it out rather than answering as if it were true. For anything recent or time-sensitive, search the web rather than answering from memory.

What to take from this

Hallucinations come from how these models are trained and measured, and the incentive problem behind them is the industry's to fix, not yours. What you control is the working pattern around the tool. Ground it in your sources where you can, make it search rather than recall when the facts are live, give it an out with a bar attached, check the question as well as the answer, and save the heavy verification for work where being wrong costs something.

The models are getting better at grounded work, but they remain unreliable narrators of their own memory. Used with that distinction in mind, they're enormously useful. Used without it, one day they'll hand you a confident, well-written mistake at the worst possible moment. The work is still ours to stand behind, whatever helped us produce it.

Building AI into your business?

Everything above is for the everyday chat window. If you're building AI agents or systems, where a fabrication can act on its own, the same problem has to be designed out at the architecture level. That's Serpin's deep dive.

Scott's companion page, 7 patterns to stop your AI making things up, covers how Serpin designs AI agent systems to minimise fabrication: seven patterns across four layers, about a fifteen-minute read.

Book a call

This is how I think about getting reliable work out of AI, and it's the same discipline we bring when we build production AI for clients. If that's a conversation worth having, grab a slot.

Sources


[1] Kalai, Nachum, Vempala, Zhang, "Why Language Models Hallucinate", Sept 2025: https://arxiv.org/abs/2509.04664
[2] Agrawal et al., "Do Language Models Know When They're Hallucinating References?", 2023: https://arxiv.org/abs/2305.18248
[3] OpenAI, o3 and o4-mini system card (PersonQA rates), Apr 2025: https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf
[4] Vectara, HHEM hallucination leaderboard: https://github.com/vectara/hallucination-leaderboard
[5] Anthropic, "Reduce hallucinations" official guidance: https://platform.claude.com/docs/en/test-and-evaluate/strengthen-guardrails/reduce-hallucinations
[6] Weller et al., "According to...", EACL 2024: https://aclanthology.org/2024.eacl-long.140/
[7] "Whispers that Shake Foundations" (false premises), EMNLP 2024: https://aclanthology.org/2024.emnlp-main.155.pdf
[8] Tomani et al. (Meta), "Uncertainty-Based Abstention in LLMs", 2024: https://arxiv.org/pdf/2404.10960
[9] Sanz-Guerrero, Mager, von der Wense, "LLMs Are Overconfident in Their Own Responses" (ownership bias in self-reported confidence), ACL 2026 Findings: https://arxiv.org/abs/2606.03437
[10] Farquhar et al., "Detecting hallucinations using semantic entropy", Nature 2024: https://www.nature.com/articles/s41586-024-07421-0
[11] Huang et al., "Large Language Models Cannot Self-Correct Reasoning Yet", ICLR 2024: https://arxiv.org/abs/2310.01798
[12] CJR Tow Center, "We Compared Eight AI Search Engines. They're All Bad at Citing News", Mar 2025: https://www.cjr.org/tow_center/we-compared-eight-ai-search-engines-theyre-all-bad-at-citing-news.php
[13] Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks", NeurIPS 2020: https://arxiv.org/abs/2005.11401
[14] Chen, Wang, Wang, "A Dataset for Answering Time-Sensitive Questions", NeurIPS 2021 (datasets track): https://arxiv.org/abs/2108.06314
[15] Sharma et al. (Anthropic), "Towards Understanding Sycophancy in Language Models", ICLR 2024: https://arxiv.org/abs/2310.13548
[16] Gao et al., "PAL: Program-Aided Language Models", ICML 2023: https://arxiv.org/abs/2211.10435
[17] Cheng, Su, Yuan, He, Liu, Tao, Xie, Li, "Chain-of-Thought Prompting Obscures Hallucination Cues in Large Language Models: An Empirical Evaluation", June 2025 (rev. Sept 2025): https://arxiv.org/abs/2506.17088