Context stuffing — Definition

Context stuffing is the practice of pasting an entire document — or as much of it as will fit — into the model's context window before asking a question. The logic is simple: give the model everything and let it find what's relevant.

Why it matters

In small doses, dumping full documents into the prompt can work. For a two-page email thread or a short article, it's often fine. The problems compound as documents grow:

Cost. LLM pricing is roughly proportional to tokens processed. Stuffing a 200-page PDF into every query is an expensive way to answer a simple question.
Context limits. Every model has a finite context window; for current large models it typically falls somewhere between 100k and 1M tokens, and many others are smaller. Many real-world documents (legal agreements, technical manuals, research corpora) exceed those limits, so they cannot be stuffed in without truncating them.
Accuracy dilution. Counter-intuitively, more context can mean worse answers. When a model is handed hundreds of pages, relevant passages compete with irrelevant ones. The signal-to-noise ratio drops, and the model may anchor on high-frequency or early content rather than the parts that actually answer the question.
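
To make the cost and context-limit points concrete, here is a rough back-of-the-envelope sketch. The tokens-per-page figure, the price, and the context limit are all illustrative assumptions, not numbers from any particular model or provider.

```python
# Illustrative assumptions only; substitute your own model's figures.
TOKENS_PER_PAGE = 500             # assumed average for a text-heavy page
PRICE_PER_MILLION_TOKENS = 3.00   # hypothetical input price in USD
CONTEXT_LIMIT = 128_000           # hypothetical context window in tokens


def input_cost(tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Input cost in USD for one query that sends `tokens` tokens."""
    return tokens / 1_000_000 * price_per_million


def fits_context(tokens: int, limit: int = CONTEXT_LIMIT) -> bool:
    """True if the prompt fits inside the assumed context window."""
    return tokens <= limit


# Stuffing a 200-page PDF into every query.
stuffed_tokens = 200 * TOKENS_PER_PAGE

# Sending five retrieved passages of about 300 tokens each instead.
retrieved_tokens = 5 * 300

if __name__ == "__main__":
    for label, tokens in [("stuffed", stuffed_tokens), ("retrieved", retrieved_tokens)]:
        print(f"{label}: {tokens} tokens, ${input_cost(tokens):.4f} per query, "
              f"fits={fits_context(tokens)}")
```

Under these assumptions the stuffed prompt is dozens of times larger than the retrieved one, and that gap is paid again on every question asked of the same document.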

The retrieval alternative

Retrieval-augmented generation addresses all three problems. Instead of dumping the document in full, top-k retrieval identifies the k passages most relevant to the question and feeds only those into the prompt. The model gets focused evidence rather than a haystack; cost scales with the number of passages retrieved, not with the size of the document; and because the document is indexed outside the prompt, its length is no longer capped by the context window.
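
As a minimal sketch of top-k retrieval, the example below scores passages against a question with bag-of-words cosine similarity and builds a prompt from the best matches. This is a toy stand-in: production systems usually rely on embedding models and a vector index, and nothing here describes how Sidenote or any specific product implements retrieval. The function names, the sample passages, and the prompt wording are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical sample passages standing in for chunks of a long document.
PASSAGES = [
    "The agreement begins on the first of January.",
    "Either party may terminate with a notice period of thirty days.",
    "Payment is due within sixty days of invoice.",
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, passages: list[str], k: int = 2) -> list[tuple[int, str]]:
    """Return up to k (index, passage) pairs, best match first, skipping zero scores."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(p))), i) for i, p in enumerate(passages)]
    scored.sort(key=lambda s: (-s[0], s[1]))  # highest score first, ties by position
    return [(i, passages[i]) for score, i in scored[:k] if score > 0]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    """Assemble a prompt containing only the retrieved passages, numbered for citation."""
    evidence = "\n".join(f"[{i}] {text}" for i, text in top_k(question, passages, k))
    return (
        "Answer using only these passages and cite them by number.\n"
        f"{evidence}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("notice period for termination", PASSAGES))
```

Keeping each passage's index in the prompt is what lets an answer point back to its source. Plain word overlap is easily skewed by common words such as "the", which is one reason real retrievers use learned embeddings or weighting schemes instead.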

Sidenote uses retrieval by default — every answer is built from the passages most relevant to your question, with each passage traced back to its source via citation.

Stop digging. Start asking.