Context window - Definition

A context window is the fixed amount of text a model can hold in mind at one time. It is measured in tokens - the small chunks of text a model reads - and it sets a hard ceiling on how much a model can take in and reason over in a single pass. Everything the model needs for an answer (the question, the instructions, and any source material) has to fit inside that window together.
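To make the "fit together" constraint concrete, here is a minimal sketch of a budget check. It assumes a rough rule of thumb of about four characters per token for English text; that ratio, the names `estimate_tokens` and `fits_in_window`, and the window size in the usage line are illustrative assumptions, not any real tokenizer or model limit.

```python
# ASSUMPTION: ~4 characters per token is only a rule of thumb for English
# text. A real tokenizer will give different counts.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length (heuristic only)."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_window(question: str, instructions: str, sources: list[str],
                   window_tokens: int) -> bool:
    """Return True if every part of the prompt fits in the window together."""
    total = (estimate_tokens(question)
             + estimate_tokens(instructions)
             + sum(estimate_tokens(s) for s in sources))
    return total <= window_tokens
```

For example, `fits_in_window(question, instructions, [contract_text], window_tokens=8000)` would return False as soon as the combined estimate passes 8,000 tokens, no matter which part is responsible for the overflow.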

Why long documents overflow it

Context windows have grown, but they are still finite, and real documents are large. A single contract, research paper, or wiki export can run to hundreds of pages, and a collection of them quickly adds up to more tokens than even a generous window holds. When the text won't fit, something has to give: it gets truncated, or it never makes it in at all.

Truncation is quietly dangerous. A model handed only the first half of a document will answer as if that's the whole story, with no signal that the rest existed. The answer reads as confident and complete, which is exactly how hallucinations take hold. Simply pasting an entire library into one prompt is not a real fix either - past a point, models attend less reliably to the middle of a very long window, so relevant detail can be present yet effectively overlooked.
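The danger described above is the missing signal, not the cut itself. As a small sketch, a truncation step can at least report how much it dropped so the caller can warn the user or fall back to another strategy. The `Truncated` type and `truncate_to_budget` function are hypothetical names, and the budget is counted in characters here only to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Truncated:
    text: str           # the part of the document that was kept
    dropped_chars: int  # how much of the original was left out


def truncate_to_budget(document: str, max_chars: int) -> Truncated:
    """Keep the first max_chars characters and record what was lost.

    Assumes max_chars is non-negative.
    """
    kept = document[:max_chars]
    return Truncated(text=kept, dropped_chars=len(document) - len(kept))
```

A caller can then check `result.dropped_chars > 0` and say so, instead of answering from a partial document as if it were complete.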

Feeding in only what matters

The durable approach is to stop trying to fit everything. A document is split into passage-sized pieces through chunking, and each piece is indexed so it can be found later by meaning rather than keyword via semantic search. When a question arrives, only the handful of passages that actually bear on it are retrieved and placed in the window - the pattern known as retrieval-augmented generation.

This keeps the window focused: the model sees the most relevant evidence at full fidelity instead of a thin slice of everything. It is how Sidenote reads documents far larger than any single window - pulling the passages that matter into context, then grounding and citing each answer in the exact text it was given.
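The split, index, and retrieve pattern in this section can be sketched in a few lines. This is a toy illustration under loud assumptions, not how Sidenote or any other product implements it: chunks are cut at fixed word counts, and similarity is computed over plain word counts, which is really keyword overlap. A real system would swap `vectorize` for an embedding model so that passages match by meaning. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 50) -> list[str]:
    """Split text into pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def vectorize(text: str) -> Counter:
    """Word-count vector. ASSUMPTION: a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: list[str], top_k: int = 3) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    q = vectorize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)),
                    reverse=True)
    return ranked[:top_k]
```

Only the chunks returned by `retrieve` would be placed in the prompt alongside the question, so the cost per question depends on `top_k` and the chunk size rather than on the length of the full document.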

FAQ

How big is a typical context window?

Current models hold anywhere from tens of thousands of tokens to a million or more, roughly several books' worth of text at the top end. That sounds unlimited; it isn't. Real workloads like a contract set, a wiki export, or a folder of papers still overflow it.

Is a bigger context window always better?

No. Past a point, more context means more noise: loosely relevant text competes with the passage that actually answers the question, and models attend unevenly across very long windows. A focused set of retrieved passages usually beats context stuffing.

What happens when a document doesn't fit?

Something gets left out. Naive tools truncate silently and answer from the part they kept, as if it were the whole document. Retrieval-based tools split the document into chunks and pull in only the passages relevant to each question, so document size stops being the limit.
