Top-k retrieval is the step in a retrieval pipeline where the system selects the k highest-scoring chunks to pass to the model — typically after semantic search has ranked them by relevance — and discards the rest.
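As a minimal sketch of the selection step, assuming each chunk already carries a relevance score where higher means more relevant: the `ScoredChunk` type, the `top_k` helper, and the sample scores below are hypothetical names and values chosen for illustration, not part of any particular retrieval library.

```python
import heapq
from typing import NamedTuple


class ScoredChunk(NamedTuple):
    text: str
    score: float  # assumed convention: higher score = more relevant


def top_k(chunks: list[ScoredChunk], k: int) -> list[ScoredChunk]:
    """Return the k highest-scoring chunks, best first; the rest are discarded."""
    if k <= 0:
        return []
    # heapq.nlargest returns at most k items, sorted from largest to smallest.
    return heapq.nlargest(k, chunks, key=lambda chunk: chunk.score)


# Made-up candidates, as if semantic search had already scored them.
candidates = [
    ScoredChunk("Refunds are accepted within 30 days.", 0.91),
    ScoredChunk("The company was founded in 1998.", 0.12),
    ScoredChunk("Refunds require the original receipt.", 0.84),
    ScoredChunk("Offices are located in three cities.", 0.08),
]

selected = top_k(candidates, k=2)
for chunk in selected:
    print(f"{chunk.score:.2f}  {chunk.text}")
```

Using a heap-based selection rather than a full sort is one reasonable design choice when the candidate list is much longer than k; for short lists, sorting and slicing would work just as well.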
Why it matters
A language model has a finite context window: there's a ceiling on how much text, measured in tokens, can be placed in front of it at once, and crowding that window with loosely related passages hurts answer quality. When a model has to sift through hundreds of chunks to find the one that answers the question, it becomes harder to constrain the answer to real source text and easier for irrelevant material to dilute or distort the response.
Top-k retrieval is the fix. By selecting only the k passages most likely to answer the query — k is commonly 3 to 20, depending on passage length and window size — the pipeline passes a tight, focused set of evidence to the model. That makes the answer more likely to be grounded in the passages that matter, and makes every citation easier to verify: you're looking at a small, purposeful set of sources rather than a sprawling dump.
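The dependence of k on passage length and window size can be made concrete with some rough arithmetic. The sketch below assumes sizes are counted in tokens and that passages are roughly uniform in length; the function name `max_k_for_budget`, its parameters, and the cap of 20 are illustrative assumptions rather than a standard formula.

```python
def max_k_for_budget(
    context_window: int,
    reserved_tokens: int,
    tokens_per_passage: int,
    k_cap: int = 20,
) -> int:
    """Estimate the largest k whose passages fit in the context window.

    reserved_tokens covers everything that is not retrieved evidence
    (instructions, the user's question, room for the answer).
    """
    available = context_window - reserved_tokens
    if available <= 0 or tokens_per_passage <= 0:
        return 0
    # Whole passages only, and never more than the chosen cap.
    return min(k_cap, available // tokens_per_passage)


# Hypothetical numbers: an 8192-token window, 2048 tokens reserved,
# passages of about 512 tokens each.
print(max_k_for_budget(8192, 2048, 512))
```

This only gives an upper bound on k. Whether the window should actually be filled to that bound is a separate question of answer quality.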
The value of k is a trade-off. Too small and the system risks omitting a passage that contains the answer; too large and the context fills up with noise. In retrieval-augmented generation, getting k right is one of the practical levers that separates answers that consistently cite the right sentence from answers that are merely plausible.
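One way to see the "too small" side of the trade-off is to measure, over a set of test questions, how often the passage containing the answer lands in the top k. The sketch below assumes a small hand-labeled evaluation set with exactly one answer-bearing passage per query; the function `hit_rate_at_k`, the passage IDs, and the rankings are invented for illustration.

```python
def hit_rate_at_k(
    rankings: list[list[str]],
    answer_ids: list[str],
    k: int,
) -> float:
    """Fraction of queries whose answer-bearing passage appears in the top k."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for ranked_ids, answer_id in zip(rankings, answer_ids)
        if answer_id in ranked_ids[:k]
    )
    return hits / len(rankings)


# Made-up rankings for three queries, best match first.
rankings = [
    ["a", "b", "c", "d"],
    ["x", "y", "z", "w"],
    ["p", "q", "r", "s"],
]
# The passage that actually contains the answer for each query.
answer_ids = ["a", "z", "s"]

for k in (1, 3, 4):
    print(k, round(hit_rate_at_k(rankings, answer_ids, k), 2))
```

A metric like this captures only the cost of a small k. It says nothing about the noise that a large k adds to the context, which has to be judged from the quality of the generated answers.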