Knowledge cutoff is the date at which a model's training data ends. Everything the model knows, it learned from text that existed on or before that date. Events, papers, laws, products, and facts that emerged afterward are absent from the model's weights. The model also has no reliable way to know what it does not know, so it may answer questions about post-cutoff events with confident-sounding guesses drawn from whatever it last saw.
Why it matters
The cutoff creates a silent staleness problem. A model trained through a given date will not reliably tell you "I don't know about that yet"; it will often answer anyway, drawing on older patterns that may no longer apply. The gap between training cutoff and actual deployment can be six months to a year, and the model then stays in use long after release, so by the time users interact with it the cutoff may already be well in the past.
For documents, this is decisive. A research paper published last month, a contract updated last week, regulatory guidance issued yesterday: none of these can be in the training data of a model whose cutoff predates them. The model cannot have learned them, no matter how capable it is.
This is the core reason source-grounding exists. Rather than asking the model to recall something it may never have seen, or to confabulate something plausible in its place, you supply the actual text at query time. Retrieval-augmented generation places the relevant passages directly in front of the model, so it is reading the real document, not guessing from memory. The knowledge cutoff stops mattering for the content you care about: as long as the document you supply is current, the model is reading current text. See how this works in practice on the citations feature page.
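The idea of supplying the text at query time can be sketched in a few lines. This is a minimal illustration, not a description of any particular product's pipeline: the function names (`tokenize`, `retrieve`, `build_grounded_prompt`), the word-overlap ranking, the prompt wording, and the sample passages are all assumptions made up for this example. A real system would likely use a proper search index or embeddings, and would send the resulting prompt to a model.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Return up to k passages ranked by word overlap with the question.

    Word overlap is a simple stand-in for a real retriever (an assumption
    for this sketch); passages with no overlap are dropped.
    """
    q = tokenize(question)
    scored = [(len(q & tokenize(p)), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for overlap, p in ranked[:k] if overlap > 0]


def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Place the retrieved passages in front of the question."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}"
    )


# Hypothetical document contents, invented for illustration only.
passages = [
    "The revised contract sets the notice period at 45 days.",
    "Payment is due within 30 days of invoice.",
    "Office kitchen cleaning happens on Fridays.",
]
question = "What is the notice period in the revised contract?"

top = retrieve(question, passages, k=2)
prompt = build_grounded_prompt(question, top)
print(prompt)
```

The point of the design is that the answer ("45 days" in this made-up contract) travels inside the prompt, so it does not matter whether the contract was revised before or after the model's cutoff. Numbering the sources also gives the model something concrete to cite.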