Extractive summarization — Definition

Extractive summarization builds a summary by selecting verbatim sentences from the source document and stitching them together; nothing is rewritten. What you get is a subset of the original sentences, chosen for their importance and assembled into a shorter whole.
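One way to picture the selection step is a simple word-frequency scorer. The sketch below is illustrative only: the function names, the tiny stopword list, the naive sentence splitter, and the scoring rule are all assumptions made for this example, not a description of how any particular product ranks sentences.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it",
             "that", "for", "on", "with", "as", "are", "was", "by"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence: str) -> list[str]:
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, verbatim, in document order."""
    sentences = split_sentences(text)
    tokenized = [content_words(s) for s in sentences]

    # Document-wide word frequencies act as a crude importance signal.
    freq = Counter(w for words in tokenized for w in words)

    def score(words: list[str]) -> float:
        # Average frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original document order
    return [sentences[i] for i in chosen]


doc = ("Solar panels convert sunlight into electricity. "
       "The weather was nice yesterday. "
       "Solar electricity reduces bills. "
       "Panels need sunlight to work.")
print(extractive_summary(doc, k=2))
```

Every sentence in the output is copied unchanged from the input; the only decisions the scorer makes are which sentences to keep and how many.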

Why it matters

Because every sentence in an extractive summary was lifted word-for-word from the source, it's trivially auditable. You can match each output sentence back to its origin and know the model didn't invent or distort anything. That faithfulness makes it a natural companion to source-grounding and citation: the summary already is the source, just filtered.
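The audit described above can be sketched as a plain substring check. This is a minimal illustration under stated assumptions: `audit_summary` and `is_faithful` are hypothetical helper names, and the check assumes whitespace in the summary matches the source exactly.

```python
def audit_summary(source: str, summary_sentences: list[str]) -> list[tuple[str, int]]:
    """Pair each summary sentence with its character offset in the source.

    An offset of -1 means the sentence does not appear verbatim.
    """
    return [(s, source.find(s)) for s in summary_sentences]


def is_faithful(source: str, summary_sentences: list[str]) -> bool:
    """True only if every summary sentence was lifted verbatim from the source."""
    return all(offset >= 0 for _, offset in audit_summary(source, summary_sentences))


source = "Cats sleep a lot. Dogs bark. Birds sing at dawn."
print(is_faithful(source, ["Dogs bark.", "Birds sing at dawn."]))  # verbatim sentences
print(is_faithful(source, ["Dogs bark loudly."]))                  # altered sentence
```

A sentence that fails this check was either reworded or never in the document, which is exactly the kind of drift an extractive summary is meant to rule out.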

The trade-off is readability. Original sentences were written to sit inside paragraphs, so yanking them out and stitching them together often produces something that feels clunky or disjointed — pronouns without antecedents, abrupt topic shifts, repeated framing phrases. You get accuracy at the cost of prose quality.

Extractive summarization also can't synthesize or compress ideas. If the document's key insight only emerges from combining three scattered sentences, an extractive method may miss it entirely, or it may over-select repetitive detail from near the top of the document.

How it compares

Abstractive summarization takes the opposite approach — it rewrites the source in fresh language, reading more naturally but introducing the risk that the model departs from the original meaning. In practice, document AI often blends both: extraction identifies the most relevant passages, then a generative step tightens the language while citations keep the output traceable to real source text.
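The blended approach can be sketched as two stages joined by citation markers. Everything here is a hypothetical illustration: `hybrid_summary`, `cited_sources`, and the bracketed `[n]` marker format are assumptions, and `toy_rewrite` is a trivial stand-in for a generative model, not a real one.

```python
import re
from typing import Callable


def hybrid_summary(sentences: list[str], selected: list[int],
                   rewrite: Callable[[str], str]) -> str:
    """Rewrite each selected source sentence and tag it with its source index."""
    parts = [f"{rewrite(sentences[i])} [{i}]" for i in selected]
    return " ".join(parts)


def cited_sources(summary: str, sentences: list[str]) -> list[str]:
    """Resolve every [n] marker in the summary back to the original sentence."""
    return [sentences[int(n)] for n in re.findall(r"\[(\d+)\]", summary)]


def toy_rewrite(sentence: str) -> str:
    """Stand-in for a generative step: drop a leading connective, recapitalize."""
    trimmed = re.sub(r"^(However|Moreover|Therefore),\s+", "", sentence)
    return trimmed[:1].upper() + trimmed[1:]


sentences = ["Prices rose in March.",
             "However, costs fell sharply.",
             "Nobody noticed."]
summary = hybrid_summary(sentences, [0, 1], toy_rewrite)
print(summary)
print(cited_sources(summary, sentences))
```

The rewritten text may differ from the source, but each marker still points at the exact sentence it came from, so a reader can compare the two.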

Sidenote's summaries are built on retrieved passages from your document, so every claim can be matched back to a source sentence.
