Temperature is the dial that controls how much randomness a language model introduces when picking its next token. At temperature zero, the model always picks the single most probable token, so output is effectively deterministic and maximally conservative. As temperature rises toward one and beyond, the model draws from a wider range of less probable tokens, producing text that is more varied, surprising, and creative, but also less anchored to the evidence in front of it.
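To make the mechanism concrete, here is a minimal sketch of how temperature is commonly applied: the model's raw scores (logits) are divided by the temperature before being turned into probabilities with a softmax. The function name `softmax_with_temperature` and the three example scores are illustrative assumptions, not taken from any particular model or library.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores (logits) into probabilities, scaled by temperature."""
    if temperature <= 0:
        # Temperature zero is treated as greedy decoding: all probability
        # mass goes to the single highest-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.0, 0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With these made-up scores, the top token holds all of the probability at temperature zero, about 0.99 at 0.2, and about 0.66 at 1.0. Raising the temperature further flattens the distribution, which is what lets less probable tokens through.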
Why it matters
The name is borrowed from physics: in thermodynamics, higher temperature means more energy and more disorder. The analogy holds. A high-temperature model is more likely to drift from the source material, introduce detail that wasn't there, or follow an interesting tangent rather than the most faithful reading of the text.
For most document AI tasks (summarisation, Q&A, extraction), that drift is a problem, not a feature. When the goal is a citable answer that faithfully reflects what a document actually says, you want the model focused and repeatable, not creative. That means running inference at a low temperature so the output tracks the source rather than embellishing it.
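The link between low temperature and repeatability can be illustrated with a small simulation. The sketch below samples one token many times from temperature-scaled probabilities and reports how often the runs agree on the most common choice. The helper names (`sample_token`, `agreement_rate`) and the example scores are hypothetical, and a real model picks from tens of thousands of tokens at every step, so treat this as a toy under those assumptions.

```python
import math
import random
from collections import Counter

def sample_token(logits, temperature, rng):
    """Sample one token index from temperature-scaled softmax weights."""
    if temperature <= 0:
        # Greedy decoding: always return the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def agreement_rate(logits, temperature, runs=1000, seed=0):
    """Fraction of runs that return the most frequently chosen token."""
    rng = random.Random(seed)  # fixed seed so the simulation is reproducible
    counts = Counter(sample_token(logits, temperature, rng) for _ in range(runs))
    return counts.most_common(1)[0][1] / runs

# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.1]

for t in (0.0, 0.2, 1.0, 2.0):
    print(t, agreement_rate(logits, t))
```

At temperature zero every run returns the same token, so the agreement rate is exactly 1.0. As the temperature rises, the rate falls, which is the repeatability cost described above.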
This is one of the configuration choices Sidenote makes on your behalf. Answers are generated at a low temperature, which is why they read as conservative rather than inventive, and it helps keep each claim close enough to the source to be traced back to the exact passage it came from. Creativity is not the point; accuracy is.