LLM Temperature Explained

Temperature is a hyperparameter that controls the randomness of a large language model's output during text generation. Adjusting it lets you tune the model's behavior anywhere from highly deterministic and focused to creative and unpredictable.

What Is Temperature?

Temperature is a scalar value, typically between 0.0 and 2.0 (the exact limits vary by provider), applied during the token-sampling step of LLM inference. It reshapes the probability distribution over the model's vocabulary before the next token is selected. A lower value makes the distribution sharper (peakier), while a higher value flattens it, spreading probability mass across more tokens. A value of 1.0 leaves the model's own distribution unchanged.

How It Works Under the Hood

After the model computes raw logits (unnormalized scores) for every possible next token, those logits are divided by the temperature value before being passed through a softmax function. Dividing by a number less than 1 amplifies differences between scores, making the highest-probability token even more dominant. Dividing by a number greater than 1 compresses differences, giving lower-probability tokens a more competitive chance of being selected.
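The scaling step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the function name `softmax_with_temperature` and the three example logits are assumptions made up for the demonstration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # Division by zero is undefined; temperature 0 needs separate handling.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate tokens.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running it should show the first token's probability shrinking as the temperature rises, while the other two tokens gain share.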

Temperature = 0: Greedy Decoding

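Setting temperature to 0 causes the model to always select the single highest-probability token at each step, a strategy called greedy decoding. Because dividing by zero is undefined, implementations typically handle 0 as a special case rather than actually performing the division; values very close to 0 behave almost identically. The output is consistent and focused, which suits factual Q&A, code generation, and structured data extraction. The trade-off is reduced variety: the model gives essentially the same answer each time for the same input. In practice, hosted models may still show small run-to-run differences at temperature 0 because of floating-point and hardware effects.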
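One way a sampler might treat temperature 0 as a special case is sketched below. The function `sample_token` and its `rng` parameter are hypothetical names for this illustration; real inference libraries structure this differently.

```python
import math
import random

def sample_token(logits, temperature, rng=random):
    """Return the index of the next token; temperature 0 falls back to argmax."""
    if temperature == 0:
        # Greedy decoding: skip the division and the random draw entirely.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    # random.choices accepts unnormalized weights.
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

# Hypothetical logits; index 1 has the highest score.
logits = [0.1, 2.0, 1.0]
print([sample_token(logits, 0) for _ in range(5)])    # same index every time
print([sample_token(logits, 1.5) for _ in range(5)])  # varies from run to run
```

With temperature 0 the function never touches the random number generator, which is why repeated calls return the same index.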

High Temperature: Creativity and Risk

At 1.0, tokens are sampled from the model's unmodified distribution. Values above 1.0 add randomness beyond that, encouraging the model to explore unlikely tokens and produce more surprising, diverse outputs. This is useful for creative writing, brainstorming, or generating varied training data. However, very high temperatures (above ~1.5) can cause the model to produce incoherent or grammatically broken text, and factual errors become more likely. This degradation is distinct from hallucination in the usual sense, which refers to fluent but fabricated content and can occur at any temperature.
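A rough way to quantify how "spread out" the distribution becomes is its Shannon entropy. The sketch below uses a hypothetical helper, `entropy_at_temperature`, and made-up logits to show entropy rising with temperature; it says nothing about where output quality actually breaks down for a given model.

```python
import math

def entropy_at_temperature(logits, temperature):
    """Shannon entropy (in bits) of the softmax distribution at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical logits for five candidate tokens.
logits = [5.0, 2.0, 1.0, 0.5, 0.0]
for t in (0.5, 1.0, 1.5, 2.0):
    print(t, round(entropy_at_temperature(logits, t), 3))
```

The upper bound for five tokens is log2(5) bits, reached only when every token is equally likely, so higher temperatures push the distribution toward that uniform limit.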

Choosing the Right Temperature

As a rule of thumb, a temperature between 0.2 and 0.5 suits tasks requiring accuracy and consistency, such as coding assistants or customer-support bots. Values between 0.7 and 1.0 work well for conversational agents that need to feel natural but still coherent. For purely creative tasks like poetry or story generation, experimenting in the 1.0–1.4 range is reasonable, as long as you check the results for quality. These ranges are starting points rather than fixed rules, since the best value depends on the model and the task.

Key Gotcha: Temperature Interacts With Other Sampling Parameters

Temperature does not work in isolation. It interacts with top-p (nucleus sampling) and top-k sampling, which restrict which tokens are eligible for the final random draw. The order in which these steps are applied varies by implementation. A common mistake is setting a high temperature while also using a very low top-p, which can cancel out most of the diversity gain because only the top one or two tokens remain eligible. Tune these parameters together and test outputs empirically rather than adjusting temperature alone.
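The canceling effect can be seen in a small sketch. It assumes temperature is applied first and top-p second, which is one common ordering but not a universal one; the function `nucleus_after_temperature` and the logits are hypothetical.

```python
import math

def nucleus_after_temperature(logits, temperature, top_p):
    """Return the token indices that survive top-p filtering after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank tokens from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        # Stop once the kept tokens cover at least top_p of the probability mass.
        if cumulative >= top_p:
            break
    return kept

# Hypothetical logits for five candidate tokens.
logits = [5.0, 2.0, 1.0, 0.5, 0.0]
print(nucleus_after_temperature(logits, 2.0, 0.1))   # low top-p: one token survives
print(nucleus_after_temperature(logits, 2.0, 0.95))  # high top-p: several survive
```

With `top_p=0.1`, the single most likely token already covers the threshold, so the later random draw has only one candidate regardless of how high the temperature is set.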
