Explanation
Temperature and top-p (nucleus sampling) are both hyperparameters that control the randomness and creativity of language model outputs in Amazon Bedrock:
Temperature:
- Controls the randomness of predictions by scaling the logits before applying softmax
- Higher temperature (e.g., 0.8-1.0) increases randomness and creativity
- Lower temperature (e.g., 0.1-0.3) makes outputs more deterministic and focused
Top-p (Nucleus Sampling):
- Filters the vocabulary to the smallest set of tokens whose cumulative probability exceeds p
- Only considers tokens from this "nucleus" for sampling
- Helps maintain coherence while allowing diversity
How They Work Together:
- Complementary Controls: Both parameters influence output diversity and creativity
- Sequential Application: Temperature is applied first to scale logits, then top-p filters the probability distribution
- Tuning Strategy: They can be tuned together to balance creativity and coherence
- Optimal Output Diversity: The combination allows fine-grained control over how "creative" vs. "focused" the model should be
Why Other Options Are Incorrect:
- B: Top-p does not affect latency only; it's a sampling parameter that affects output quality
- C: Temperature doesn't control grammar, and top-p doesn't control sentence length
- D: Neither parameter reduces vocabulary size; they control sampling from the existing vocabulary
Best Practice:
For most use cases, start with moderate values (temperature ~0.7, top-p ~0.9) and adjust based on the desired balance between creativity and coherence for your specific application.