
Answer-first summary for fast verification
Answer: Top K
The correct answer is **B: Top K**. This parameter controls how many candidate next tokens the model evaluates at each step of text generation in large language models (LLMs).

**Why Top K is correct:**

- **Functionality:** Top K limits the sampling pool to the K most probable next tokens at each generation step. With Top K = 50, for example, the model samples from only the 50 highest-probability tokens rather than the entire vocabulary.
- **Purpose:** It balances output quality and diversity by excluding low-probability, nonsensical tokens while still allowing some variation in generation.
- **Direct relevance:** The question asks about controlling the "number of possible next tokens considered," which is the exact definition of Top K's role in the decoding process.

**Why the other options are incorrect:**

- **A: Maximum tokens:** Caps the total length of the generated output (the maximum number of tokens in the complete response); it does not affect how candidate tokens are evaluated at each step.
- **C: Temperature:** Adjusts the randomness of token selection by scaling the probability distribution (higher temperature = more random, lower = more deterministic), but it does not limit how many tokens are considered.
- **D: Batch size:** A training/inference throughput parameter that determines how many samples are processed simultaneously; it affects computational efficiency, not token selection during generation.

**Best practice context:** In AWS AI services such as Amazon Bedrock and SageMaker JumpStart, Top K is a standard decoding parameter available when configuring text generation with foundation models. It is particularly useful when you want to constrain generation to likelier tokens while avoiding the complete determinism of greedy decoding (which always picks the single most likely token).
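The mechanics above can be sketched in a few lines. This is a minimal, illustrative implementation (the function name `top_k_sample` and the toy logits are our own, not from any particular library): scale the logits by temperature, keep only the K highest-scoring token ids, and sample from a softmax over just those survivors.

```python
import math
import random

def top_k_sample(logits, k=50, temperature=1.0, rng=random):
    """Sample one token id from the k highest-scoring tokens.

    `logits` is a list of raw scores over the vocabulary, one per token id.
    Illustrative sketch only, not a production decoder.
    """
    # Temperature scaling happens before the cutoff.
    scaled = [x / temperature for x in logits]
    # Keep only the k most probable token ids; discard the rest.
    top_ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Numerically stable softmax over the surviving k tokens.
    m = max(scaled[i] for i in top_ids)
    weights = [math.exp(scaled[i] - m) for i in top_ids]
    return rng.choices(top_ids, weights=weights, k=1)[0]

# Toy 5-token vocabulary: with k=2, only ids 0 and 1 can ever be chosen,
# no matter how the random draw falls.
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
token = top_k_sample(logits, k=2)
```

Note how k=1 collapses to greedy decoding (always the single most likely token), while larger k widens the pool and increases diversity, matching the trade-off described above.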
Author: LeetQuiz Editorial Team