
**Answer-first summary for fast verification**
Answer: B. The model truncates or ignores excess tokens.
## Explanation

When a model has a fixed context window of 8,000 tokens and receives 10,000 tokens of input, it cannot process all of them at once: the context window is an architectural constraint. Here's what happens:

1. **Context Window Limitation**: The context window is the maximum number of tokens the model can process at once. This is a fixed architectural constraint determined during model training.
2. **Truncation Process**: The input is cut down to the 8,000-token limit. Depending on the implementation, this typically means:
   - keeping the most recent 8,000 tokens (a sliding-window approach),
   - keeping the first 8,000 tokens and discarding the rest, or
   - applying some other truncation strategy defined by the model implementation.
3. **Why Not the Other Options**:
   - **A (Automatic expansion)**: A model cannot expand its own context window; changing it is a fundamental architectural change that requires retraining or specialized techniques.
   - **C (Compression into embeddings)**: Embeddings represent tokens, but they do not reduce the token count; each token still occupies one position in the window.
   - **D (Permanent failure)**: The model does not fail permanently; it simply processes what fits within its constraints.
4. **Real-world Implications**: Truncation can silently drop important context, especially in long documents or conversations. Techniques such as chunking or hierarchical processing can help mitigate this limitation.

**Correct Answer: B** - The model truncates or ignores excess tokens.
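The two truncation strategies described above ("keep the most recent tokens" vs. "keep the first tokens") can be sketched in a few lines of Python. Note that `truncate_to_context` is a hypothetical helper for illustration, not a real library API; real tokenizers and serving stacks expose their own truncation options.

```python
def truncate_to_context(tokens, max_tokens=8000, keep="last"):
    """Truncate a list of token IDs to fit a fixed context window.

    keep="last"  keeps the most recent max_tokens (sliding-window style).
    keep="first" keeps the beginning of the input and drops the rest.
    """
    if len(tokens) <= max_tokens:
        return tokens  # already fits; nothing is dropped
    if keep == "last":
        return tokens[-max_tokens:]
    return tokens[:max_tokens]


# Stand-in for a 10,000-token input (token IDs 0..9999).
tokens = list(range(10_000))

kept = truncate_to_context(tokens)
print(len(kept))   # 8000 tokens survive; the first 2,000 are dropped
print(kept[0])     # 2000 -> the earliest tokens were discarded
```

In practice, which strategy is appropriate depends on the task: chat applications usually keep the most recent tokens so the latest turns survive, while document-classification pipelines often keep the beginning of the text.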
Author: Jin H
**Question:** If a model has a context window of 8,000 tokens, what happens when the user inputs 10,000 tokens?

- **A.** The model automatically expands its context window
- **B.** The model truncates or ignores excess tokens
- **C.** The model compresses input into embeddings to fit
- **D.** The model fails permanently