
Answer-first summary for fast verification
Answer: D (context length 512: smallest model is 0.13GB and embedding dimension 384)
The question specifies that cost and latency are more important than quality, with document chunks limited to 512 tokens. Option D has a context length of 512 (matching the chunk size), the smallest model size (0.13GB), and the smallest embedding dimension (384), making it the most cost-effective and lowest-latency choice. The community discussion strongly supports D (83% consensus, multiple upvoted comments), noting it aligns with official Databricks guidance and efficiently balances resource usage for the given priorities. Options A, B, and C have larger context lengths, model sizes, or embedding dimensions, increasing cost and latency without benefit since chunks are fixed at 512 tokens.
Author: LeetQuiz Editorial Team
A Generative AI Engineer is developing an LLM-based application where the document chunks for the retriever have a maximum size of 512 tokens. Given that cost and latency are higher priorities than quality, which available context length level should they select?
A
context length 514: smallest model is 0.44GB and embedding dimension 768
B
context length 2048: smallest model is 11GB and embedding dimension 2560
C
context length 32768: smallest model is 14GB and embedding dimension 4096
D
context length 512: smallest model is 0.13GB and embedding dimension 384
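To make the cost and latency trade-off concrete, here is a small illustrative sketch comparing the four options using only the figures quoted above. The float32 storage assumption (4 bytes per embedding dimension) is ours, not part of the question; the point is that every option can hold a 512-token chunk, so the extra context, model size, and dimensionality of A, B, and C buy nothing here.

```python
# Figures below are copied from the answer options; the float32
# (4 bytes per dimension) storage assumption is illustrative.
options = {
    "A": {"context": 514,   "model_gb": 0.44, "dim": 768},
    "B": {"context": 2048,  "model_gb": 11.0, "dim": 2560},
    "C": {"context": 32768, "model_gb": 14.0, "dim": 4096},
    "D": {"context": 512,   "model_gb": 0.13, "dim": 384},
}

CHUNK_TOKENS = 512          # maximum chunk size from the question
BYTES_PER_FLOAT32 = 4       # assumed embedding storage format

for name, o in options.items():
    fits = o["context"] >= CHUNK_TOKENS
    vector_kb = o["dim"] * BYTES_PER_FLOAT32 / 1024
    print(f"{name}: fits 512-token chunks={fits}, "
          f"model={o['model_gb']}GB, vector={vector_kb:.1f}KB")
```

Running this shows that all four options fit the 512-token chunks, while D has both the smallest model to serve (lower latency, lower compute cost) and the smallest vectors to store and search (lower index cost), which is exactly the priority the question states.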