
A Generative AI Engineer is developing an LLM-based application in which the document chunks for the retriever have a maximum size of 512 tokens. Given that cost and latency are higher priorities than quality, which of the available context-length options should they select?
A. Context length 514; smallest model 0.44 GB; embedding dimension 768
B. Context length 2048; smallest model 11 GB; embedding dimension 2560
C. Context length 32768; smallest model 14 GB; embedding dimension 4096
D. Context length 512; smallest model 0.13 GB; embedding dimension 384
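Because cost and latency outweigh quality here, the decision reduces to: among the options whose context length covers the 512-token chunks, pick the smallest (cheapest and fastest) model. The sketch below expresses that rule in Python; the EmbeddingOption dataclass, its field names, and the candidate list are illustrative assumptions that simply mirror the options above, under the straightforward reading that a 512-token context exactly fits 512-token chunks.

```python
from dataclasses import dataclass


@dataclass
class EmbeddingOption:
    """One answer option; field names are illustrative assumptions."""
    label: str
    context_length: int   # max tokens the model can embed at once
    model_size_gb: float  # smallest available model at this level
    embedding_dim: int


# Candidates mirror options A-D from the question above.
CANDIDATES = [
    EmbeddingOption("A", 514, 0.44, 768),
    EmbeddingOption("B", 2048, 11.0, 2560),
    EmbeddingOption("C", 32768, 14.0, 4096),
    EmbeddingOption("D", 512, 0.13, 384),
]

MAX_CHUNK_TOKENS = 512  # retriever chunk size given in the question

# Keep only options that can embed a full chunk, then prefer the
# smallest model: smaller models generally mean lower serving cost
# and lower latency, which the question prioritizes over quality.
viable = [c for c in CANDIDATES if c.context_length >= MAX_CHUNK_TOKENS]
best = min(viable, key=lambda c: c.model_size_gb)
print(f"Option {best.label}: {best.model_size_gb} GB, dim {best.embedding_dim}")
```

Run as written, the sketch selects option D: its 512-token context is just large enough for the chunks, and at 0.13 GB it is by far the smallest model, making it the best fit when cost and latency matter more than embedding quality.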