
Answer-first summary for fast verification
Answer: Multi-modal embedding model
## Detailed Explanation For a search application that must handle queries containing both text and images, the appropriate foundation model is a **multi-modal embedding model**. Here's the reasoning: ### Why Multi-modal Embedding Model is Correct 1. **Purpose Alignment**: Embedding models are specifically designed for representation learning and similarity search applications. They convert input data (text, images, audio, etc.) into numerical vectors (embeddings) that capture semantic meaning in a shared vector space. 2. **Multi-modal Capability**: Multi-modal embedding models can process and understand multiple types of data simultaneously. They create embeddings for both text and images in the same vector space, enabling cross-modal search functionality. 3. **Search Application Suitability**: For a search application, the primary requirement is to retrieve relevant content based on queries. Embedding models excel at this by: - Converting queries (text and images) into embeddings - Comparing these embeddings against indexed content embeddings - Returning the most similar results based on vector similarity metrics 4. **AWS Context**: In AWS AI/ML services, models like Amazon Titan Multi-modal Embeddings are specifically designed for such use cases, allowing developers to build applications that can search across different data types. ### Analysis of Other Options **B. Text Embedding Model**: - Only processes text data - Cannot handle image queries - Would fail to process half of the required query types **C. Multi-modal Generation Model**: - Designed for content creation (generating text, images, etc.) - Not optimized for search and retrieval tasks - Generation models focus on producing new content rather than finding existing content **D. Image Generation Model**: - Only generates images - Cannot process text queries - Completely unsuitable for a search application ### Key Distinction: Embedding vs. Generation Models The critical distinction is between **embedding models** (for search/retrieval) and **generation models** (for content creation). For any search application, regardless of modality, embedding models are the appropriate choice because they transform data into a format suitable for similarity comparison and retrieval. ### Best Practice Considerations 1. **Vector Search Architecture**: Multi-modal embedding models enable vector search architectures where both text and image queries can be compared against a unified index. 2. **Scalability**: Embedding-based search systems are highly scalable and efficient for production applications. 3. **Accuracy**: Multi-modal embeddings capture semantic relationships across different data types, improving search relevance for mixed queries. Therefore, a multi-modal embedding model is the optimal choice for powering a search application that must handle both text and image queries.
Ultimate access to all questions.
No comments yet.
Author: LeetQuiz Editorial Team
An AI practitioner is designing a search application that must process queries containing both text and images. Which type of foundation model (FM) is appropriate for powering this application?
A
Multi-modal embedding model
B
Text embedding model
C
Multi-modal generation model
D
Image generation model