AWS Certified AI Practitioner

Explanation:

Detailed Explanation

For a search application that must handle queries containing both text and images, the appropriate foundation model is a multi-modal embedding model. Here's the reasoning:

Why Multi-modal Embedding Model is Correct

Purpose Alignment: Embedding models are specifically designed for representation learning and similarity search applications. They convert input data (text, images, audio, etc.) into numerical vectors (embeddings) that capture semantic meaning in a shared vector space.
Multi-modal Capability: Multi-modal embedding models can process and understand multiple types of data simultaneously. They create embeddings for both text and images in the same vector space, enabling cross-modal search functionality.
Search Application Suitability: For a search application, the primary requirement is to retrieve relevant content based on queries. Embedding models excel at this by:
- Converting queries (text and images) into embeddings
- Comparing these embeddings against indexed content embeddings
- Returning the most similar results based on vector similarity metrics
AWS Context: On AWS, Amazon Titan Multimodal Embeddings, available through Amazon Bedrock, is designed for this use case. It embeds text and images into a shared vector space so that developers can build applications that search across both data types.
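
The three retrieval steps listed above can be sketched in a few lines. This is a minimal illustration, not a production design: the three-dimensional vectors are hand-written stand-ins for real embeddings, and the item names, `INDEX`, and `search` are hypothetical. In a real system, both the query vector and the indexed vectors would come from the same multi-modal embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, k=2):
    """Return the k (item_id, score) pairs most similar to query_vec."""
    scored = [
        (item_id, cosine_similarity(query_vec, vec))
        for item_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for embeddings of images and text
# that live in one shared vector space (values are made up).
INDEX = {
    "photo_red_sneaker.jpg": [0.9, 0.1, 0.0],
    "text_running_shoes_review": [0.8, 0.2, 0.1],
    "photo_mountain_lake.jpg": [0.0, 0.2, 0.9],
}

# Pretend this is the embedding of a text or image query about shoes.
query = [1.0, 0.0, 0.0]
results = search(query, INDEX, k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

Because image and text items share one space, a single query vector ranks both kinds of content; here the two shoe-related items score higher than the unrelated landscape photo.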

Analysis of Other Options

B. Text Embedding Model:

Only processes text data
Cannot handle image queries
Would fail on any query that includes an image

C. Multi-modal Generation Model:

Designed for content creation (generating text, images, etc.)
Not optimized for search and retrieval tasks
Generation models focus on producing new content rather than finding existing content

D. Image Generation Model:

Only generates images
Produces new images from a prompt; it does not return embeddings that can be matched against indexed content
Completely unsuitable for a search application

Key Distinction: Embedding vs. Generation Models

The critical distinction is between embedding models (for search/retrieval) and generation models (for content creation). For similarity-based search, regardless of modality, embedding models are the appropriate choice because they transform data into vectors that can be compared and ranked directly.

Best Practice Considerations

Vector Search Architecture: Multi-modal embedding models enable vector search architectures where both text and image queries can be compared against a unified index.
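Scalability: Embedding-based search can scale to large indexes in production, typically by pairing the embeddings with an approximate nearest-neighbor (ANN) index so that a query does not have to be compared against every stored vector.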
Accuracy: Multi-modal embeddings capture semantic relationships across different data types, improving search relevance for mixed queries.
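
To populate a unified index on AWS, each item and each query has to be sent to the embedding model. The sketch below shows how such a call might look through the Amazon Bedrock runtime API. The model ID and the request fields (`inputText`, `inputImage`, `embeddingConfig`) are written from memory of the Bedrock documentation and should be checked against the current docs before use. `build_request_body`, `embed`, and `FakeClient` are hypothetical helper names; `FakeClient` only exists so the sketch runs without AWS credentials.

```python
import base64
import io
import json

# Assumed model ID for Titan Multimodal Embeddings; verify in the Bedrock docs.
MODEL_ID = "amazon.titan-embed-image-v1"


def build_request_body(text=None, image_bytes=None, dim=1024):
    """Assemble the JSON body for one item: text, image, or both."""
    if text is None and image_bytes is None:
        raise ValueError("provide text, image_bytes, or both")
    body = {"embeddingConfig": {"outputEmbeddingLength": dim}}
    if text is not None:
        body["inputText"] = text
    if image_bytes is not None:
        # Images are sent as base64-encoded strings.
        body["inputImage"] = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(body)


def embed(client, text=None, image_bytes=None, dim=1024):
    """Call the model through a Bedrock runtime client and return the vector."""
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_request_body(text, image_bytes, dim),
        accept="application/json",
        contentType="application/json",
    )
    return json.loads(response["body"].read())["embedding"]


class FakeClient:
    """Offline stand-in that mimics the shape of an invoke_model response."""

    def invoke_model(self, **kwargs):
        self.last_call = kwargs
        payload = json.dumps({"embedding": [0.1, 0.2, 0.3]}).encode("utf-8")
        return {"body": io.BytesIO(payload)}


fake = FakeClient()
vector = embed(fake, text="red sneaker", image_bytes=b"fake-image-bytes")
print(vector)
```

With real credentials, the fake client would be replaced by `boto3.client("bedrock-runtime")`. The same `embed` call serves both indexing and querying, which is what keeps text and image vectors comparable in one index.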

Therefore, a multi-modal embedding model is the optimal choice for powering a search application that must handle both text and image queries.

An AI practitioner is designing a search application that must process queries containing both text and images. Which type of foundation model (FM) is appropriate for powering this application?

Last updated: February 8, 2026 at 20:17

A. Multi-modal embedding model: 85.7%

B. Text embedding model: 0.0%

C. Multi-modal generation model: 14.3%