Explanation
Why A is correct:
- A multi-modal embedding model is specifically designed to handle multiple types of data (text, images, audio, etc.) and convert them into vector embeddings in a shared semantic space.
- For a search application that needs to handle both text and image queries, a multi-modal embedding model can create embeddings for both modalities that are comparable, enabling cross-modal search capabilities.
- This allows users to search with text and find relevant images, or search with images and find relevant text content.
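The cross-modal search described above can be sketched in a few lines. This is a minimal illustration, not a real model: the vectors below are invented stand-ins for what a multi-modal embedding model (e.g. a CLIP-style encoder) would produce, and the index and query names are hypothetical. The point is that once text and image embeddings live in one shared space, a single cosine-similarity ranking works for queries from either modality.

```python
import math

# Hypothetical image index: filename -> embedding vector.
# In a real system these vectors would come from a multi-modal
# embedding model; here they are invented for illustration only.
IMAGE_INDEX = {
    "beach_photo.jpg":  [0.9, 0.1, 0.0, 0.1],
    "city_photo.jpg":   [0.1, 0.9, 0.1, 0.0],
    "forest_photo.jpg": [0.1, 0.1, 0.9, 0.2],
}

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def search(query_embedding, index):
    """Rank indexed items by similarity to the query embedding.

    Because all embeddings share one semantic space, the query
    embedding may come from text OR an image — the ranking logic
    is identical either way.
    """
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked]

# Pretend this vector came from embedding the text query "sunny beach".
text_query_embedding = [0.8, 0.2, 0.1, 0.0]
print(search(text_query_embedding, IMAGE_INDEX))
# → ['beach_photo.jpg', 'city_photo.jpg', 'forest_photo.jpg']
```

An image query would follow the same path: embed the image with the same model, then call `search` with that vector against a text (or image) index.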
Why other options are incorrect:
- B. Text embedding model: Only handles text data and cannot process or understand image content.
- C. Multi-modal generation model: While it can handle multiple modalities, it is designed to generate content, not to produce the comparable vector embeddings that similarity search and retrieval require.
- D. Image generation model: Only handles image generation and cannot process text queries or create comparable embeddings for search.
Key Concept: Multi-modal embedding models create vector representations of different data types in a shared space, enabling cross-modal similarity search and retrieval.