
Answer-first summary for fast verification
Answer: Data augmentation for imbalanced classes
## Explanation of the Selected Answer

**Correct Answer: A (Data augmentation for imbalanced classes)**

### Why Option A is Optimal

When an AI practitioner discovers that biased input data is causing biased image generation in a model that creates images of humans in various professions, the root cause is often **imbalanced class representation** in the training dataset. For example, if the training data overrepresents certain attributes (such as gender, race, or age) for specific professions, the model learns these biased patterns and replicates them in generated images.

**Data augmentation for imbalanced classes** directly addresses this issue by:

1. **Increasing diversity in underrepresented classes**: By applying transformations (e.g., rotations, flips, color adjustments, cropping) to existing images of underrepresented groups, the practitioner can create additional synthetic training examples. This helps balance the dataset without requiring extensive new data collection.
2. **Mitigating attribute-specific bias**: If certain professions are associated with specific attributes in the biased data, augmentation can generate varied representations, reducing the model's reliance on spurious correlations.
3. **Improving model fairness and generalization**: A more balanced dataset enables the model to learn more equitable representations across all classes, leading to fairer image generation that better reflects real-world diversity.

This approach aligns with AWS AI/ML best practices for addressing bias, which emphasize data preprocessing techniques to ensure training data is representative and balanced before model training.

### Why the Other Options Are Less Suitable

- **B (Model monitoring for class distribution)**: While monitoring is crucial for detecting bias in production, it doesn't solve the underlying problem in the training data. Monitoring identifies issues but doesn't correct biased input data; it is a reactive rather than proactive measure.
- **C (Retrieval Augmented Generation, RAG)**: RAG is primarily used with text-based models to ground responses in externally retrieved knowledge. It is not designed to address bias in image generation models or to correct imbalanced training data.
- **D (Watermark detection for images)**: This technique identifies watermarks in images for copyright or authenticity purposes. It has no relevance to mitigating bias in training data or to improving fairness in image generation models.

### Key Consideration

The question specifically asks for a technique to address bias caused by "specific attributes in the input data." Data augmentation directly targets this by modifying the input data itself to create a more balanced training set, making it the most appropriate and effective solution among the given options.
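The augmentation strategy described for Option A can be sketched in a few lines. The snippet below is a minimal illustration only, assuming images are NumPy arrays grouped by class label; the function name `augment_minority` and its transform set are hypothetical, not part of any AWS service or library.

```python
import numpy as np

def augment_minority(images_by_class, rng=None):
    """Balance a dataset by augmenting underrepresented classes with
    simple geometric transforms (horizontal flip, 90-degree rotations).

    `images_by_class` maps a class label to a list of HxWxC arrays.
    Hypothetical sketch: real pipelines would also use color jitter,
    random crops, etc.
    """
    rng = rng or np.random.default_rng(0)
    # Target every class at the size of the largest class.
    target = max(len(imgs) for imgs in images_by_class.values())
    transforms = [
        lambda im: im[:, ::-1],      # horizontal flip
        lambda im: np.rot90(im, 1),  # rotate 90 degrees
        lambda im: np.rot90(im, 3),  # rotate 270 degrees
    ]
    balanced = {}
    for label, imgs in images_by_class.items():
        out = list(imgs)
        while len(out) < target:
            # Pick a random source image and a random transform.
            base = imgs[rng.integers(len(imgs))]
            transform = transforms[rng.integers(len(transforms))]
            out.append(transform(base))
        balanced[label] = out
    return balanced
```

With, say, 10 images for one profession and 3 for another, the sketch pads the smaller class to 10 with transformed copies, so the model sees the two classes at equal frequency during training.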
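For contrast with Option B: class-distribution monitoring only measures skew in outputs or data; it does not change the training set. A minimal sketch of such a check, with the illustrative (hypothetical) function name `attribute_distribution`:

```python
from collections import Counter

def attribute_distribution(labels):
    """Return the share of each attribute value in a sample of labels,
    e.g. to spot one gender dominating images generated for a profession.
    Detection only: fixing the skew still requires changing the data."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {attr: n / total for attr, n in counts.items()}
```

A 75/25 split surfaced by this kind of check would flag the bias, but correcting it leads back to rebalancing the training data, which is what Option A provides.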
Author: LeetQuiz Editorial Team
Which technique should an AI practitioner use to address bias in a model generating images of humans in various professions, when specific attributes in the input data are causing biased image generation?
A. Data augmentation for imbalanced classes
B. Model monitoring for class distribution
C. Retrieval Augmented Generation (RAG)
D. Watermark detection for images