
Answer-first summary for fast verification
Answer: Evaluate the models by using a human workforce and custom prompt datasets.
## Detailed Explanation

To select an Amazon Bedrock model that produces responses aligned with the preferred writing style of the company's employees, the optimal approach is **Option B: Evaluate the models by using a human workforce and custom prompt datasets**.

### Why Option B Is Correct

1. **Custom prompt datasets**: The company needs to evaluate models based on its specific use cases and stylistic preferences. Generic or built-in prompt datasets (Option A) won't capture the unique terminology, tone, and style that employees prefer. Custom datasets allow testing with the actual prompts the company would use in production, ensuring the evaluation is relevant.
2. **Human workforce evaluation**: Writing style preferences are inherently subjective and nuanced. Automated metrics alone cannot assess qualitative aspects such as tone, formality, clarity, or alignment with company culture. Human evaluators (the company's employees or a designated team) can provide direct feedback on which model's outputs best match their preferred style.
3. **Practical validation**: This approach combines objective testing (with custom datasets) and subjective assessment (human evaluation), providing a comprehensive evaluation that addresses both the functional and stylistic requirements.

### Why the Other Options Are Less Suitable

- **Option A (built-in prompt datasets)**: Built-in datasets are generic and not tailored to the company's specific needs. They won't help identify stylistic preferences unique to the organization.
- **Option C (public model leaderboards)**: Public leaderboards typically rank models on general benchmarks (e.g., accuracy, speed) rather than specific stylistic preferences. They don't account for the company's internal use cases or employee preferences.
- **Option D (InvocationLatency metrics in CloudWatch)**: This measures performance characteristics like latency, not output quality or style. While important for operational efficiency, latency doesn't indicate whether responses match the preferred writing style.

### Best Practice Context

In AWS AI/ML best practices, evaluating foundation models for a specific business use case involves:

- Creating custom evaluation datasets that reflect real-world scenarios
- Incorporating human-in-the-loop evaluation for subjective criteria such as style, tone, and appropriateness
- Iteratively testing multiple models with representative prompts

This approach ensures the selected model not only performs well technically but also meets the organization's specific qualitative requirements.
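As a concrete sketch of the first step, the snippet below builds a custom prompt dataset in the JSON Lines format that Amazon Bedrock model evaluation jobs accept (one JSON object per line; the `prompt`, `referenceResponse`, and `category` field names follow Bedrock's documented custom dataset format, but verify them against the current documentation before uploading). All prompt text and the output filename are illustrative placeholders.

```python
import json

# Company-specific prompts that reflect real production use. The
# referenceResponse shows the preferred in-house writing style so
# human evaluators have a ground-truth example to compare against.
# (Field names assume Bedrock's custom prompt dataset format; check
# the current docs before uploading.)
examples = [
    {
        "prompt": "Draft a status update for the Q3 data-migration project.",
        "referenceResponse": (
            "Hi team - quick update: the Q3 migration remains on track. "
            "Cutover is scheduled for the 14th; no action needed yet."
        ),
        "category": "internal-comms",
    },
    {
        "prompt": "Summarize this incident report for leadership.",
        "referenceResponse": (
            "Summary: a 20-minute outage affected the billing API. "
            "Root cause is identified; a permanent fix ships this sprint."
        ),
        "category": "exec-summary",
    },
]

# Bedrock custom prompt datasets are JSON Lines: one object per line.
dataset_jsonl = "\n".join(json.dumps(e) for e in examples)

with open("style-eval-dataset.jsonl", "w") as f:
    f.write(dataset_jsonl)

# Sanity-check: every line must be valid JSON with a prompt field.
for line in dataset_jsonl.splitlines():
    assert "prompt" in json.loads(line)
```

In practice, this file would be uploaded to Amazon S3 and referenced when creating a human-based model evaluation job (via the Bedrock console or the `CreateEvaluationJob` API), where the human workforce rates each candidate model's responses against these company-specific prompts.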
Author: LeetQuiz Editorial Team
What should the company do to select an Amazon Bedrock model that produces responses aligned with the preferred writing style of its employees?
**A.** Evaluate the models by using built-in prompt datasets.

**B.** Evaluate the models by using a human workforce and custom prompt datasets.

**C.** Use public model leaderboards to identify the model.

**D.** Use the model InvocationLatency runtime metrics in Amazon CloudWatch when trying models.