AWS Certified AI Practitioner

Explanation:

Analysis of Amazon SageMaker Inference Options

Based on the scenario described, the company needs to perform inference on large archived datasets (multiple GBs in size) and does not require immediate access to predictions. Let's evaluate each SageMaker inference option against these requirements:

Option A: Batch Transform ✅

Batch Transform is specifically designed for processing large volumes of data stored in Amazon S3. It processes data in batches without requiring a persistent endpoint, making it ideal for:

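Large datasets: Splits input files into mini-batches and distributes objects across one or more instances, so multiple GBs of data can be processed in a single job
Archived data: Reads input directly from Amazon S3
Non-real-time requirements: Processes data offline and writes the results to S3 when the job completes
Cost-effective: No persistent endpoint; instances are provisioned for the job and released when it finishes, so you pay only for the compute used during processing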

This aligns perfectly with the company's requirements for analyzing archived data without immediate prediction needs.
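For illustration, here is a minimal Python sketch of the request a Batch Transform job for this scenario might use. It only assembles the keyword arguments for boto3's SageMaker `create_transform_job` call, so it runs without AWS access. The job name, model name, bucket paths, CSV content type, and instance settings are hypothetical placeholders, and the model is assumed to be registered in SageMaker already.

```python
from typing import Any, Dict


def build_transform_job_request(
    job_name: str,
    model_name: str,
    input_s3_prefix: str,
    output_s3_path: str,
    instance_type: str = "ml.m5.xlarge",
    instance_count: int = 2,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's SageMaker create_transform_job.

    Assumes (hypothetically) that the archive is newline-delimited CSV stored
    under input_s3_prefix and that model_name is an existing SageMaker model.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        # Pack many records into each request sent to the model container.
        "BatchStrategy": "MultiRecord",
        # Upper bound (in MB) on each mini-batch sent to the container.
        "MaxPayloadInMB": 6,
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    # Process every object under the prefix.
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_s3_prefix,
                }
            },
            "ContentType": "text/csv",
            # Split large files on newlines so they can be sent in mini-batches.
            "SplitType": "Line",
        },
        "TransformOutput": {
            "S3OutputPath": output_s3_path,
            # Re-join per-record predictions into line-delimited output files.
            "AssembleWith": "Line",
        },
        "TransformResources": {
            # These instances exist only for the duration of the job.
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


# Hypothetical names and paths, for illustration only.
request = build_transform_job_request(
    job_name="archive-scoring-job",
    model_name="example-model",
    input_s3_prefix="s3://example-bucket/archive/",
    output_s3_path="s3://example-bucket/predictions/",
)
# With AWS credentials configured, the job would be started with:
# import boto3
# boto3.client("sagemaker").create_transform_job(**request)
```

`SplitType` and `BatchStrategy` are what let one job work through multi-GB files: each file is split on newlines and sent to the container in mini-batches no larger than `MaxPayloadInMB`, and no endpoint is involved at any point.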

Option B: Real-time Inference ❌

Real-time Inference requires a persistent endpoint that serves predictions with low latency (typically milliseconds). This is unsuitable because:

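Designed for immediate prediction needs, which this scenario does not have
Each InvokeEndpoint request is limited to a 6 MB payload, so multiple GBs of archived data would have to be split into hundreds or thousands of API calls
Higher operational cost, because the endpoint's instances are billed for as long as the endpoint runs, even when idle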
Not optimized for batch processing of large datasets
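To put the API-call point in numbers, the sketch below computes a lower bound on the number of `InvokeEndpoint` calls needed to push an archive through a real-time endpoint. It assumes the 6 MB per-request payload limit documented by AWS (verify against current quotas), treats 1 GB as 1024 MB, and ignores record boundaries and encoding overhead, so real request counts would be higher. The function name is hypothetical.

```python
import math

# Assumed per-request payload limit for SageMaker real-time endpoints, in MB.
REALTIME_MAX_PAYLOAD_MB = 6


def min_realtime_requests(dataset_gb: float,
                          max_payload_mb: float = REALTIME_MAX_PAYLOAD_MB) -> int:
    """Lower bound on InvokeEndpoint calls needed to send a whole dataset,
    assuming every request is packed exactly up to the payload limit."""
    dataset_mb = dataset_gb * 1024
    return math.ceil(dataset_mb / max_payload_mb)


print(min_realtime_requests(10))  # a 10 GB archive -> 1707
```

Each of those calls would also have to be orchestrated, retried, and stored by client code, which is exactly the work Batch Transform performs for you.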

Option C: Serverless Inference ❌

Serverless Inference provides automatic scaling with no infrastructure management, but:

Primarily designed for real-time or near-real-time predictions
Billed per request for compute time and data processed, with a 4 MB payload limit per request, so it is not suited to pushing multiple GBs of data through the endpoint
Better suited for sporadic or unpredictable workloads requiring immediate responses
Not the most efficient choice for large batch processing of archived data
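The "no infrastructure management" point shows up in how a serverless endpoint is configured: you choose memory and concurrency instead of an instance type. This sketch builds the keyword arguments for boto3's SageMaker `create_endpoint_config` call. The config and model names are hypothetical, and the allowed memory sizes (1024 to 6144 MB in 1 GB steps) and the 1 to 200 concurrency range are assumptions based on the AWS documentation at the time of writing.

```python
from typing import Any, Dict

# Assumed valid values for ServerlessConfig.MemorySizeInMB.
VALID_MEMORY_MB = {1024, 2048, 3072, 4096, 5120, 6144}


def build_serverless_endpoint_config(config_name: str, model_name: str,
                                     memory_mb: int = 2048,
                                     max_concurrency: int = 5) -> Dict[str, Any]:
    """Assemble keyword arguments for create_endpoint_config with a serverless variant."""
    if memory_mb not in VALID_MEMORY_MB:
        raise ValueError(f"memory_mb must be one of {sorted(VALID_MEMORY_MB)}")
    if not 1 <= max_concurrency <= 200:
        raise ValueError("max_concurrency must be between 1 and 200")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                # No InstanceType or InitialInstanceCount: SageMaker
                # provisions and scales compute as requests arrive.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    }


config = build_serverless_endpoint_config("example-serverless-config", "example-model")
# With AWS credentials configured:
# import boto3
# boto3.client("sagemaker").create_endpoint_config(**config)
```

Every prediction still arrives as a separate invocation of at most 4 MB (assumed limit), which is why this configuration does not help with bulk scoring of archived files.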

Option D: Asynchronous Inference ❌

Asynchronous Inference places incoming requests in a queue and processes them asynchronously, writing each result to S3. It:

Supports larger payloads (up to 1 GB) than real-time inference
Provides near real-time latency (seconds to minutes)
Still requires a deployed endpoint, and each request carries a single input, so it suits individual large requests better than scoring entire archived datasets
Is more suitable when you need results within minutes rather than hours or days
Less efficient than Batch Transform for purely offline processing of multiple GBs
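The per-request nature of Asynchronous Inference becomes visible when you try to use it for an archive: each S3 object turns into its own queued request. This sketch builds keyword arguments for the `invoke_endpoint_async` call on boto3's `sagemaker-runtime` client, which takes an S3 `InputLocation` rather than an inline body. The endpoint name, object list, and CSV content type are hypothetical, and the 1 GB limit is treated here as 1024³ bytes.

```python
from typing import Any, Dict, List, Tuple

# Assumed maximum input size for one asynchronous inference request.
ASYNC_MAX_PAYLOAD_BYTES = 1024 ** 3


def build_async_requests(endpoint_name: str,
                         objects: List[Tuple[str, int]]) -> List[Dict[str, Any]]:
    """Return one invoke_endpoint_async kwargs dict per archived S3 object.

    `objects` holds (s3_uri, size_in_bytes) pairs, e.g. taken from an S3 listing.
    """
    requests = []
    for s3_uri, size in objects:
        if size > ASYNC_MAX_PAYLOAD_BYTES:
            raise ValueError(f"{s3_uri} exceeds the 1 GB async payload limit")
        requests.append({
            "EndpointName": endpoint_name,
            # The payload stays in S3; only its location is queued.
            "InputLocation": s3_uri,
            "ContentType": "text/csv",
        })
    return requests


# Hypothetical archive listing: two files of 750 MB and 900 MB.
archive = [
    ("s3://example-bucket/archive/2023-01.csv", 750 * 1024 ** 2),
    ("s3://example-bucket/archive/2023-02.csv", 900 * 1024 ** 2),
]
queued = build_async_requests("example-async-endpoint", archive)
print(len(queued))  # 2
# With credentials and a running endpoint:
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# for kwargs in queued:
#     runtime.invoke_endpoint_async(**kwargs)
```

Even in this form, an endpoint must be deployed and running to drain the queue, and you collect one output object per request, whereas a single Batch Transform job covers the whole S3 prefix.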

Conclusion

Batch Transform (Option A) is the optimal choice because it's specifically engineered for offline batch processing of large datasets stored in S3. It provides the most cost-effective, scalable solution when immediate predictions aren't required, making it ideal for analyzing archived data in bulk. The other options are designed for scenarios requiring faster response times or different workload patterns.
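The reasoning in this conclusion can be condensed into a toy decision rule. This is a deliberate simplification for study purposes, not an AWS-provided selector: the function and parameter names are hypothetical, and the 6 MB (real-time), 4 MB (serverless), and 1 GB (asynchronous) per-request payload limits are assumed from the AWS documentation.

```python
# Assumed per-request payload limits, in MB (check current AWS quotas).
REALTIME_MAX_MB = 6
SERVERLESS_MAX_MB = 4
ASYNC_MAX_MB = 1024


def choose_inference_option(offline_bulk: bool, payload_mb: float,
                            intermittent_traffic: bool = False) -> str:
    """Toy rule mirroring this explanation; real choices also weigh latency, cost, and scaling."""
    if offline_bulk:
        # Whole datasets in S3, results not needed right away: no endpoint required.
        return "Batch Transform"
    if payload_mb > ASYNC_MAX_MB:
        # Too large for any single endpoint request: fall back to Batch Transform.
        return "Batch Transform"
    if payload_mb > REALTIME_MAX_MB:
        # Large individual requests that can wait seconds to minutes.
        return "Asynchronous Inference"
    if intermittent_traffic and payload_mb <= SERVERLESS_MAX_MB:
        # Small, sporadic requests with no always-on instances.
        return "Serverless Inference"
    return "Real-time Inference"


# The scenario in this question: multiple GBs of archived data, no real-time need.
print(choose_inference_option(offline_bulk=True, payload_mb=5 * 1024))  # Batch Transform
```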

Which Amazon SageMaker inference option is appropriate for performing inference on large archived datasets of multiple GBs in size when real-time access to predictions is not required?

Exam-Like

Last updated: May 15, 2026 at 14:02

Batch transform

Real-time inference

Serverless inference

Asynchronous inference