
A company uses Amazon SageMaker for its ML pipeline in a production environment. The company has large input data sizes up to 1 GB and processing times up to 1 hour. The company needs near real-time latency. Which SageMaker inference option meets these requirements?
Explanation:
Correct Answer: C. Asynchronous inference
Large Input Data (up to 1 GB): Asynchronous inference is designed to handle large payloads (up to 1 GB) that exceed the typical limits of real-time inference (usually up to 6 MB).
Long Processing Times (up to 1 hour): Real-time inference has an invocation timeout of 60 seconds, while asynchronous inference supports processing times of up to one hour per request, which matches the stated requirement exactly.
Near Real-time Latency Requirement: Asynchronous inference queues incoming requests and processes them as capacity allows, writing results to Amazon S3 and optionally publishing success or error notifications to Amazon SNS, so callers get results as soon as processing completes without holding a connection open.
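As a sketch of how this S3/SNS delivery pattern is wired up (the bucket path and SNS topic ARNs below are placeholders, not from the question), the AsyncInferenceConfig dict passed to the SageMaker CreateEndpointConfig API declares where results land and which topics are notified:

```python
# Sketch of the AsyncInferenceConfig block for SageMaker's
# create_endpoint_config API. S3 URI and SNS ARNs are placeholders.

def build_async_inference_config(output_s3_uri, success_topic_arn=None,
                                 error_topic_arn=None):
    """Return an AsyncInferenceConfig dict: results are written to S3,
    and optional SNS topics are notified on success or failure."""
    config = {"OutputConfig": {"S3OutputPath": output_s3_uri}}
    notifications = {}
    if success_topic_arn:
        notifications["SuccessTopic"] = success_topic_arn
    if error_topic_arn:
        notifications["ErrorTopic"] = error_topic_arn
    if notifications:
        config["OutputConfig"]["NotificationConfig"] = notifications
    return config

# Example with placeholder values; this dict would be passed as the
# AsyncInferenceConfig parameter of create_endpoint_config via boto3.
async_config = build_async_inference_config(
    "s3://my-bucket/async-results/",
    success_topic_arn="arn:aws:sns:us-east-1:123456789012:inference-done",
)
```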
A. Real-time inference: Limited to smaller payloads (typically up to 6 MB) and shorter processing times (usually 60-second timeout). Cannot handle 1 GB files or 1-hour processing times.
B. Serverless inference: Even tighter constraints than real-time inference (payloads of roughly 4 MB and a 60-second processing cap), plus cold-start latency. It cannot handle 1 GB inputs or hour-long processing.
D. Batch transform: Designed for offline processing of large datasets, not for near real-time requirements. It processes data in batches with no latency guarantees.
This solution allows the company to process large ML workloads while maintaining near real-time responsiveness through asynchronous processing patterns.
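A minimal invocation sketch of that pattern (the endpoint name and S3 URIs are assumptions for illustration): the payload is staged in S3 rather than sent inline, and `invoke_endpoint_async` returns immediately with the S3 location where the result will appear.

```python
# Sketch: queueing a request against an asynchronous SageMaker endpoint.
# Endpoint name and S3 URIs are placeholders; boto3 is imported lazily
# so the argument-building helper can be exercised without AWS access.

def build_invoke_args(endpoint_name, input_s3_uri):
    """Assemble kwargs for invoke_endpoint_async. Unlike real-time
    invoke_endpoint, the payload is referenced by an S3 URI (up to
    1 GB) instead of being sent in the request body."""
    if not input_s3_uri.startswith("s3://"):
        raise ValueError("input payload must already be staged in S3")
    return {"EndpointName": endpoint_name, "InputLocation": input_s3_uri}

def invoke_async(endpoint_name, input_s3_uri, region="us-east-1"):
    import boto3  # requires AWS credentials at call time
    runtime = boto3.client("sagemaker-runtime", region_name=region)
    # The call returns immediately; the result appears at OutputLocation
    # once the (possibly hour-long) processing finishes.
    response = runtime.invoke_endpoint_async(
        **build_invoke_args(endpoint_name, input_s3_uri)
    )
    return response["OutputLocation"]
```

The caller can then poll the returned S3 location or, more efficiently, subscribe to the SNS success topic configured on the endpoint.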