
Answer-first summary for fast verification
Answer: Configure a transient cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.
## Explanation

**Correct Answer: B**

**Why Option B is correct:**

1. **Transient cluster**: The workload runs for only 6 hours per day. A transient (short-lived) cluster that terminates when its steps finish is therefore more cost-effective than a long-running cluster that would sit idle for the remaining 18 hours.
2. **Primary node and core nodes on On-Demand Instances**: The primary node (formerly called the master node) manages the cluster, so it should always run On-Demand to keep the cluster stable and uninterrupted. Core nodes store HDFS data, so they should also run On-Demand; if Spot capacity holding core nodes is reclaimed, the HDFS blocks on those nodes can be lost.
3. **Task nodes on Spot Instances**: Task nodes only perform computation and do not store HDFS data. An interruption costs retried tasks rather than data, which makes them ideal for Spot Instances.

**Why other options are incorrect:**

- **Option A**: A long-running cluster is wasteful for a 6-hour daily workload.
- **Option C**: Putting core nodes on Spot Instances risks data loss, because core nodes store HDFS data.
- **Option D**: A long-running cluster is inefficient, and putting core nodes on Spot Instances risks data loss.

**Key AWS Concepts:**

- **EMR Node Types**:
  - **Primary Node**: Manages the cluster; should be On-Demand for reliability.
  - **Core Nodes**: Run tasks and store data in HDFS; should be On-Demand to prevent data loss.
  - **Task Nodes**: Run tasks only and hold no HDFS data; ideal for Spot Instances.
- **Transient vs. long-running**: Transient clusters are terminated after job completion, which saves cost for intermittent workloads. Because HDFS is discarded when the cluster terminates, inputs and final outputs of a transient cluster are normally kept in Amazon S3.
- **Spot Instances**: Up to 90% cheaper than On-Demand Instances, but they can be interrupted with a two-minute warning.

This configuration balances cost optimization with data integrity requirements.
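As a rough sketch of what option B could look like in practice, the Python below builds the keyword arguments one might pass to boto3's EMR `run_job_flow` call. The `InstanceGroups` entries (with `InstanceRole` and `Market`) and `KeepJobFlowAliveWhenNoSteps=False`, which makes the cluster shut down once its steps finish, are real request fields. The cluster name, release label, instance type, node counts, and S3 script path are hypothetical placeholders. The function only builds the request dictionary and does not contact AWS.

```python
"""Sketch of an EMR request for option B (transient cluster, Spot task nodes only).

All names, instance types, counts, and S3 paths are hypothetical placeholders.
"""


def build_option_b_request(
    instance_type: str = "m5.xlarge",  # hypothetical instance type
    core_count: int = 2,  # hypothetical core-node count
    task_count: int = 4,  # hypothetical task-node count
) -> dict:
    """Return keyword arguments shaped for boto3's EMR run_job_flow call."""
    return {
        "Name": "daily-6h-workload",  # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",  # hypothetical release label
        "Instances": {
            "InstanceGroups": [
                # Primary node: On-Demand so the cluster itself is never interrupted.
                {"Name": "Primary", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": 1},
                # Core nodes hold HDFS blocks: On-Demand to avoid losing data.
                {"Name": "Core", "InstanceRole": "CORE", "Market": "ON_DEMAND",
                 "InstanceType": instance_type, "InstanceCount": core_count},
                # Task nodes are compute-only: Spot for the discount.
                {"Name": "Task", "InstanceRole": "TASK", "Market": "SPOT",
                 "InstanceType": instance_type, "InstanceCount": task_count},
            ],
            # False = terminate after the last step finishes (transient cluster).
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [
            {
                "Name": "daily-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    # Hypothetical script location; data lives in S3, so
                    # nothing is lost when the cluster shuts down.
                    "Args": ["spark-submit", "s3://example-bucket/jobs/daily.py"],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_option_b_request()
    for group in request["Instances"]["InstanceGroups"]:
        print(group["InstanceRole"], group["Market"], group["InstanceCount"])
    # With real credentials, the request could then be submitted with:
    # boto3.client("emr").run_job_flow(**request)
```

Keeping the request as plain data makes it easy to review which node groups use Spot before anything is launched.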
Author: LeetQuiz Editorial Team
A company has a large data workload that runs for 6 hours each day. The company cannot lose any data while the process is running. A solutions architect is designing an Amazon EMR cluster configuration to support this critical data workload.
Which solution will meet these requirements MOST cost-effectively?
A
Configure a long-running cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.
B
Configure a transient cluster that runs the primary node and core nodes on On-Demand Instances and the task nodes on Spot Instances.
C
Configure a transient cluster that runs the primary node on an On-Demand Instance and the core nodes and task nodes on Spot Instances.
D
Configure a long-running cluster that runs the primary node on an On-Demand Instance, the core nodes on Spot Instances, and the task nodes on Spot Instances.
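To put rough numbers on the four options, the sketch below counts daily instance-hours by purchase option. It assumes a hypothetical fleet of 1 primary, 2 core, and 4 task nodes, that a long-running cluster keeps every node up 24 hours a day, and that a transient cluster runs only for the 6-hour job. It ignores cluster start-up time and actual prices; it only shows where the On-Demand and Spot hours go.

```python
from dataclasses import dataclass

# Hypothetical fleet size; the question does not specify one.
PRIMARY, CORE, TASK = 1, 2, 4
JOB_HOURS = 6  # workload length from the question
DAY_HOURS = 24  # assumption: a long-running cluster stays fully up all day


@dataclass
class Option:
    label: str
    transient: bool
    core_on_spot: bool


def daily_instance_hours(opt: Option) -> tuple[int, int]:
    """Return (on_demand_hours, spot_hours) per day for one option."""
    hours = JOB_HOURS if opt.transient else DAY_HOURS
    on_demand_nodes = PRIMARY + (0 if opt.core_on_spot else CORE)
    spot_nodes = TASK + (CORE if opt.core_on_spot else 0)
    return on_demand_nodes * hours, spot_nodes * hours


OPTIONS = [
    Option("A", transient=False, core_on_spot=False),
    Option("B", transient=True, core_on_spot=False),
    Option("C", transient=True, core_on_spot=True),
    Option("D", transient=False, core_on_spot=True),
]

if __name__ == "__main__":
    for opt in OPTIONS:
        od, spot = daily_instance_hours(opt)
        # Options C and D put HDFS-bearing core nodes on Spot, so they can look
        # cheaper but fail the "cannot lose any data" requirement.
        safe = not opt.core_on_spot
        print(f"{opt.label}: on-demand={od}h spot={spot}h data-safe={safe}")
```

Among the data-safe options, B uses a quarter of A's instance-hours (18 + 24 versus 72 + 96 under these assumptions), which is the cost argument for the transient cluster.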