
Answer-first summary for fast verification
Answer: Adopt Dataflow FlexRS
The best option is Cloud Dataflow Flexible Resource Scheduling (FlexRS). FlexRS lowers the cost of batch processing by combining delayed scheduling with a mix of preemptible and regular VMs, which suits jobs that have no strict deadline and can tolerate longer completion times. Dataflow Shuffle can speed up batch job execution, but it is not primarily a cost-reduction feature. Streaming Engine is built for streaming pipelines, not batch. Switching to a different Apache Beam runner, such as Apache Flink on Compute Engine, would require extensive changes and add management overhead. For more details, visit [Cloud Dataflow FlexRS](https://cloud.google.com/dataflow/docs/guides/flexrs).
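As a rough illustration of why FlexRS needs no extensive changes, the sketch below builds the command-line flags for an existing batch pipeline and adds the Beam Python SDK's `--flexrs_goal` option. The helper name `flexrs_pipeline_args` and the project, region, and bucket values are hypothetical placeholders, not part of the original question.

```python
# Minimal sketch, assuming an existing Apache Beam batch pipeline written
# with the Python SDK. The helper and all argument values are hypothetical.

VALID_FLEXRS_GOALS = ("COST_OPTIMIZED", "SPEED_OPTIMIZED")


def flexrs_pipeline_args(project, region, temp_location, goal="COST_OPTIMIZED"):
    """Return Dataflow pipeline flags with FlexRS enabled."""
    if goal not in VALID_FLEXRS_GOALS:
        raise ValueError(f"unsupported FlexRS goal: {goal}")
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
        # The only FlexRS-specific flag; the pipeline code itself is unchanged.
        f"--flexrs_goal={goal}",
    ]


if __name__ == "__main__":
    args = flexrs_pipeline_args(
        project="example-project",            # placeholder project ID
        region="us-central1",                 # placeholder region
        temp_location="gs://example-bucket/tmp",  # placeholder bucket
    )
    # These flags would typically be passed to
    # apache_beam.options.pipeline_options.PipelineOptions(args).
    print(" ".join(args))
```

The rest of the pipeline stays as it is; only the launch options change, which matches the "without extensive changes" constraint in the question.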
Author: LeetQuiz Editorial Team
Your company's CTO is worried about the high costs associated with running data pipelines, particularly large batch processing jobs. These jobs don't need to run on a strict schedule, and the CTO is open to longer completion times if it means reducing expenses. You're currently using Cloud Dataflow for most pipelines and want to minimize costs without extensive changes. What's your best recommendation?
A. Switch to a different Apache Beam Runner
B. Implement Dataflow Shuffle
C. Adopt Dataflow FlexRS
D. Utilize Dataflow Streaming Engine