
Explanation:
Correct Option: C. Dataproc
Dataproc is the best choice for this scenario because it is Google Cloud's fully managed service specifically designed for Apache Spark and Hadoop, making it ideal for large-scale data processing jobs. It offers dynamic scaling to handle varying workloads, cost controls to manage expenses effectively, and seamless integration with other Google Cloud services like BigQuery and Cloud Storage for analytics and storage. These features align perfectly with the requirements of processing large-scale transaction data in real-time and batch modes while minimizing operational overhead.
Why other options are not correct:
Ultimate access to all questions.
You are tasked with designing a data processing solution for a financial services company that requires processing large-scale transaction data in real-time and batch modes using Apache Spark. The solution must be cost-effective, scalable, and fully managed to minimize operational overhead. Additionally, it should seamlessly integrate with other Google Cloud services for analytics and storage. Which Google Cloud service is the BEST choice for this scenario, and why? Choose the most appropriate option.
A
Cloud Composer: A workflow orchestration service that manages workflows across Google Cloud and hybrid environments, but not optimized for direct data processing with Apache Spark.
B
BigQuery: A serverless, highly scalable data warehouse that excels in running fast SQL queries but is not designed for executing Apache Spark jobs.
C
Dataproc: A fully managed service for running Apache Spark and Hadoop clusters, offering dynamic scaling, cost controls, and seamless integration with Google Cloud services.
D
Cloud Dataflow: A unified stream and batch data processing service based on Apache Beam, which is less optimal for large-scale batch processing compared to Dataproc.
No comments yet.