
Answer: C (Dataflow) and E (Pub/Sub). Dataflow is a fully managed service that provides a unified model for batch and streaming data processing, with auto-scaling, fault tolerance, and minimal operational overhead. Pub/Sub is a messaging service for ingesting and delivering events at scale, but it needs an additional service to process and analyze the data in real time.
Dataflow is the best choice because it is designed for building and running both batch and streaming pipelines, with auto-scaling and fault tolerance built in, which makes it well suited to real-time processing. Pub/Sub is accepted as a second correct option because it handles event ingestion at scale, but it does not process the data itself; in practice it is paired with a processing service such as Dataflow. BigQuery is a data warehouse for analytical SQL queries rather than a stream-processing engine, Cloud Storage stores objects but has no processing capabilities, and Dataproc is primarily designed for batch processing and is not optimized for low-latency streaming pipelines.
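The "unified model" mentioned in the answer means the same transform logic can run over a bounded input (batch) and an unbounded input (streaming). The sketch below illustrates that idea in plain Python by counting events per key in fixed one-minute windows. It is not the Dataflow or Apache Beam API; the event shape `(timestamp, key)`, the function names, and the window size are hypothetical. It also assumes events arrive in timestamp order, whereas Dataflow handles late and out-of-order data with watermarks and triggers, which this sketch omits.

```python
from collections import defaultdict
from itertools import count, islice
from typing import Iterable, Iterator, Tuple

# Hypothetical event shape: (event_time_in_seconds, event_key)
Event = Tuple[float, str]


def windowed_counts(events: Iterable[Event], window_size: float) -> Iterator[Tuple[float, str, int]]:
    """Count events per key in fixed (tumbling) event-time windows.

    Works the same for a finite list (batch) and an endless generator (stream).
    Simplifying assumption: events arrive in timestamp order, so a window is
    emitted as soon as an event from a later window shows up.
    """
    current_start = None
    counts = defaultdict(int)
    for ts, key in events:
        start = (ts // window_size) * window_size  # start of this event's window
        if current_start is not None and start != current_start:
            # The previous window is complete: emit its per-key counts.
            for k in sorted(counts):
                yield (current_start, k, counts[k])
            counts.clear()
        current_start = start
        counts[key] += 1
    # Flush the last window when the input ends (always reached for batch input).
    if current_start is not None:
        for k in sorted(counts):
            yield (current_start, k, counts[k])


def event_stream() -> Iterator[Event]:
    """Hypothetical unbounded source: one "view" event every 10 seconds, forever."""
    for i in count():
        yield (i * 10.0, "view")


if __name__ == "__main__":
    # Batch: a bounded list of events.
    batch = [(0.5, "view"), (1.2, "cart"), (3.0, "view"), (61.0, "view"), (65.0, "buy")]
    print(list(windowed_counts(batch, 60.0)))
    # Streaming: the same function over an endless source; take the first two results.
    print(list(islice(windowed_counts(event_stream(), 60.0), 2)))
```

Because `windowed_counts` is a generator, it never needs the whole input in memory, so the same code serves both cases. A managed service like Dataflow adds what this sketch lacks: parallel workers, auto-scaling, checkpointing for fault tolerance, and correct handling of out-of-order events.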
Author: LeetQuiz Editorial Team
In the context of designing a real-time analytics platform for a global e-commerce website, which Google Cloud service is the most suitable for creating and managing a high-throughput, low-latency streaming data pipeline that can process millions of events per second with minimal operational overhead? Consider the need for auto-scaling, fault tolerance, and the ability to handle both batch and streaming data within the same pipeline. Choose the best option from the following:
A. BigQuery: A serverless data warehouse optimized for running fast SQL queries on large datasets, but not designed for real-time data processing.
B. Cloud Storage: A highly durable and available object storage service, ideal for storing large amounts of unstructured data but lacking the capabilities for real-time data processing.
C. Dataflow: A fully managed service that provides a unified model for batch and streaming data processing, offering auto-scaling, fault tolerance, and minimal operational overhead.
D. Dataproc: A managed service for running Apache Spark and Hadoop clusters, primarily designed for batch processing and not optimized for low-latency streaming data pipelines.
E. Pub/Sub: A messaging service for event ingestion and delivery at scale, but requires additional services to process and analyze the data in real time.