
Answer-first summary for fast verification
Answer:
- Partition ingestion tables by a small time duration to allow for many data files to be written in parallel.
- Isolate Delta Lake tables in their own storage containers to avoid API limits imposed by cloud vendors.
To handle high-volume, high-velocity data with parallel updates across many tables, the solution must maximize write parallelism and avoid cloud storage throttling.

- **B** is correct: partitioning ingestion tables by small time durations lets many data files be written in parallel, increasing throughput.
- **D** is correct: isolating each Delta Lake table in its own storage container avoids the per-container API rate limits imposed by cloud vendors, ensuring hundreds of pipelines can scale without throttling one another.
- **A** is incorrect: High Concurrency clusters are suited to multi-user interactive SQL workloads, not automated pipelines.
- **C** is incorrect: attached SSD volumes are ephemeral and cannot durably store data at cloud scale.
- **E** is incorrect: concentrating all tables in a single database risks a metastore bottleneck rather than balancing load.
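The idea behind option B can be sketched in plain Python: derive a small time bucket (here a 5-minute window, an assumed granularity) from each record's event timestamp. In a real pipeline this bucket value would become the partition column passed to Delta Lake's `partitionBy` on write; the helper name `time_bucket` is illustrative, not a Databricks API.

```python
from datetime import datetime, timezone

BUCKET_MINUTES = 5  # assumed partition granularity; tune to write volume


def time_bucket(ts: datetime) -> str:
    """Truncate a timestamp to its 5-minute window, yielding a partition value.

    Records landing in the same window share a partition directory, so many
    writers can append new files in parallel instead of contending on a
    single unpartitioned path.
    """
    floored_minute = ts.minute - (ts.minute % BUCKET_MINUTES)
    floored = ts.replace(minute=floored_minute, second=0, microsecond=0)
    return floored.strftime("%Y-%m-%d_%H%M")


# Two events three minutes apart land in the same partition...
a = time_bucket(datetime(2024, 1, 1, 12, 1, tzinfo=timezone.utc))
b = time_bucket(datetime(2024, 1, 1, 12, 4, tzinfo=timezone.utc))
# ...while an event in the next window lands in a new partition.
c = time_bucket(datetime(2024, 1, 1, 12, 7, tzinfo=timezone.utc))
```

Choosing the bucket size is a throughput trade-off: too coarse and writers contend on few partitions; too fine and the table accumulates many tiny files.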
Author: LeetQuiz Editorial Team
Which solution would you implement for a large company requiring near real-time processing of hundreds of pipelines that perform parallel updates on multiple tables handling extremely high-volume, high-velocity data?
A. Use Databricks High Concurrency clusters, which leverage optimized cloud storage connections to maximize data throughput.

B. Partition ingestion tables by a small time duration to allow for many data files to be written in parallel.

C. Configure Databricks to save all data to attached SSD volumes instead of object storage, increasing file I/O significantly.

D. Isolate Delta Lake tables in their own storage containers to avoid API limits imposed by cloud vendors.

E. Store all tables in a single database to ensure that the Databricks Catalyst Metastore can load balance overall throughput.