
Explanation:
To handle high-volume, high-velocity data with parallel updates to many tables, the solution must maximize parallelism and avoid throttling by cloud object storage.
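One way to limit storage throttling is to give each Delta table its own storage container, so that request traffic for different tables is not funneled through a single location. The sketch below is illustrative only: it assumes Azure Data Lake Storage Gen2 (`abfss://` URIs), and the storage account name, table names, and the naming scheme in `container_for` are all hypothetical. Whether isolation actually helps depends on how the cloud vendor scopes its API limits.

```python
# Hypothetical sketch: map each Delta table to its own storage container.
# Assumes ADLS Gen2; the account name and table names are placeholders.

def container_for(table: str) -> str:
    """Derive a container name from a table name (assumed naming scheme)."""
    # Azure container names allow only lowercase letters, digits, and hyphens.
    return table.lower().replace("_", "-")


def table_location(account: str, table: str) -> str:
    """Build an abfss:// URI that places the table in a dedicated container."""
    return f"abfss://{container_for(table)}@{account}.dfs.core.windows.net/delta"


def create_statements(account: str, tables: list[str]) -> list[str]:
    """Generate one CREATE TABLE statement per table, each with its own location."""
    return [
        f"CREATE TABLE IF NOT EXISTS {t} USING DELTA "
        f"LOCATION '{table_location(account, t)}'"
        for t in tables
    ]


if __name__ == "__main__":
    for stmt in create_statements("examplestore", ["sales_orders", "web_events"]):
        print(stmt)
```

The generated statements could be passed to `spark.sql(...)` in a notebook; the containers themselves would have to exist and be accessible to the cluster beforehand.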
Which solution would you implement for a large company requiring near real-time processing of hundreds of pipelines that perform parallel updates on multiple tables handling extremely high-volume, high-velocity data?
A. Use Databricks High Concurrency clusters, which leverage optimized cloud storage connections to maximize data throughput.
B. Partition ingestion tables by a small time duration to allow for many data files to be written in parallel.
C. Configure Databricks to save all data to attached SSD volumes instead of object storage, increasing file I/O significantly.
D. Isolate Delta Lake tables in their own storage containers to avoid API limits imposed by cloud vendors.
E. Store all tables in a single database to ensure that the Databricks Catalyst Metastore can load balance overall throughput.