A large company requires a near real-time solution in which hundreds of pipelines perform parallel updates on multiple tables, handling extremely high-volume, high-velocity data. Which solution would you implement?
A
Use Databricks High Concurrency clusters, which leverage optimized cloud storage connections to maximize data throughput.
B
Partition ingestion tables by a small time duration to allow for many data files to be written in parallel (see the first sketch after the options).
C
Configure Databricks to save all data to attached SSD volumes instead of object storage, increasing file I/O significantly.
D
Isolate Delta Lake tables in their own storage containers to avoid API limits imposed by cloud vendors (see the second sketch after the options).
E
Store all tables in a single database to ensure that the Databricks Catalyst Metastore can load balance overall throughput.
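
For illustration only (not a hint at the correct answer): a minimal PySpark sketch of the technique option B describes, assuming a Databricks environment with Auto Loader. The landing path, checkpoint and schema locations, table name, and `event_ts` column are hypothetical.

```python
# Sketch of option B: partition an ingestion table by a small time window so
# concurrent writers append files into separate partition directories.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("time-partitioned-ingest").getOrCreate()

raw = (
    spark.readStream.format("cloudFiles")                    # Databricks Auto Loader
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/chk/schema")  # hypothetical path
    .load("/mnt/landing/events")                             # hypothetical path
)

# Derive a small time bucket (one partition per hour) from the assumed
# event timestamp; each micro-batch writes only into the current hour's
# directory, so many pipelines can write data files in parallel without
# contending on the same partition.
bucketed = raw.withColumn("ingest_hour", F.date_trunc("hour", F.col("event_ts")))

(
    bucketed.writeStream.format("delta")
    .partitionBy("ingest_hour")
    .option("checkpointLocation", "/mnt/chk/bronze_events")  # hypothetical path
    .trigger(processingTime="1 minute")
    .toTable("bronze_events")                                # hypothetical table
)
```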
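Likewise, a minimal sketch of the technique option D describes: placing each high-traffic Delta table in its own storage container via an explicit LOCATION, so a cloud vendor's per-container request-rate limits apply per table rather than to all tables at once. The `abfss://` URIs, storage account, table names, and schemas are hypothetical placeholders.

```python
# Sketch of option D: isolate each Delta table in a dedicated storage
# container so a request burst against one table cannot throttle another.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("isolated-table-locations").getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS orders (order_id STRING, amount DOUBLE)
    USING DELTA
    LOCATION 'abfss://orders-container@storageacct.dfs.core.windows.net/orders'
""")

spark.sql("""
    CREATE TABLE IF NOT EXISTS payments (payment_id STRING, order_id STRING)
    USING DELTA
    LOCATION 'abfss://payments-container@storageacct.dfs.core.windows.net/payments'
""")
```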