
Answer-first summary for fast verification
Answer: Cross-region reads and writes can incur significant costs and latency; whenever possible, compute should be deployed in the same region the data is stored.
The correct answer is C. Databricks compute resources (clusters) should be deployed in the same region as the data they read and write, which minimizes cross-region data transfer costs and latency. Although the consulting firm is in India, the data resides in US regional cloud storage, so running compute in an Indian region against US storage would incur avoidable network charges and slow the pipelines down. Option A is wrong because it references HDFS, which Databricks does not use for cloud storage. Option B is false because workspaces and their compute are region-specific. Option D misrepresents the security concern: code execution is managed securely within the cloud environment, and latency for the end users in India matters less to pipeline efficiency than keeping compute close to the data.
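The cost argument can be made concrete with a rough back-of-the-envelope sketch. Everything here is illustrative: the function names, the region labels, and the 0.02 USD per GB rate are hypothetical placeholders rather than real cloud prices, and treating same-region traffic as free is a simplifying assumption.

```python
def estimate_daily_transfer_cost(
    compute_region: str,
    storage_region: str,
    gb_per_day: float,
    cross_region_rate_per_gb: float,
) -> float:
    """Rough daily data-transfer charge for a pipeline.

    Assumption: traffic within a single region is treated as free,
    and cross-region traffic is billed at a flat per-GB rate.
    Real cloud pricing is tiered and provider-specific.
    """
    if gb_per_day < 0 or cross_region_rate_per_gb < 0:
        raise ValueError("volume and rate must be non-negative")
    if compute_region == storage_region:
        return 0.0
    return gb_per_day * cross_region_rate_per_gb


# Hypothetical inputs: 500 GB read per day, 0.02 USD per GB cross-region.
HYPOTHETICAL_RATE = 0.02
same_region = estimate_daily_transfer_cost(
    "us-region", "us-region", 500, HYPOTHETICAL_RATE
)
cross_region = estimate_daily_transfer_cost(
    "india-region", "us-region", 500, HYPOTHETICAL_RATE
)
print(f"same region:  {same_region:.2f} USD/day")
print(f"cross region: {cross_region:.2f} USD/day")
```

Under these assumed numbers the co-located deployment pays nothing for transfer while the cross-region one pays a recurring daily charge, before accounting for the added latency on every read and write.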
Author: LeetQuiz Editorial Team
A US-based company has engaged an Indian consulting firm to develop new data engineering pipelines for AI applications. The company's data resides in US regional cloud storage.
Considering all data governance requirements are met, where should the Databricks workspace for the contractors be deployed?
A. Databricks runs HDFS on cloud volume storage; as such, cloud virtual machines must be deployed in the region where the data is stored.
B. Databricks workspaces do not rely on any regional infrastructure; as such, the decision should be made based upon what is most convenient for the workspace administrator.
C. Cross-region reads and writes can incur significant costs and latency; whenever possible, compute should be deployed in the same region the data is stored.
D. Databricks notebooks send all executable code from the user’s browser to virtual machines over the open internet; whenever possible, choosing a workspace region near the end users is the most secure.