
Answer-first summary for fast verification
Answer: Utilize Databricks Delta Lake for storing and managing data, leveraging its built-in capabilities for automatic data lineage tracking and metadata management, and use Databricks notebooks for data exploration to enhance discoverability.
Option B is the most effective solution as it leverages Databricks Delta Lake's built-in features for automatic data lineage and metadata management, ensuring scalability and reducing manual effort. The use of Databricks notebooks facilitates data exploration and enhances discoverability without the need for additional tools or significant development resources. This approach aligns with the requirements for scalability, cost-effectiveness, and minimal manual intervention.
Author: LeetQuiz Editorial Team
Ultimate access to all questions.
No comments yet.
In a Databricks environment, you are tasked with designing a solution for comprehensive data lineage and metadata management to support a data governance initiative. The solution must automatically track data provenance, capture detailed metadata, and ensure high discoverability of data assets across the organization. Additionally, the solution should be scalable, cost-effective, and minimize manual intervention. Considering these requirements, which of the following approaches would BEST meet the organization's needs? Choose one option.
A
Manually document data lineage and metadata in the Databricks UI for each data asset, and create a shared documentation repository for data asset discoverability, relying on team members to update it regularly.
B
Utilize Databricks Delta Lake for storing and managing data, leveraging its built-in capabilities for automatic data lineage tracking and metadata management, and use Databricks notebooks for data exploration to enhance discoverability.
C
Develop a custom solution integrating Databricks notebooks with Apache Atlas for lineage tracking and a separate metadata repository, requiring significant development effort and maintenance.
D
Implement a third-party data catalog solution for lineage and metadata management, integrating it with Databricks for data processing, which may introduce additional licensing costs and integration complexity.