
Ultimate access to all questions.
In a Databricks environment, you are tasked with designing a solution for comprehensive data lineage and metadata management to support a data governance initiative. The solution must automatically track data provenance, capture detailed metadata, and ensure high discoverability of data assets across the organization. Additionally, the solution should be scalable, cost-effective, and minimize manual intervention. Considering these requirements, which of the following approaches would BEST meet the organization's needs? Choose one option.
A
Manually document data lineage and metadata in the Databricks UI for each data asset, and create a shared documentation repository for data asset discoverability, relying on team members to update it regularly.
B
Utilize Databricks Delta Lake for storing and managing data, leveraging its built-in capabilities for automatic data lineage tracking and metadata management, and use Databricks notebooks for data exploration to enhance discoverability.
C
Develop a custom solution integrating Databricks notebooks with Apache Atlas for lineage tracking and a separate metadata repository, requiring significant development effort and maintenance.
D
Implement a third-party data catalog solution for lineage and metadata management, integrating it with Databricks for data processing, which may introduce additional licensing costs and integration complexity.