When designing a data quality framework within a lakehouse architecture, which approach ensures comprehensive data validation, error handling, and cleansing without introducing significant processing overhead?
A
Develop custom Spark jobs that run periodically to validate data quality, applying fixes in batches to reduce impact on primary workloads.
B
Embed data quality checks into streaming ingestion processes, using side outputs to capture and manage errors.
C
Integrate a third-party data quality tool directly into the data ingestion pipeline, performing real-time validation and cleansing.
D
Utilize declarative data quality rules within Delta Lake, enforcing constraints at the point of ingestion.
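For context on what option D describes, below is a minimal sketch of declarative data quality rules on a Delta Lake table, enforced at the point of ingestion. It assumes a Spark session with Delta Lake enabled; the table and column names (bronze_events, event_id, event_ts, amount) and the constraint names are illustrative placeholders, not part of the question.

```python
from pyspark.sql import SparkSession

# Assumes the Delta Lake extension is available on the cluster.
spark = (
    SparkSession.builder
    .appName("delta-constraints-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Create the target table once (illustrative schema).
# The NOT NULL column constraint is checked on every write.
spark.sql("""
    CREATE TABLE IF NOT EXISTS bronze_events (
        event_id BIGINT NOT NULL,
        event_ts TIMESTAMP,
        amount   DOUBLE
    ) USING DELTA
""")

# Declarative CHECK constraints: Delta Lake validates these at ingestion
# and rejects any write that contains violating rows.
spark.sql(
    "ALTER TABLE bronze_events ADD CONSTRAINT positive_amount CHECK (amount >= 0)"
)
spark.sql(
    "ALTER TABLE bronze_events ADD CONSTRAINT ts_present CHECK (event_ts IS NOT NULL)"
)

# A batch that violates the constraints fails atomically; no bad rows land.
bad_batch = spark.createDataFrame(
    [(1, None, -5.0)],
    schema="event_id BIGINT, event_ts TIMESTAMP, amount DOUBLE",
)
try:
    bad_batch.write.format("delta").mode("append").saveAsTable("bronze_events")
except Exception as exc:
    print(f"Write rejected by constraint check: {exc}")
```

Because the rules live on the table itself, every writer (batch or streaming) is validated the same way, without a separate validation job or external tool in the pipeline.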