
Answer-first summary for fast verification
Answer: Develop custom Spark jobs that run periodically to validate data quality, applying fixes in batches to reduce impact on primary workloads.
**A (Periodic batch Spark jobs for validation/cleansing):** Correct. Validation and cleansing run in scheduled batch jobs, separate from the ingestion pipelines, so the primary workloads carry little extra load. Because the jobs run off the critical path, they can apply complex, thorough checks and fixes. This matches the requirement for comprehensive validation without significant processing overhead.

**B (Embed quality checks in streaming with side outputs for errors):** Like option C, this puts the checks directly in the ingestion path, which adds complexity and overhead. Side outputs are useful for capturing and managing errors, but the checks themselves can still slow the stream.

**C (Third-party tool in real-time ingestion):** Real-time validation catches errors early, but running it inside the ingestion pipeline can add significant processing overhead and hurt ingestion performance.

**D (Declarative rules with Delta Lake constraints at ingestion):** Delta Lake supports constraints, which help keep bad data out of a table. However, enforcing them at ingestion can slow writes and cause writes to fail, which matters most at high data volumes. Constraints reject bad data rather than repair it, so this option is less flexible and less comprehensive for cleansing, and it can increase latency.
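The batch pattern behind option A can be sketched in a few lines. The example below is a plain-Python stand-in, not Spark code: it only illustrates the validate, fix, and quarantine flow that a scheduled job would apply to a batch of already-ingested records. Every name in it (`Rule`, `BatchReport`, `validate_batch`, the rule names, and the record fields `id`, `country`, `amount`) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Record = dict


@dataclass
class Rule:
    """A named check with an optional repair function (all names hypothetical)."""
    name: str
    is_valid: Callable[[Record], bool]
    fix: Optional[Callable[[Record], Record]] = None  # None: not repairable


@dataclass
class BatchReport:
    clean: list = field(default_factory=list)        # passed every rule untouched
    fixed: list = field(default_factory=list)        # passed after a repair
    quarantined: list = field(default_factory=list)  # (original record, failed rule)


def validate_batch(records, rules):
    """Apply each rule in order; repair where possible, otherwise quarantine."""
    report = BatchReport()
    for record in records:
        current = dict(record)  # work on a copy so the input stays unchanged
        was_fixed = False
        failed_rule = None
        for rule in rules:
            if rule.is_valid(current):
                continue
            if rule.fix is not None:
                candidate = rule.fix(current)
                if rule.is_valid(candidate):  # accept only a successful repair
                    current = candidate
                    was_fixed = True
                    continue
            failed_rule = rule.name
            break
        if failed_rule is not None:
            report.quarantined.append((record, failed_rule))
        elif was_fixed:
            report.fixed.append(current)
        else:
            report.clean.append(current)
    return report


def country_is_normalized(record):
    country = record.get("country")
    return isinstance(country, str) and country != "" and country == country.strip().upper()


def normalize_country(record):
    country = record.get("country")
    if not isinstance(country, str):
        return record  # nothing to repair; the re-check will fail
    return {**record, "country": country.strip().upper()}


RULES = [
    Rule("id_present", lambda r: r.get("id") is not None),
    Rule("country_normalized", country_is_normalized, fix=normalize_country),
    Rule("amount_non_negative",
         lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
]

if __name__ == "__main__":
    batch = [
        {"id": 1, "country": "DE", "amount": 10.0},
        {"id": 2, "country": " us ", "amount": 5},
        {"id": None, "country": "FR", "amount": 1},
        {"id": 4, "country": "GB", "amount": -3},
    ]
    report = validate_batch(batch, RULES)
    print(len(report.clean), len(report.fixed), len(report.quarantined))
```

In a real lakehouse the same flow would typically be a scheduled Spark job that reads a table, applies the rules as DataFrame filters and transformations, and writes fixed and quarantined rows to separate tables. The key property is the same: the work happens on a schedule, off the ingestion path.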
Author: LeetQuiz Editorial Team
When designing a data quality framework within a lakehouse architecture, which approach ensures comprehensive data validation, error handling, and cleansing without introducing significant processing overhead?
A. Develop custom Spark jobs that run periodically to validate data quality, applying fixes in batches to reduce impact on primary workloads.
B. Embed data quality checks into streaming ingestion processes, using side outputs to capture and manage errors.
C. Integrate a third-party data quality tool directly into the data ingestion pipeline, performing real-time validation and cleansing.
D. Utilize declarative data quality rules within Delta Lake, enforcing constraints at the point of ingestion.