Ensuring high data quality is essential for your processing pipeline. How can you automate data quality checks within your Apache Spark pipelines in Databricks to detect and separate corrupt or missing data?
A
Implementing Spark SQL window functions to compare records and identify anomalies
B
Developing a UDF (User Defined Function) that validates data against predefined rules and filters out invalid records
C
Leveraging Databricks' expectations framework (Delta Live Tables) to define and enforce data quality constraints
D
Utilizing a separate Spark job that runs data quality reports and alerts the team on issues
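Option C is the approach Databricks builds in for this: Delta Live Tables expectations declare quality constraints directly on a table, record pass/fail metrics in the pipeline event log, and can warn on, drop, or fail on violating records. Below is a minimal sketch, assuming a DLT pipeline defined in Python; the source table name raw_events and the columns event_id and event_ts are hypothetical placeholders, not names from the question.

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical upstream table registered in the same DLT pipeline.
SOURCE = "raw_events"

@dlt.table(comment="Records that satisfy all data quality expectations")
@dlt.expect_or_drop("valid_id", "event_id IS NOT NULL")
@dlt.expect_or_drop("valid_timestamp", "event_ts IS NOT NULL")
def clean_events():
    # Rows violating any expect_or_drop constraint are removed here,
    # and the drop counts surface in the pipeline event log.
    return dlt.read(SOURCE)

@dlt.table(comment="Quarantine table for records dropped above")
def quarantined_events():
    # Invert the same predicates to separate, rather than silently
    # discard, corrupt or incomplete records.
    return dlt.read(SOURCE).where(
        F.col("event_id").isNull() | F.col("event_ts").isNull()
    )
```

The quarantine table mirrors the inverted expectation predicates, a common pattern for separating bad records for later inspection; dlt.expect (warn only) and dlt.expect_or_fail (halt the update) are the other enforcement modes.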