
Answer-first summary for fast verification
Answer: Use Azure Databricks to read the data from the sources, perform data quality checks using its built-in functions, and clean the data before writing it to the Delta Lake.
Option B is correct because Azure Databricks covers the full pipeline: reading from multiple sources, running data quality checks, cleaning the data, and writing the result to a Delta Lake. Its Spark DataFrame API includes built-in functions for identifying missing values, duplicate records, and data inconsistencies, so quality issues can be resolved before the data lands in the Delta Lake. The other options fit the scenario poorly: Azure Data Factory (A) is primarily an orchestration service and offers less fine-grained control over cleaning logic than Databricks notebooks; Azure Stream Analytics (C) targets real-time stream processing rather than batch workloads; and Azure Functions (D) would require custom code for every check and is not designed for large-scale batch data processing.
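The cleaning steps Option B describes can be sketched in plain Python on a few hypothetical records (the record shapes are invented for illustration). In Databricks the same logic maps to PySpark DataFrame calls: `dropDuplicates()` for duplicate records, `na.drop()` or `na.fill()` for missing values, and `df.write.format("delta")` for the final write.

```python
# Hypothetical input records illustrating the three quality issues
# the question names: duplicates, missing values, inconsistencies.
raw_records = [
    {"id": 1, "amount": 100.0, "currency": "USD"},
    {"id": 1, "amount": 100.0, "currency": "USD"},  # duplicate record
    {"id": 2, "amount": None,  "currency": "USD"},  # missing value
    {"id": 3, "amount": 250.0, "currency": "usd"},  # inconsistent casing
]

def clean(records):
    seen, cleaned = set(), []
    for rec in records:
        if rec["amount"] is None:
            continue  # drop rows with missing values (PySpark: df.na.drop())
        key = (rec["id"], rec["amount"], rec["currency"].upper())
        if key in seen:
            continue  # drop duplicates (PySpark: df.dropDuplicates())
        seen.add(key)
        # Normalise the inconsistency before writing downstream
        cleaned.append({**rec, "currency": rec["currency"].upper()})
    return cleaned

cleaned = clean(raw_records)
# In Databricks the cleaned DataFrame would then be persisted with
# df.write.format("delta").mode("append").save(delta_path).
```

This is only a sketch of the control flow; in a real Databricks job the checks run distributed over Spark DataFrames rather than Python lists.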
Author: LeetQuiz Editorial Team
You are working on a batch processing solution that needs to handle data from multiple sources with varying data quality issues. The solution should be able to identify and handle data quality issues, such as missing values, duplicate records, and data inconsistencies. How would you approach this task?
A
Use Azure Data Factory to orchestrate the data flow and use its data quality features to identify and handle data quality issues.
B
Use Azure Databricks to read the data from the sources, perform data quality checks using its built-in functions, and clean the data before writing it to the Delta Lake.
C
Use Azure Stream Analytics to process the data in real-time and handle data quality issues using its built-in functions.
D
Use Azure Functions to process the data in small batches and handle data quality issues using custom code.