
Answer-first summary for fast verification
Answer: Conduct a data quality assessment on the source data post-extraction. This assessment should include checks for value ranges and value distributions, counts of invalid and missing values, and other source-data checks.
The correct approach is to perform a data quality assessment on data extracted from the source system. This method allows a consistent evaluation across all data sources, unlike relying on source-system administrators, whose reports may vary in format and rigor from system to system. Loading data directly from a data lake into the data warehouse provides no assessment of the problem's scope. Similarly, importing data into the data warehouse and logging failed records is less efficient, because it yields no aggregate statistics on the full extent of the issue. For more details, refer to [Google Cloud's blog on data governance principles](https://cloud.google.com/blog/products/data-analytics/principles-and-best-practices-for-data-governance-in-the-cloud).
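The assessment described above can be sketched in a few lines of Python. This is a minimal illustration, not a production profiler: the sample rows, the `assess_quality` helper, and its parameters are all hypothetical, and real pipelines would typically use a profiling library against the full extracted dataset.

```python
from collections import Counter

# Hypothetical rows extracted from a source system (illustrative sample only).
rows = [
    {"age": "34", "country": "US"},
    {"age": "-5", "country": "US"},   # out-of-range age
    {"age": "",   "country": "DE"},   # missing age
    {"age": "41", "country": ""},     # missing country
]

def assess_quality(rows, column, valid_range=None):
    """Summarize missing counts, invalid values, and the value distribution
    for one column of extracted source data."""
    missing = 0
    invalid = 0
    distribution = Counter()
    for row in rows:
        value = row.get(column, "")
        if value == "":
            missing += 1
            continue
        distribution[value] += 1
        if valid_range is not None:
            try:
                number = float(value)
            except ValueError:
                invalid += 1      # not numeric at all
                continue
            lo, hi = valid_range
            if not (lo <= number <= hi):
                invalid += 1      # numeric but outside the allowed range

    return {
        "missing": missing,
        "invalid": invalid,
        "distribution": dict(distribution),
    }

age_report = assess_quality(rows, "age", valid_range=(0, 120))
country_report = assess_quality(rows, "country")
```

Running checks like these before writing any ETL code gives the team aggregate statistics on how widespread the quality problems are, which is exactly what options A and D fail to provide.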
Author: LeetQuiz Editorial Team
A data warehouse team is worried about the potential for poor quality controls in some data sources and wishes to avoid importing incorrect or invalid data into the data warehouse. What initial step could they take to assess the extent of the issue before developing ETL code?
A
Load all source data into a data lake and then proceed to load it into the data warehouse.
B
Request that administrators of the source systems perform a data quality verification before exporting the data.
C
Conduct a data quality assessment on the source data post-extraction. This assessment should include checks for value ranges, value distributions, counts of invalid and missing values, among other source data checks.
D
Import the data directly into the data warehouse and record any records that do not pass integrity or consistency checks.