Suppose you are tasked with optimizing a data lake for analytics. How would you ensure data quality and handle data skew in a distributed storage system?
A. Use a centralized data validation tool to check for data completeness, consistency, accuracy, and integrity before ingestion into the data lake.
B. Distribute the data validation process across the nodes of the data lake to parallelize the checks and handle large volumes of data efficiently (see the sketch after the options).
C. Ignore data quality and skew issues, focusing only on the storage capacity and performance of the data lake.
D. Manually inspect a sample of the data to ensure quality and consistency before ingestion into the data lake.
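
For context on option B, here is a minimal sketch of what distributed validation plus skew mitigation could look like, assuming PySpark. The lake paths, the column names (event_id, user_id, amount), the specific validation rules, and the salt bucket count are all illustrative assumptions, not part of the question.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("lake-validation").getOrCreate()

# Hypothetical raw zone of the lake.
events = spark.read.parquet("s3://my-lake/raw/events/")

# Validation runs in parallel across the cluster: each executor checks
# only its own partitions, so large volumes are handled efficiently.
valid = (
    events
    .filter(F.col("event_id").isNotNull())  # completeness
    .filter(F.col("amount") >= 0)           # accuracy (range check)
    .dropDuplicates(["event_id"])           # consistency / integrity
)

# Mitigate skew on a hot key by salting: tag each row with a random
# bucket so one oversized key fans out across many partitions.
SALT_BUCKETS = 16  # illustrative; tune to the observed skew
salted = valid.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# Phase 1: pre-aggregate per (key, salt) to spread the work.
partial = salted.groupBy("user_id", "salt").agg(
    F.sum("amount").alias("partial_sum")
)

# Phase 2: merge the partial sums back to one row per key.
totals = partial.groupBy("user_id").agg(
    F.sum("partial_sum").alias("total_amount")
)

totals.write.mode("overwrite").parquet("s3://my-lake/curated/user_totals/")
```

The two-phase aggregation works because summation is associative: the salted pre-aggregation splits a hot key's rows across up to SALT_BUCKETS tasks, and the second pass cheaply combines the partial sums, so no single executor has to process the entire skewed key.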