When developing a data ingestion pipeline that consolidates data from various sources using Apache Spark, which method would you employ to ensure high data quality across ingested datasets?
A. Utilize Spark's built-in DataFrame functions to clean and validate data after ingestion.
B. Manually inspect a sample of the ingested data for quality issues.
C. Apply schema-on-read during data loading to enforce data types and nullability constraints.
D. Implement an external data quality tool to preprocess files before ingestion.
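For context, here is a minimal PySpark sketch of the two Spark-native approaches named in options A and C: declaring an explicit schema when loading the data, then using built-in DataFrame functions to validate the ingested records. The file path, column names, and validation rules are illustrative assumptions, not part of the question.

```python
# Hypothetical sketch: schema declared at load time (option C) plus
# DataFrame-level validation after ingestion (option A).
# "events.csv" and the column names are assumed for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (
    StructType, StructField, StringType, IntegerType, TimestampType
)

spark = SparkSession.builder.appName("ingestion-quality-sketch").getOrCreate()

# Option C style: declare column types up front instead of letting Spark infer them.
schema = StructType([
    StructField("user_id", StringType(), nullable=False),
    StructField("event_type", StringType(), nullable=True),
    StructField("amount", IntegerType(), nullable=True),
    StructField("event_time", TimestampType(), nullable=True),
])

raw = spark.read.schema(schema).option("header", "true").csv("events.csv")

# Option A style: use built-in DataFrame functions to filter out bad records.
valid = (
    raw.filter(F.col("user_id").isNotNull())                        # required key present
       .filter(F.col("amount").isNull() | (F.col("amount") >= 0))   # drop negative amounts, keep nulls
       .dropDuplicates(["user_id", "event_time"])                   # de-duplicate on a natural key
)

# Keep rejected rows separately so quality issues can be audited later.
rejected = raw.subtract(valid)
```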