
Answer-first summary for fast verification
Answer: Adopt TensorFlow Data Validation to automatically detect and flag schema abnormalities, providing a scalable solution for ensuring data integrity against formatting changes.
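TensorFlow Data Validation (TFDV) is the most suitable choice for this scenario because it is built to detect and flag schema anomalies. That makes the training pipeline robust to formatting changes the broker fails to announce. TFDV computes descriptive statistics over the data and infers a schema from them (feature names, types, presence, and value domains). It then validates each new batch against that schema and reports anomalies such as missing features, type changes, or unexpected values. It can also detect drift and skew between datasets. Unlike the alternatives, it does not rely on hand-written checks that must be updated for every newly discovered error (option A). It does not hide problems by overwriting non-conforming values (option B). It also checks the schema directly rather than only computing summary statistics (option C). As a result, models are trained on accurate, reliable data, and the solution stays cost-effective and scalable.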
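To make the "infer a schema, then validate new data against it" idea concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not TFDV code. The helpers `infer_schema` and `validate` are hypothetical stand-ins so the example runs without TFDV installed. In TFDV itself, the corresponding steps are computing statistics (for example with `tfdv.generate_statistics_from_dataframe`), calling `tfdv.infer_schema` on the baseline statistics, and calling `tfdv.validate_statistics` on each new batch's statistics.

```python
# Hypothetical, pure-Python illustration of TFDV's infer-then-validate idea.
# These helpers are NOT part of TFDV; they only mimic the workflow.
from typing import Any, Dict, List

Record = Dict[str, Any]
Schema = Dict[str, type]


def infer_schema(records: List[Record]) -> Schema:
    """Infer a feature-name -> Python-type mapping from a trusted baseline batch."""
    schema: Schema = {}
    for record in records:
        for name, value in record.items():
            # The first type seen for a feature becomes its expected type.
            schema.setdefault(name, type(value))
    return schema


def validate(records: List[Record], schema: Schema) -> List[str]:
    """Report schema anomalies instead of silently repairing the data."""
    anomalies: List[str] = []
    for i, record in enumerate(records):
        # Features the schema expects but the record lacks.
        for name in sorted(schema.keys() - record.keys()):
            anomalies.append(f"row {i}: missing feature '{name}'")
        # Features the record has but the schema does not know about.
        for name in sorted(record.keys() - schema.keys()):
            anomalies.append(f"row {i}: unexpected feature '{name}'")
        # Features present in both but with the wrong type.
        for name, expected in schema.items():
            if name in record and not isinstance(record[name], expected):
                anomalies.append(
                    f"row {i}: '{name}' expected {expected.__name__}, "
                    f"got {type(record[name]).__name__}"
                )
    return anomalies


if __name__ == "__main__":
    baseline = [{"user_id": 1, "price": 12.5}, {"user_id": 2, "price": 8.0}]
    schema = infer_schema(baseline)
    # Simulated unannounced broker change: a renamed column and a string price.
    new_batch = [{"uid": 3, "price": "$9.99"}]
    for anomaly in validate(new_batch, schema):
        print(anomaly)
    # Prints:
    # row 0: missing feature 'user_id'
    # row 0: unexpected feature 'uid'
    # row 0: 'price' expected float, got str
```

TFDV runs these checks over dataset-level statistics rather than row by row. Its inferred schema can also be curated by hand and saved, so every later batch is validated against the same contract. The key property is the same in both cases: a format change produces an explicit, reviewable anomaly instead of silently flowing into training.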
Author: LeetQuiz Editorial Team
In the context of developing a machine learning model, your team relies on a third-party data broker for training data. However, notifications about data formatting changes from the broker are unreliable, potentially compromising the robustness of your model training pipeline. Considering the need for a solution that ensures data integrity without significantly increasing operational costs, which of the following approaches would you implement? (Choose one correct option)
A. Implement custom TensorFlow functions at the pipeline's start to manually detect and flag known formatting errors, requiring continuous updates as new errors are identified.
B. Integrate TensorFlow Transform to preprocess data, normalizing it to an expected distribution and replacing non-conforming values with zeros, which may mask underlying data issues.
C. Utilize tf.math for data analysis to calculate summary statistics and identify statistical anomalies, though this may not directly address schema inconsistencies.
D. Adopt TensorFlow Data Validation to automatically detect and flag schema abnormalities, providing a scalable solution for ensuring data integrity against formatting changes.
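Why option B is risky can be shown with a tiny, hypothetical example. Suppose the broker starts prefixing prices with a currency symbol. Coercing unparseable values to zero quietly turns that format change into plausible-looking numbers. Flagging the values instead surfaces it. Both helpers below are illustrative assumptions. Neither is TensorFlow Transform's or TFDV's API; they only mimic the behaviours the options describe.

```python
# Hypothetical helpers contrasting "replace with zero" (option B style)
# with "flag the violation" (option D style). Not a real library API.
from typing import List


def zero_fill(values: List[str]) -> List[float]:
    """Coerce to float, silently replacing unparseable entries with 0.0."""
    out: List[float] = []
    for v in values:
        try:
            out.append(float(v))
        except ValueError:
            out.append(0.0)  # the format change disappears here
    return out


def flag_unparseable(values: List[str]) -> List[int]:
    """Return the indices of entries that violate the expected numeric type."""
    bad: List[int] = []
    for i, v in enumerate(values):
        try:
            float(v)
        except ValueError:
            bad.append(i)
    return bad


if __name__ == "__main__":
    prices = ["12.50", "$9.99", "7.25"]  # broker added a "$" prefix to one row
    print(zero_fill(prices))         # [12.5, 0.0, 7.25]
    print(flag_unparseable(prices))  # [1]
```

The zero-filled output is valid input to a model, so training proceeds on a corrupted feature without any signal that something changed. The flagging approach stops or alerts instead, which is the property the question asks for.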