
In the context of developing a machine learning model, your team relies on a third-party data broker for training data. However, the broker's notifications about data formatting changes are unreliable, so schema changes can reach your training pipeline unannounced and compromise its robustness. Given the need for a solution that ensures data integrity without significantly increasing operational costs, which of the following approaches would you implement? (Choose one correct option)
A
Implement custom TensorFlow functions at the pipeline's start to manually detect and flag known formatting errors, requiring continuous updates as new errors are identified.
B
Integrate TensorFlow Transform to preprocess data, normalizing it to an expected distribution and replacing non-conforming values with zeros, which may mask underlying data issues.
C
Utilize tf.math to compute summary statistics on incoming data and identify statistical anomalies, though this may not directly address schema inconsistencies.
D
Adopt TensorFlow Data Validation to automatically detect and flag schema abnormalities, providing a scalable solution for ensuring data integrity against formatting changes.
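
For context on option D: TensorFlow Data Validation (TFDV) can infer a schema from a known-good dataset and then validate each new delivery against that schema before training. Below is a minimal sketch of that workflow, assuming the data arrives as CSV; the file paths (broker_data/baseline.csv, broker_data/latest.csv) are illustrative, not part of the question.

```python
# A minimal sketch of the TFDV workflow described in option D.
import tensorflow_data_validation as tfdv

# 1. Profile a known-good delivery from the broker and infer a schema.
baseline_stats = tfdv.generate_statistics_from_csv(
    data_location='broker_data/baseline.csv')  # hypothetical path
schema = tfdv.infer_schema(statistics=baseline_stats)

# 2. Before each training run, validate the latest delivery against
#    that schema so unannounced formatting changes fail fast.
latest_stats = tfdv.generate_statistics_from_csv(
    data_location='broker_data/latest.csv')    # hypothetical path
anomalies = tfdv.validate_statistics(statistics=latest_stats,
                                     schema=schema)

if anomalies.anomaly_info:
    # anomaly_info maps feature names to descriptions of what changed
    # (missing columns, type changes, out-of-domain values, ...).
    for feature, info in anomalies.anomaly_info.items():
        print(f'{feature}: {info.description}')
    raise ValueError('Broker data does not match the expected schema.')
```

Because the schema lives as an artifact rather than hand-written checks, this approach scales with new datasets and avoids the continuous maintenance burden that option A implies.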