
Answer-first summary for fast verification
Answer: Use automated testing frameworks to simulate a wide range of anomalies, including data skew and late-arriving data, and integrate these tests into a CI/CD pipeline.
Option B is the most comprehensive approach. Automated testing frameworks can simulate a wide range of anomalies, including data skew and late-arriving data, in a repeatable way, which is essential for verifying the robustness of data pipelines. Integrating these tests into a CI/CD pipeline runs them on every change, providing continuous validation of the pipeline's correctness and reliability. Options A and D depend on manual steps that are hard to repeat and do not scale, and option C checks only schema compliance, so it misses runtime anomalies.
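As a rough illustration of what such automated anomaly tests might look like, here is a minimal sketch in plain Python. It is not taken from any particular streaming framework: `window_counts`, its `window_size` and `allowed_lateness` parameters, and the test names are all hypothetical stand-ins for a real pipeline stage, assumed here to do tumbling-window counting with deduplication and a simple watermark.

```python
from collections import defaultdict


def window_counts(events, window_size=10, allowed_lateness=5):
    """Hypothetical pipeline stage: count unique events per (window_start, key).

    `events` is an iterable of (event_id, key, event_time) in arrival order.
    Returns (counts, dropped_late), where dropped_late lists the ids of events
    whose window had already closed when they arrived.
    """
    counts = defaultdict(int)
    seen_ids = set()
    dropped_late = []
    watermark = float("-inf")
    for event_id, key, event_time in events:
        if event_id in seen_ids:
            continue  # duplicate delivery: ignore
        seen_ids.add(event_id)
        # The watermark trails the largest event time seen so far.
        watermark = max(watermark, event_time - allowed_lateness)
        window_start = (event_time // window_size) * window_size
        if window_start + window_size <= watermark:
            dropped_late.append(event_id)  # window already closed
            continue
        counts[(window_start, key)] += 1
    return dict(counts), dropped_late


def test_duplicates_are_counted_once():
    base = [(1, "a", 1), (2, "a", 2), (3, "b", 3)]
    duplicated = [e for e in base for _ in range(2)]  # every event delivered twice
    assert window_counts(duplicated) == window_counts(base)


def test_late_event_within_allowed_lateness_is_kept():
    counts, dropped = window_counts([(1, "a", 12), (2, "a", 9)])
    assert counts == {(10, "a"): 1, (0, "a"): 1}
    assert dropped == []


def test_event_after_window_closed_is_dropped():
    counts, dropped = window_counts([(1, "a", 1), (2, "a", 30), (3, "a", 2)])
    assert counts == {(0, "a"): 1, (30, "a"): 1}
    assert dropped == [3]


def test_skewed_keys_lose_no_events():
    # 90% of the traffic lands on a single hot key.
    events = [(i, "hot" if i % 10 else "cold", i) for i in range(1000)]
    counts, dropped = window_counts(events)
    assert dropped == []
    assert sum(counts.values()) == 1000
    assert sum(v for (_, k), v in counts.items() if k == "hot") == 900
```

Tests written this way can be collected by a runner such as pytest and executed on every commit in the CI/CD pipeline, which is what distinguishes option B from the manual approaches in options A and D. A real setup would replace `window_counts` with the actual pipeline under test and would probably generate anomalies at larger scale.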
Author: LeetQuiz Editorial Team
In a stream processing application, you need to create tests for data pipelines to ensure they handle various data anomalies and edge cases. Describe how you would design these tests, including the types of anomalies you would simulate and the tools you would use to automate the testing process.
A. Simulate common anomalies like data duplication and missing values using manual scripts and perform tests manually.
B. Use automated testing frameworks to simulate a wide range of anomalies, including data skew and late-arriving data, and integrate these tests into a CI/CD pipeline.
C. Focus only on testing the data schema compliance and use static code analysis tools to ensure pipeline integrity.
D. Manually inject anomalies into the data stream and observe the pipeline's behavior in real-time.