
Answer-first summary for fast verification
Answer: Create a data pipeline with scheduled runs, apply necessary transformations and validations, partition data by date, and store in a structured format.
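Option B is the only choice that covers every requirement in the question. Scheduled runs keep ingestion automated as the dataset grows, transformations and validations stop malformed records before they reach the lakehouse, and partitioning by date lets queries read only the relevant partitions instead of scanning the full dataset. Option A skips transformation and validation, Option C depends on manual effort that does not scale, and Option D has neither scheduling nor a storage layout that stays queryable as the data grows.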
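The steps in Option B can be sketched as a single pipeline run. This is a minimal illustration, not a definitive implementation: the record schema (`id`, `event_time`, `value`), the `fetch` callable standing in for the external API, and the `run_once` function name are all assumptions made for the example. It writes JSON Lines using only the standard library; a real lakehouse would more likely use a columnar table format such as Parquet or Delta.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical schema: the fields this sketch expects from the API.
REQUIRED_FIELDS = ("id", "event_time", "value")


def validate(record: dict) -> bool:
    """Accept a record only if required fields exist and parse cleanly."""
    if any(field not in record for field in REQUIRED_FIELDS):
        return False
    try:
        datetime.fromisoformat(record["event_time"])
        float(record["value"])
    except (TypeError, ValueError):
        return False
    return True


def transform(record: dict) -> dict:
    """Normalize types and derive the date used as the partition key."""
    ts = datetime.fromisoformat(record["event_time"])
    if ts.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    ts = ts.astimezone(timezone.utc)
    return {
        "id": str(record["id"]),
        "event_time": ts.isoformat(),
        "value": float(record["value"]),
        "event_date": ts.date().isoformat(),
    }


def run_once(fetch: Callable[[], Iterable[dict]], root: Path, run_id: str) -> dict:
    """One scheduled run: fetch, validate, transform, write by date partition."""
    partitions = defaultdict(list)
    rejected = 0
    for raw in fetch():
        if not validate(raw):
            rejected += 1
            continue
        row = transform(raw)
        partitions[row["event_date"]].append(row)

    written = 0
    for event_date, rows in partitions.items():
        # Hive-style directory name so query engines can prune by date.
        part_dir = root / f"event_date={event_date}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # One file per run and partition; re-running a run_id overwrites it.
        with open(part_dir / f"part-{run_id}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row) + "\n")
        written += len(rows)
    return {"written": written, "rejected": rejected}
```

A scheduler (cron, or the pipeline scheduler of whichever lakehouse platform is in use) would call `run_once` at a fixed interval with a unique `run_id`. Returning the written and rejected counts gives each run a simple data-quality signal to monitor.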
Author: LeetQuiz Editorial Team
You are tasked with ingesting a large dataset from an external API into your lakehouse. The dataset is expected to grow significantly over time. Describe the steps you would take to ensure efficient data ingestion using a data pipeline. Include considerations for data validation, transformation, and storage optimization.
A. Use a simple ETL process without transformations, store data in raw format.
B. Create a data pipeline with scheduled runs, apply necessary transformations and validations, partition data by date, and store in a structured format.
C. Manually download and upload data periodically, perform no transformations.
D. Ingest data without scheduling, store in a single large file.