
You are tasked with ingesting a large dataset from an external API into your lakehouse. The dataset is expected to grow significantly over time. Describe the steps you would take to ensure efficient data ingestion using a data pipeline. Include considerations for data validation, transformation, and storage optimization.
A. Use a simple ETL process without transformations; store the data in raw format.
B. Create a data pipeline with scheduled runs, apply the necessary transformations and validations, partition the data by date, and store it in a structured format.
C. Manually download and upload the data periodically, performing no transformations.
D. Ingest the data without scheduling and store it in a single large file.