
When implementing a data pipeline that requires updates, deletions, and merges on a large dataset stored on a distributed file system, which feature should a data engineer use to ensure these operations are performed reliably and consistently, with the ability to handle concurrent modifications without data corruption or loss?
A. Spark Streaming
B. Delta Lake
C. DataFrame caching
D. File compaction
E. Data partitioning