
Answer-first summary for fast verification
Answer: Because Databricks will infer schema using types that allow all observed data to be processed, setting types manually provides greater assurance of data quality enforcement.
The question assesses knowledge of Delta Lake and Databricks schema handling. Option D is correct: when Databricks infers a schema, it chooses permissive types that can represent all observed data without ingestion errors, which may not match strict downstream requirements. Declaring the schema manually enforces stricter type checks and rejects records that violate them, giving greater assurance of data quality. The other options are incorrect: A wrongly claims Tungsten is optimized for string storage and that string types are always most efficient; B misstates how Parquet works, since Parquet files are immutable and footers cannot be modified in place; C is a general claim about engineering cost, unrelated to schema features; and E falsely asserts that inferred types will always match the types expected by downstream systems.
Author: LeetQuiz Editorial Team
A junior data engineer is implementing logic for a Lakehouse table called silver_device_recordings. The source data consists of 100 unique fields in a deeply nested JSON structure.
The silver_device_recordings table will serve downstream applications, including multiple production monitoring dashboards and a production model. Currently, 45 out of the 100 fields are utilized in at least one of these applications.
Given the highly nested schema and large number of fields, the data engineer must decide on the optimal approach for schema declaration.
Which of the following statements correctly describes Delta Lake and Databricks features that could influence this decision?
A
The Tungsten encoding used by Databricks is optimized for storing string data; newly-added native support for querying JSON strings means that string types are always most efficient.
B
Because Delta Lake uses Parquet for data storage, data types can be easily evolved by just modifying file footer information in place.
C
Human labor in writing code is the largest cost associated with data engineering workloads; as such, automating table declaration logic should be a priority in all migration workloads.
D
Because Databricks will infer schema using types that allow all observed data to be processed, setting types manually provides greater assurance of data quality enforcement.
E
Schema inference and evolution on Databricks ensure that inferred types will always accurately match the data types used by downstream systems.