
A data engineer is designing the schema for a Silver-layer table, silver_device_recordings, which processes highly nested JSON data containing 100 unique fields. Only 45 of these fields are required for downstream production models and dashboards. Given the complexity and volume of the fields, which of the following statements is most relevant to the engineer's decision on whether to use schema inference or manual declaration?
A
Since Delta Lake uses Parquet storage, data types can be evolved and modified by directly editing the file footer metadata.
B
Databricks' Tungsten engine is optimized for string data, making the use of string types for all JSON fields the most computationally efficient approach.
C
Schema inference and evolution automatically ensure that the resulting data types will always align with the requirements of downstream consumers.
D
Databricks' schema inference selects types broad enough to accommodate all observed data, so manual schema definition provides superior data quality assurance and stricter typing.
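
The trade-off between inferred and declared types can be made concrete with a small sketch. The code below is a toy model in plain Python, not Spark's or Databricks' actual inference logic: the `WIDENING` order, the helper functions, and the field names `device_id` and `heartrate` are all hypothetical and chosen only for illustration. It assumes inference keeps, for each field, the widest type seen in any record, while a declared schema flags values that do not fit the expected type.

```python
import json

# Toy widening order (an assumption for this sketch, not Spark's real rules):
# a later entry can hold every value an earlier entry can.
WIDENING = ["long", "double", "string"]

def observed_type(value):
    """Map one parsed JSON value to a toy type name."""
    if isinstance(value, bool):
        return "string"  # simplification: booleans are treated as strings here
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"

def infer_schema(records):
    """For each field, keep the widest type observed across all records."""
    schema = {}
    for record in records:
        for field, value in record.items():
            if value is None:
                continue  # nulls carry no type information
            seen = observed_type(value)
            current = schema.get(field, seen)
            schema[field] = max(current, seen, key=WIDENING.index)
    return schema

def violations(records, declared):
    """Return (row index, field, value) for values wider than the declared type."""
    bad = []
    for i, record in enumerate(records):
        for field, expected in declared.items():
            value = record.get(field)
            if value is None:
                continue
            if WIDENING.index(observed_type(value)) > WIDENING.index(expected):
                bad.append((i, field, value))
    return bad

# Two hypothetical device records; the second carries a malformed reading.
raw = [
    '{"device_id": 1, "heartrate": 61.5}',
    '{"device_id": 2, "heartrate": "n/a"}',
]
records = [json.loads(line) for line in raw]

inferred = infer_schema(records)
declared = {"device_id": "long", "heartrate": "double"}  # hand-written schema
problems = violations(records, declared)

print(inferred)
print(problems)
```

In this toy model a single malformed value is enough to widen the inferred `heartrate` type to `string`, whereas the declared schema keeps the column numeric and surfaces the offending row. In PySpark, a hand-written schema of this kind would typically be expressed as a `StructType` or a DDL string passed to the reader, listing only the fields the downstream consumers need.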