
In a data engineering project, the team uses Auto Loader to ingest data from a JSON source into a Databricks environment. Although the JSON source contains a mix of data types, including integers, booleans, and strings, the team observes that Auto Loader infers every column as STRING. This causes processing inefficiencies and inaccurate results in downstream analytics. Given the need for accurate type inference to ensure data quality and processing efficiency, which of the following is the MOST LIKELY reason for this behavior and the BEST solution to resolve it? Choose one option.
A. Auto Loader lacks the capability to infer data types from JSON sources, requiring manual data type specification for each field.
B. The JSON source's data is malformed or lacks explicit type definitions, forcing Auto Loader to default all data to STRING type.
C. Auto Loader's default setting is to infer all data as STRING to maximize compatibility across different data sources and formats.
D. The pipeline configuration does not include a schema definition, preventing Auto Loader from accurately inferring data types from the JSON source.
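For reference, Databricks documents that for formats that do not encode types (such as JSON), Auto Loader infers all columns as STRING by default, and that setting the `cloudFiles.inferColumnTypes` option to `true` enables column type inference. The sketch below shows one way to pass that option. It assumes a Databricks notebook where a `spark` session is already defined; the helper names `autoloader_json_options` and `read_json_stream` and both paths are hypothetical and used only for illustration.

```python
from typing import Any, Dict

# Hypothetical locations, for illustration only; replace with your own.
SOURCE_PATH = "/mnt/raw/events/"           # hypothetical JSON landing directory
SCHEMA_LOCATION = "/mnt/_schemas/events/"  # hypothetical schema-tracking directory


def autoloader_json_options(schema_location: str) -> Dict[str, str]:
    """Build Auto Loader reader options for JSON with type inference enabled (hypothetical helper)."""
    return {
        "cloudFiles.format": "json",
        # Without this option, Auto Loader infers every JSON column as STRING.
        "cloudFiles.inferColumnTypes": "true",
        # Auto Loader stores the inferred schema here and reuses it across restarts.
        "cloudFiles.schemaLocation": schema_location,
    }


def read_json_stream(spark: Any, source_path: str, schema_location: str) -> Any:
    """Return a streaming DataFrame over source_path read with Auto Loader (hypothetical helper)."""
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**autoloader_json_options(schema_location))
        .load(source_path)
    )
```

In a notebook this would be called as `df = read_json_stream(spark, SOURCE_PATH, SCHEMA_LOCATION)`. The option values are strings because Spark reader options are passed as strings; individual columns can still be pinned to a specific type with `cloudFiles.schemaHints` if inference picks something unwanted.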