Databricks Certified Data Engineer - Associate

In a data engineering project, the team is utilizing Auto Loader to ingest data from a JSON source into a Databricks environment. They observe that despite the JSON source containing a mix of data types including integers, booleans, and strings, Auto Loader is inferring all data as STRING. This has led to data processing inefficiencies and inaccuracies in downstream analytics. Considering the need for accurate data type inference to ensure data quality and processing efficiency, which of the following is the MOST LIKELY reason for this behavior and the BEST solution to resolve it? Choose one option.

Auto Loader lacks the capability to infer data types from JSON sources, requiring manual data type specification for each field.

The JSON source's data is malformed or lacks explicit type definitions, forcing Auto Loader to default all data to STRING type.
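
For reference, Auto Loader's documented default for JSON (and CSV) sources is to infer every column as STRING. Setting the `cloudFiles.inferColumnTypes` option to `true` makes it infer typed columns instead. The following is a minimal sketch. It assumes a Databricks notebook where `spark` is already defined. The helper name `build_json_stream` and the paths are hypothetical placeholders.

```python
# Minimal sketch: Auto Loader options that enable column type inference for JSON.
# Assumption: `spark` is the SparkSession a Databricks notebook provides.
# The helper name and all paths are hypothetical placeholders.

AUTOLOADER_JSON_OPTIONS = {
    "cloudFiles.format": "json",
    # Without this flag, Auto Loader infers every JSON column as STRING.
    "cloudFiles.inferColumnTypes": "true",
}


def build_json_stream(spark, source_path, schema_location):
    """Return a streaming DataFrame that reads JSON files with Auto Loader."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**AUTOLOADER_JSON_OPTIONS)
        # Auto Loader persists the inferred schema here and tracks its evolution.
        .option("cloudFiles.schemaLocation", schema_location)
        .load(source_path)
    )


# Usage inside a Databricks notebook (hypothetical paths):
# df = build_json_stream(spark, "/mnt/raw/events/", "/mnt/raw/events/_schema")
# df.printSchema()  # expect typed columns (e.g. long, boolean) rather than all string
```

If inference still picks the wrong type for a particular field, `cloudFiles.schemaHints` can pin that column explicitly while the remaining columns stay inferred.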