
Explanation:
The correct answer is B: JSON data is a text-based format.
Databricks Auto Loader handles schema inference differently depending on the source file format. For formats that encode data types natively (such as Parquet or Avro), Auto Loader reads and preserves the original types (e.g., float, boolean, integer, and struct).
However, JSON (like CSV and XML) is a text-based format that does not embed type information in the file itself. Numbers, booleans, and nulls are written as plain text, and the file carries no declared schema, so nothing in it says whether 1.5 should be a float, a double, or a decimal. Because of this, when no explicit schema or type hints are provided, Auto Loader's default behavior for JSON is to infer all columns as strings (including nested fields).
This is explicitly documented:
"For formats that don't encode data types (JSON, CSV, and XML), Auto Loader infers all columns as strings (including nested fields in JSON files)."
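The default behavior described above can be pictured with a toy Python model. This is not Databricks code: the function name default_json_inference is hypothetical, records are assumed to be flat, and nested fields and schema evolution are ignored. Every key becomes a string column, and non-string values are kept as their JSON text.

```python
import json


def default_json_inference(lines):
    """Toy model of Auto Loader's default JSON inference (hypothetical helper).

    Every column is typed as "string", and each value is kept as text,
    mirroring how numbers and booleans are written in the source file.
    Assumes flat records; nested objects are not modeled.
    """
    schema = {}
    rows = []
    for line in lines:
        record = json.loads(line)
        row = {}
        for key, value in record.items():
            # No schema, hints, or type inference: every column is a string.
            schema.setdefault(key, "string")
            if value is None or isinstance(value, str):
                row[key] = value  # nulls stay null; strings are kept as-is
            else:
                row[key] = json.dumps(value)  # e.g. 1.5 -> "1.5", True -> "true"
        rows.append(row)
    return schema, rows


schema, rows = default_json_inference([
    '{"price": 1.5, "active": true}',
    '{"price": 2.25, "active": null}',
])
print(schema)  # {'price': 'string', 'active': 'string'}
print(rows)    # [{'price': '1.5', 'active': 'true'}, {'price': '2.25', 'active': None}]
```

The float and boolean values survive only as text, which matches what the engineer in the question observed in the target table.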
The official table in the Databricks documentation confirms the same behavior. Why the other options are wrong:
A: Auto Loader cannot infer the schema of ingested data. Incorrect: Auto Loader does perform schema inference for JSON; it simply defaults to string for safety and to simplify schema evolution in streaming ingestion. You can make it infer more precise types by enabling .option("cloudFiles.inferColumnTypes", "true"), though this is off by default and has some trade-offs.
C: Auto Loader only works with string data. Incorrect: Auto Loader supports many file formats and data types; the string default applies only to text-based formats that carry no embedded type metadata.
D: All of the fields had at least one null value. Incorrect: nulls are common in JSON and do not force string inference. The root cause is the text-based nature of JSON, not the presence of nulls. When type inference is enabled, sampling and mixed types within a column can affect the inferred type, but without it the default is string regardless.
How to get proper types (e.g., float, boolean, struct):
- Set the option cloudFiles.inferColumnTypes to true.
- Provide cloudFiles.schemaHints (e.g., "my_float_col FLOAT, my_bool_col BOOLEAN").
- Define an explicit schema with .schema(...), or parse nested JSON with from_json and a schema after the initial load.
This question tests your understanding of how Auto Loader treats text-based versus binary/typed formats during schema inference, a common topic in the Databricks Certified Data Engineer Associate exam.
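The fixes listed above come down to the options passed to Auto Loader. The sketch below is plain Python that only builds a dictionary; the helper name build_autoloader_options, the column names, and the schema-location path are hypothetical. The keys cloudFiles.format, cloudFiles.schemaLocation, cloudFiles.inferColumnTypes, and cloudFiles.schemaHints are Auto Loader options, and on Databricks the result could be passed as spark.readStream.format("cloudFiles").options(**opts).load(source_path).

```python
def build_autoloader_options(schema_location, infer_column_types=True, schema_hints=None):
    """Build Auto Loader options for a JSON source (hypothetical helper).

    schema_hints maps column names to SQL types, e.g. {"my_float_col": "FLOAT"}.
    Option values are passed as strings, as Spark reader options usually are.
    """
    opts = {
        "cloudFiles.format": "json",
        # Auto Loader tracks the inferred schema at this location across runs.
        "cloudFiles.schemaLocation": schema_location,
    }
    if infer_column_types:
        # Off by default; without it, JSON columns are inferred as strings.
        opts["cloudFiles.inferColumnTypes"] = "true"
    if schema_hints:
        # Hints fix the types of the named columns; other columns are still inferred.
        opts["cloudFiles.schemaHints"] = ", ".join(
            f"{name} {sql_type}" for name, sql_type in schema_hints.items()
        )
    return opts


opts = build_autoloader_options(
    "/tmp/schemas/orders",  # hypothetical path
    schema_hints={"my_float_col": "FLOAT", "my_bool_col": "BOOLEAN"},
)
print(opts["cloudFiles.schemaHints"])  # my_float_col FLOAT, my_bool_col BOOLEAN
```

To skip inference entirely instead, the reader also accepts an explicit schema, for example as a DDL string: .schema("my_float_col FLOAT, my_bool_col BOOLEAN").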
A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.
Why has Auto Loader inferred all of the columns to be of the string type?
A: Auto Loader cannot infer the schema of ingested data
B: JSON data is a text-based format
C: Auto Loader only works with string data
D: All of the fields had at least one null value