
Ultimate access to all questions.
A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.
Which of the following describes why Auto Loader inferred all of the columns to be of the string type?
A
There was a type mismatch between the specific schema and the inferred schema
B
JSON data is a text-based format
C
Auto Loader only works with string data
D
All of the fields had at least one null value
E
Auto Loader cannot infer the schema of ingested data
Explanation:
Auto Loader infers all columns as string type when ingesting JSON data because:
JSON is inherently text-based: JSON (JavaScript Object Notation) is a text format that represents data as strings. Even when JSON contains numeric values like 42 or boolean values like true, they are fundamentally represented as text in the JSON file.
Auto Loader's default behavior: When no schema hints or type inference options are provided, Auto Loader treats all JSON data as strings to ensure data ingestion doesn't fail due to type mismatches. This is a safe default approach.
Why other options are incorrect:
Solution: To properly infer types, the data engineer should use schema hints or enable type inference options like cloudFiles.schemaEvolutionMode or provide an explicit schema.