
Ultimate access to all questions.
A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.
Why has Auto Loader inferred all of the columns to be of the string type?
A
Auto Loader cannot infer the schema of ingested data
B
JSON data is a text-based format
C
Auto Loader only works with string data
D
All of the fields had at least one null value
Explanation:
Auto Loader infers all columns as string type when ingesting JSON data because:
JSON is inherently text-based: JSON (JavaScript Object Notation) is a text format for data interchange. All values in JSON are represented as text strings, even when they represent numbers, booleans, or other data types.
Schema inference behavior: When Auto Loader processes JSON files without explicit schema hints:
Why other options are incorrect:
Best practice: To get proper type inference with JSON data:
cloudFiles.schemaHintscloudFiles.schemaEvolutionMode to control schema evolutionschema optionThis behavior ensures data integrity by avoiding potential type conversion errors during initial ingestion.