
Answer-first summary for fast verification
Answer: B. JSON data is a text-based format
## Explanation

Auto Loader infers all columns as the string type when ingesting JSON data because:

1. **JSON is a text-based format that carries no schema**: JSON (JavaScript Object Notation) is a text format for data interchange. Individual values can be written as numbers, booleans, strings, or null, but a file does not declare a type for each field, so nothing guarantees that a field holds the same type in every record.
2. **Schema inference behavior**: When Auto Loader processes JSON files without type inference enabled and without schema hints:
   - It infers the column names from a sample of the data
   - It defaults to the string type for all columns, including nested fields
   - This conservative default avoids schema evolution problems caused by type mismatches between files
3. **Why the other options are incorrect**:
   - **A**: Auto Loader can infer the schema; here it inferred the columns and only defaulted their types to string
   - **C**: Auto Loader works with many data types, not just strings
   - **D**: Null values do not cause columns to be inferred as strings; the string default applies whether or not nulls are present
4. **Best practice**: To get typed columns from JSON data:
   - Set `cloudFiles.inferColumnTypes` to `true` so Auto Loader infers column types from the sampled data
   - Provide types for specific columns with `cloudFiles.schemaHints`
   - Or supply a full schema upfront with `.schema(...)` on the stream reader
   - Note that `cloudFiles.schemaEvolutionMode` controls how new columns are handled; it does not change type inference

The string default protects the initial ingestion from type conversion errors, at the cost of requiring an explicit opt-in for typed columns.
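The following is a minimal sketch of how typed ingestion might be configured, not a definitive pipeline. It uses `cloudFiles.schemaHints` as mentioned in the explanation, together with `cloudFiles.inferColumnTypes`, the Auto Loader option that Databricks documents for enabling type inference on JSON and CSV sources. The helper names, the paths, and the columns `price` and `in_stock` are hypothetical and chosen only for illustration.

```python
def autoloader_json_options(schema_location, infer_column_types=True, schema_hints=None):
    """Build an options dict for an Auto Loader JSON stream.

    Hypothetical helper: only the option keys are Auto Loader names.
    """
    options = {
        "cloudFiles.format": "json",
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }
    if infer_column_types:
        # Without this, JSON columns are inferred as strings.
        options["cloudFiles.inferColumnTypes"] = "true"
    if schema_hints:
        # Pins the types of specific columns, using SQL DDL syntax.
        options["cloudFiles.schemaHints"] = schema_hints
    return options


def read_json_stream(spark, source_path, options):
    """Create the streaming DataFrame; needs a Databricks SparkSession."""
    return (
        spark.readStream
        .format("cloudFiles")
        .options(**options)
        .load(source_path)
    )


# Hypothetical path and column names, for illustration only.
options = autoloader_json_options(
    schema_location="/tmp/example/_schemas",
    schema_hints="price DOUBLE, in_stock BOOLEAN",
)
```

On a Databricks cluster, where `spark` is predefined, calling `read_json_stream(spark, "/tmp/example/landing", options)` would return a streaming DataFrame in which the hinted columns are typed rather than strings. Keeping the options in a plain dictionary makes it easy to inspect what the stream will be configured with before it starts.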
Author: Keng Suppaseth
A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.
Why has Auto Loader inferred all of the columns to be of the string type?
A. Auto Loader cannot infer the schema of ingested data
B. JSON data is a text-based format
C. Auto Loader only works with string data
D. All of the fields had at least one null value