Databricks Certified Data Engineer - Associate

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Why has Auto Loader inferred all of the columns to be of the string type?

KKeng

Last updated: January 13, 2026 at 09:15

A. Auto Loader cannot infer the schema of ingested data

B. JSON data is a text-based format

C. Auto Loader only works with string data

D. All of the fields had at least one null value

Explanation

The correct answer is that JSON data is a text-based format (option B). Auto Loader infers every column as a string when ingesting JSON because:

JSON is a text-based format that does not encode data types: JSON (JavaScript Object Notation) is a text format for data interchange. Its syntax distinguishes numbers, booleans, strings, and null, but a JSON file carries no schema declaring each column's type, so a field's type can differ from one record to the next.
Schema inference behavior: When Auto Loader processes JSON files without type inference enabled or schema hints:
- It samples the source files to discover the column names
- It infers every column as a string, including nested fields
- This conservative default avoids schema evolution problems caused by type mismatches
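The difference between the string default and value-based typing can be illustrated with a toy model in plain Python. This is only a sketch of the idea, not Auto Loader's implementation: `infer_schema`, `_type_name`, the `infer_column_types` flag, and the sample records are hypothetical names chosen for illustration, and the type labels are merely Spark-like.

```python
import json


def _type_name(value):
    """Map a parsed JSON value to a Spark-like type label (illustrative only)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"  # strings, and anything nested, fall back to string here


def infer_schema(json_lines, infer_column_types=False):
    """Toy schema inference over JSON lines; hypothetical, not Auto Loader code."""
    schema = {}
    for line in json_lines:
        for name, value in json.loads(line).items():
            if not infer_column_types:
                # Conservative default: record the column name, type it as string.
                schema[name] = "string"
                continue
            if value is None:
                continue  # a null says nothing about the column's type
            observed = _type_name(value)
            previous = schema.get(name, observed)
            # Conflicting observations widen to string in this toy model.
            schema[name] = observed if observed == previous else "string"
    return schema


sample = [
    '{"price": 9.99, "in_stock": true, "sku": "A1"}',
    '{"price": 12.5, "in_stock": false, "sku": "B2"}',
]

default_schema = infer_schema(sample)
typed_schema = infer_schema(sample, infer_column_types=True)
```

With the default, all three columns come back as string even though `price` only holds floats and `in_stock` only holds booleans, which mirrors the situation described in the question.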
Why other options are incorrect:
- A: Auto Loader can infer a schema by sampling the source files; for JSON it discovers the column names but types every column as a string by default
- C: Auto Loader works with various data types, not just strings
- D: Null values alone don't cause all columns to be strings; when column type inference is enabled, Auto Loader infers types from the non-null values
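Best practice: To get typed columns from JSON data:
- Enable type inference by setting cloudFiles.inferColumnTypes to true
- Provide type hints for specific columns using cloudFiles.schemaHints
- Or specify the full schema up front with the reader's schema() method
- Note that cloudFiles.schemaEvolutionMode controls how newly discovered columns are handled; it does not change the inferred column types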
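A minimal sketch of how these settings might be wired up in PySpark follows. The option keys `cloudFiles.format`, `cloudFiles.schemaLocation`, `cloudFiles.inferColumnTypes` (the Auto Loader option that turns on type inference), and `cloudFiles.schemaHints` are Auto Loader options; the helper names `auto_loader_options` and `read_json_stream`, the paths, and the column names `price` and `in_stock` are assumptions made up for this example. The reader function needs a Databricks runtime and is defined here but not called.

```python
from typing import Dict, Optional


def auto_loader_options(
    schema_location: str,
    infer_types: bool = True,
    schema_hints: Optional[str] = None,
) -> Dict[str, str]:
    """Build Auto Loader options for a JSON source (hypothetical helper)."""
    options = {
        "cloudFiles.format": "json",
        # Where Auto Loader persists the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Without this, JSON columns are inferred as strings.
        "cloudFiles.inferColumnTypes": str(infer_types).lower(),
    }
    if schema_hints:
        # Hints use SQL DDL syntax, e.g. "price DOUBLE, in_stock BOOLEAN".
        options["cloudFiles.schemaHints"] = schema_hints
    return options


def read_json_stream(spark, source_path: str, options: Dict[str, str]):
    """Start an Auto Loader stream; assumes a Databricks SparkSession."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )


# Placeholder paths and column names, for illustration only.
options = auto_loader_options(
    "/tmp/_schemas/orders",
    schema_hints="price DOUBLE, in_stock BOOLEAN",
)
```

Hints are useful when only a few columns need a specific type; enabling inference alone leaves the choice of type to Auto Loader.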

This default favors stable ingestion: reading every value as a string avoids the failures and schema evolution issues that type mismatches would otherwise cause during initial ingestion.
