Databricks Certified Data Engineer - Associate

A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pipeline. Upon reviewing the data, the data engineer has noticed that all of the columns in the target table are of the string type despite some of the fields only including float or boolean values.

Why has Auto Loader inferred all of the columns to be of the string type?

KKeng

Last updated: January 13, 2026 at 09:15

A. Auto Loader cannot infer the schema of ingested data

B. JSON data is a text-based format

C. Auto Loader only works with string data

D. All of the fields had at least one null value

Explanation:

The correct answer is B. For text-based formats that do not carry a typed schema (JSON, CSV, and XML), Auto Loader infers every column as a string by default, including nested fields in JSON files. This is because:

JSON files carry no declared schema - Unlike Parquet or Avro, a JSON file does not store column types, so any type has to be guessed from the sampled values.
String is the most flexible type - A string column can hold any value, so defaulting to string avoids schema evolution problems caused by type mismatches between files.
Type inference is opt-in - The engineer set no type inference option or schema hints, so the default applied. Setting cloudFiles.inferColumnTypes to true makes Auto Loader infer float, boolean, and other types from the sampled data.

Why other options are incorrect:

A: Auto Loader can infer the schema; it samples the source files to do so.
C: Auto Loader works with many data types, not just strings.
D: Null values are not the cause. With default settings, JSON columns are inferred as strings whether or not they contain nulls.

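Solution: To fix this issue, the data engineer can do any of the following:

Set the cloudFiles.inferColumnTypes option to true so that Auto Loader infers column types from the sampled data
Provide explicit schema hints for known columns using the cloudFiles.schemaHints option
Manually specify the schema using the schema option if the data structure is known

Increasing the inference sample size or changing cloudFiles.schemaEvolutionMode does not, on its own, change the default string typing for JSON.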
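
As a rough illustration of the fix, the sketch below builds the Auto Loader options for a JSON source with type inference enabled and optional schema hints. The helper names (auto_loader_json_options, read_json_stream), the paths, and the column names price and in_stock are hypothetical and not part of the original question. The cloudFiles.* option keys are the ones Databricks documents for Auto Loader. Reading the stream itself requires a Databricks SparkSession, so that part is only defined here, not called.

```python
from typing import Dict, Optional


def auto_loader_json_options(
    schema_location: str,
    infer_column_types: bool = True,
    schema_hints: Optional[str] = None,
) -> Dict[str, str]:
    """Build Auto Loader options for a JSON source (hypothetical helper)."""
    options = {
        "cloudFiles.format": "json",
        # Where Auto Loader stores the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
        # Left at its default (false), JSON columns are inferred as strings.
        "cloudFiles.inferColumnTypes": str(infer_column_types).lower(),
    }
    if schema_hints:
        # DDL-style type overrides for specific columns, e.g. "price FLOAT".
        options["cloudFiles.schemaHints"] = schema_hints
    return options


def read_json_stream(spark, source_path: str, options: Dict[str, str]):
    """Return a streaming DataFrame; needs a Databricks SparkSession."""
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
    )


# Hypothetical schema location and column names, for illustration only.
opts = auto_loader_json_options(
    schema_location="/tmp/example/_schema",
    schema_hints="price FLOAT, in_stock BOOLEAN",
)
```

On a Databricks cluster, read_json_stream(spark, "/path/to/json", opts) would then return a stream whose hinted columns have the requested types rather than string.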
