
Answer-first summary for fast verification
Answer: C
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaHints", "time TIMESTAMP")
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/datalake/bronze/checkpoint")
    .load("dbfs:/mnt/datalake/bronze/recordings")
    .createOrReplaceTempView("recordings_raw_temp"))
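Auto Loader infers the schema by sampling the first 50 GB or 1000 files it discovers, whichever limit is reached first. For formats that do not encode data types, such as JSON and CSV, it infers every column as a string by default, which avoids schema evolution issues caused by type mismatches. That is why 'time' arrives as a string. Schema hints override the inferred type for the columns you name and leave the rest of the inferred schema untouched. The correct option sets `cloudFiles.schemaHints` to "time TIMESTAMP", so the 'time' column is typed as TIMESTAMP at ingestion. Of the other choices, `cloudFiles.inferColumnTypes` is a real option but takes a boolean, not a column definition, while `cloudFiles.enforceSchema`, `cloudFiles.schemaDetails`, and `cloudFiles.schemaHint` (singular) are not documented Auto Loader options.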
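The value passed to `cloudFiles.schemaHints` is a comma-separated list of `column TYPE` pairs, so hints for several columns can be assembled programmatically. The following is a minimal sketch, not a definitive implementation: the helper names `build_schema_hints`, `auto_loader_options`, and `create_recordings_view` are hypothetical, and the last function assumes a `SparkSession` on a Databricks runtime where the `cloudFiles` source is available.

```python
def build_schema_hints(hints):
    """Join {column: SQL type} pairs into a 'col TYPE, col TYPE' hint string."""
    return ", ".join(f"{column} {sql_type}" for column, sql_type in hints.items())


def auto_loader_options(file_format, schema_location, hints):
    """Collect the Auto Loader reader options in one dict (hypothetical helper)."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        "cloudFiles.schemaHints": build_schema_hints(hints),
    }


def create_recordings_view(spark, source_path, options, view_name):
    """Start an Auto Loader stream and register it as a temporary view.

    Assumption: `spark` is a SparkSession on a Databricks runtime;
    the cloudFiles source is not part of open-source Spark.
    """
    df = spark.readStream.format("cloudFiles").options(**options).load(source_path)
    df.createOrReplaceTempView(view_name)
    return df


# Build the options from the question; this part runs without Spark.
options = auto_loader_options(
    "json",
    "dbfs:/mnt/datalake/bronze/checkpoint",
    {"time": "TIMESTAMP"},
)
print(options["cloudFiles.schemaHints"])  # time TIMESTAMP
```

On a Databricks cluster, calling `create_recordings_view(spark, "dbfs:/mnt/datalake/bronze/recordings", options, "recordings_raw_temp")` would be expected to produce the same stream and view as the correct answer.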
Author: LeetQuiz Editorial Team
A data engineer needs to ingest heart rate recordings from medical devices, stored in JSON format, using Auto Loader and then create a temporary view. The column 'time' is incorrectly inferred as a 'string' instead of a 'timestamp' data type. Which of the following code blocks correctly enforces the schema information for this column?
A
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.enforceSchema", "time TIMESTAMP")
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/datalake/bronze/checkpoint")
    .load("dbfs:/mnt/datalake/bronze/recordings")
    .createOrReplaceTempView("recordings_raw_temp"))
B
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.inferColumnTypes", "time TIMESTAMP")
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/datalake/bronze/checkpoint")
    .load("dbfs:/mnt/datalake/bronze/recordings")
    .createOrReplaceTempView("recordings_raw_temp"))
C
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaHints", "time TIMESTAMP")
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/datalake/bronze/checkpoint")
    .load("dbfs:/mnt/datalake/bronze/recordings")
    .createOrReplaceTempView("recordings_raw_temp"))
D
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaDetails", "time TIMESTAMP")
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/datalake/bronze/checkpoint")
    .load("dbfs:/mnt/datalake/bronze/recordings")
    .createOrReplaceTempView("recordings_raw_temp"))
E
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaHint", "time TIMESTAMP")
    .option("cloudFiles.schemaLocation", "dbfs:/mnt/datalake/bronze/checkpoint")
    .load("dbfs:/mnt/datalake/bronze/recordings")
    .createOrReplaceTempView("recordings_raw_temp"))