A data engineer has prepared a notebook to be scheduled as part of a data pipeline. The following commands produce correct results when executed as shown:
Cmd 1:
rawDF = spark.table("raw_data")
Cmd 2:
rawDF.printSchema()
Cmd 3:
flattenedDF = rawDF.select("*", "values.*")
Cmd 4:
finalDF = flattenedDF.drop("values")
Cmd 5:
display(finalDF)
Cmd 6:
finalDF.write.mode("append").saveAsTable("flat_data")
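For reference, the same sequence as a single commented cell (a minimal sketch, assuming the notebook runs in a Databricks environment where spark and display are predefined, and that raw_data contains a nested struct column named values):
# Cmd 1: load the source table registered as "raw_data"
rawDF = spark.table("raw_data")
# Cmd 2: print the table's schema to the notebook output
rawDF.printSchema()
# Cmd 3: flatten the nested struct column "values" into top-level columns
flattenedDF = rawDF.select("*", "values.*")
# Cmd 4: drop the original struct column now that its fields are flattened
finalDF = flattenedDF.drop("values")
# Cmd 5: render the result as an interactive table in the notebook
display(finalDF)
# Cmd 6: append the flattened rows to the target table "flat_data"
finalDF.write.mode("append").saveAsTable("flat_data")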
Which command should be excluded from the notebook before scheduling it as a job?