
A data engineer has ingested data from an external source into a PySpark DataFrame raw_df. They need to briefly make this data available in SQL for a data analyst to perform a quality assurance check on the data.
Which of the following commands should the data engineer run to make this data available in SQL for only the remainder of the Spark session?
A
raw_df.createOrReplaceTempView("raw_df")
B
raw_df.createTable("raw_df")
C
raw_df.write.save("raw_df")
D
raw_df.saveAsTable("raw_df")
Explanation:
The correct answer is A because:
createOrReplaceTempView() creates a temporary view that is only available for the duration of the current Spark session. This matches the requirement to make data available "for only the remainder of the Spark session."
createTable() (Option B) is not a valid DataFrame method in PySpark. Temporary views are created with createOrReplaceTempView(), and permanent tables are created through the DataFrameWriter's saveAsTable() method.
write.save() (Option C) writes the DataFrame to a storage location (e.g. as Parquet or CSV files) but does not register it as a table or view in the Spark SQL catalog, so it cannot be queried by name in SQL.
saveAsTable() (Option D) is a DataFrameWriter method (the call would be raw_df.write.saveAsTable("raw_df")), and it creates a permanent table in the metastore that persists beyond the current Spark session, which contradicts the requirement for temporary availability.
Key Points:
Temporary views created with createOrReplaceTempView() are automatically dropped when the Spark session ends. Once the view is registered, the analyst can query it with standard SQL, e.g. SELECT * FROM raw_df.