
## Answer

**A** — `raw_df.createOrReplaceTempView("raw_df")`
## Explanation

The correct answer is **A** because:

- `createOrReplaceTempView("raw_df")` creates a temporary view that is available only for the duration of the current Spark session, which matches the requirement to "briefly make this data available in SQL for only the remainder of the Spark session"
- Temporary views are automatically dropped when the Spark session ends
- The data analyst can then query the data with ordinary SQL, e.g. `SELECT * FROM raw_df`

**Why the other options are incorrect:**

- **B** `createTable("raw_df")`: this method does not exist in the PySpark DataFrame API
- **C** `write.save("raw_df")`: this saves the DataFrame to a file system location but does not register it as a SQL table or view
- **D** `saveAsTable("raw_df")`: this creates a persistent table in the Hive metastore that survives beyond the current Spark session, which contradicts the requirement for temporary availability

The key distinction is that temporary views (`createOrReplaceTempView`) are session-scoped, while tables created with `saveAsTable` are persistent and survive session restarts.
Author: LeetQuiz.
Question 23
A data engineer has ingested data from an external source into a PySpark DataFrame raw_df. They need to briefly make this data available in SQL for a data analyst to perform a quality assurance check on the data.
Which of the following commands should the data engineer run to make this data available in SQL for only the remainder of the Spark session?
A
raw_df.createOrReplaceTempView("raw_df")
B
raw_df.createTable("raw_df")
C
raw_df.write.save("raw_df")
D
raw_df.saveAsTable("raw_df")