
Question 23
A data engineer has ingested data from an external source into a PySpark DataFrame raw_df. They need to briefly make this data available in SQL for a data analyst to perform a quality assurance check on the data.
Which of the following commands should the data engineer run to make this data available in SQL for only the remainder of the Spark session?
Explanation:
The correct answer is A because:
createOrReplaceTempView("raw_df") creates a temporary view that is only available for the duration of the current Spark session, which matches the requirement to "briefly make this data available in SQL for only the remainder of the Spark session". The analyst can then query it with SELECT * FROM raw_df.

Why the other options are incorrect:

- createTable("raw_df"): this method does not exist in the PySpark DataFrame API.
- write.save("raw_df"): this saves the DataFrame to a file system location but does not make it available as a SQL table or view.
- saveAsTable("raw_df"): this creates a permanent table in the Hive metastore that persists beyond the current Spark session, which contradicts the requirement for temporary availability.

The key distinction is that temporary views (createOrReplaceTempView) are session-scoped, while tables created with saveAsTable are persistent and survive session restarts.