
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL. Which of the following commands could the data engineering team use to access sales in PySpark?
A
SELECT * FROM sales
B
There is no way to share data between PySpark and SQL.
C
spark.sql("sales")
D
spark.delta.table("sales")
E
spark.table("sales")
Explanation:
The correct answer is E. spark.table("sales").
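spark.table("sales") is the standard PySpark method for reading a table registered in the Spark catalog. It returns a DataFrame that the data engineering team can test in Python, and it works for any table or view in the catalog, including Delta tables such as sales. It is equivalent to spark.read.table("sales") and returns the same data as spark.sql("SELECT * FROM sales").
Why the other options are incorrect:
A: SELECT * FROM sales is a SQL statement, not PySpark code. It only runs from Python when passed as a string to spark.sql().
B: SQL and PySpark share the same Spark catalog, so a table created in SQL can be read directly from PySpark.
C: spark.sql() expects a complete SQL statement, not just a table name, so spark.sql("sales") fails with a parse error.
D: spark.delta.table() is not part of the PySpark API; the SparkSession has no delta attribute.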
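To illustrate how the data engineering team might use spark.table("sales") in a Python test, here is a minimal sketch. It assumes `spark` is the SparkSession that Databricks notebooks predefine, and that sales has columns `order_id` and `amount`. Those column names, the two "clean data" rules, and the function names `summarize_failures` and `run_sales_tests` are all hypothetical and are not part of the question. The checks are computed inside Spark with a single aggregation, so only three numbers are returned to the driver rather than the whole table.

```python
"""Sketch: data-quality tests on the `sales` table from PySpark.

Hypothetical assumptions: the table has `order_id` and `amount` columns,
and "clean" means no NULL order_id and no negative amount.
"""


def summarize_failures(total_rows, null_order_ids, negative_amounts):
    """Turn raw check counts into readable messages (empty list = clean)."""
    failures = []
    if total_rows == 0:
        failures.append("sales is empty")
    if null_order_ids:
        failures.append(f"{null_order_ids} row(s) have a NULL order_id")
    if negative_amounts:
        failures.append(f"{negative_amounts} row(s) have a negative amount")
    return failures


def run_sales_tests(spark):
    """Read `sales` through the catalog and run the checks inside Spark."""
    # Imported here so this module can be loaded without PySpark installed.
    from pyspark.sql import functions as F

    # The key call: the same catalog table the analysts query in SQL.
    sales = spark.table("sales")

    # One aggregation pass; CASE-style flags are summed per rule.
    row = sales.agg(
        F.count(F.lit(1)).alias("total_rows"),
        F.sum(F.when(F.col("order_id").isNull(), 1).otherwise(0)).alias("null_order_ids"),
        F.sum(F.when(F.col("amount") < 0, 1).otherwise(0)).alias("negative_amounts"),
    ).first()

    # SUM over an empty table is NULL (None), so default those counts to 0.
    return summarize_failures(
        row["total_rows"],
        row["null_order_ids"] or 0,
        row["negative_amounts"] or 0,
    )
```

In a notebook, `assert not run_sales_tests(spark)` fails with the list of problems whenever a rule is violated. Keeping the message logic in a plain Python function lets it be unit-tested without a cluster.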