
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which command could the data engineering team use to access sales in PySpark?
A
SELECT * FROM sales
B
spark.table("sales")
C
spark.sql("sales")
D
spark.delta.table("sales")
Explanation:
The correct answer is B. spark.table("sales").
Here's why:
spark.table("sales") is the standard PySpark method to access a registered table in the Spark session. This method returns a DataFrame representing the table, which can then be used for data validation, testing, and analysis in Python.
Option A (SELECT * FROM sales) is incorrect because it is SQL syntax, not Python/PySpark syntax. The query would work if wrapped as spark.sql("SELECT * FROM sales"), but the option shows only the raw SQL without that PySpark wrapper.
Option C (spark.sql("sales")) is incorrect because spark.sql() expects a complete SQL statement, not just a table name. Spark would attempt to parse "sales" as SQL and raise a ParseException.
Option D (spark.delta.table("sales")) is incorrect because spark.delta.table() is not a real PySpark API; no such method exists on the SparkSession. The spark.table() method works for both Delta and non-Delta tables, making it the standard, recommended approach.
Key Points:
spark.table("table_name") is the standard PySpark API for accessing registered tables
spark.sql("...") runs a complete SQL statement and also returns a DataFrame
Both methods work the same way for Delta and non-Delta tables