
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which command could the data engineering team use to access sales in PySpark?
A. SELECT * FROM sales
B. spark.table("sales")
C. spark.sql("sales")
D. spark.delta.table("sales")
Explanation:
The correct answer is B. spark.table("sales").
Here's why:
spark.table("sales") is the standard PySpark method to access a table registered in the Spark catalog. This method returns a DataFrame representing the table, which can then be used for data processing, transformations, and testing in Python.
Option A (SELECT * FROM sales) is incorrect because this is SQL syntax, not PySpark Python code. While you could use spark.sql("SELECT * FROM sales") to execute SQL in PySpark, the option shows only the SQL statement without the spark.sql() wrapper.
Option C (spark.sql("sales")) is incorrect because spark.sql() expects a complete SQL statement as a string, not just a table name. Passing "sales" on its own raises a parse error at runtime.
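To make the contrast with option A concrete, a short sketch (reusing the spark session above; the exact exception class varies by Spark version):

    # Valid: spark.sql() takes a complete SQL statement and returns a DataFrame
    df = spark.sql("SELECT * FROM sales")

    # Invalid: "sales" alone is not a SQL statement, so this raises a parse error
    try:
        spark.sql("sales")
    except Exception as e:
        print(f"Parse error, as expected: {e}")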
Option D (spark.delta.table("sales")) is incorrect because there is no spark.delta.table() method in PySpark. While Databricks provides Delta Lake functionality, a registered Delta table is accessed by name with spark.table("sales") or spark.read.table("sales"), or by storage path with spark.read.format("delta").load(...).
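As a sketch, several equivalent ways to obtain the same DataFrame (the storage path in the last line is hypothetical and applies only when reading by location rather than by catalog name):

    # By catalog name: both return the same DataFrame for a registered table
    df1 = spark.table("sales")
    df2 = spark.read.table("sales")

    # By storage path, naming the Delta format explicitly (hypothetical path)
    df3 = spark.read.format("delta").load("/mnt/data/sales")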
Additional context:
spark.table("table_name").