
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL. Which of the following commands could the data engineering team use to access sales in PySpark?
A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")
Explanation:
Delta tables created in SQL can be accessed from PySpark in several ways. Reviewing each option:
Option A: SELECT * FROM sales - This is SQL syntax, not PySpark Python syntax. While you could use spark.sql("SELECT * FROM sales"), the option as written is not valid PySpark Python code.
Option B: There is no way to share data between PySpark and SQL - This is incorrect. Delta tables created in SQL are automatically accessible in PySpark and vice versa because they are stored in the same metastore.
Option C: spark.sql("sales") - This is incorrect syntax. spark.sql() expects a SQL query string, not just a table name.
Option D: spark.delta.table("sales") - This is not a valid PySpark method. The correct way to read Delta tables is spark.read.table() or spark.table().
Option E: spark.table("sales") - CORRECT. This is the standard way to access a table in PySpark. It returns a DataFrame that references the Delta table sales.
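The snippet below is a minimal sketch of how the data engineering team could use the correct answer in a Python test. The column names (order_id, amount) and the specific checks are hypothetical, not part of the question; the point is that spark.table("sales") returns an ordinary DataFrame that Python code can validate.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # on Databricks, spark is predefined

# Access the Delta table as a DataFrame (Option E, the correct answer).
sales_df = spark.table("sales")

# Hypothetical data-quality checks written in Python:
# no NULL keys and no negative sale amounts.
null_keys = sales_df.filter(F.col("order_id").isNull()).count()  # order_id is an assumed column
assert null_keys == 0, f"{null_keys} rows have a NULL order_id"

negatives = sales_df.filter(F.col("amount") < 0).count()  # amount is an assumed column
assert negatives == 0, f"{negatives} rows have a negative amount"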
Additional valid methods include:
- spark.read.table("sales")
- spark.sql("SELECT * FROM sales")

Delta tables created in SQL are automatically registered in the Spark session's catalog and can be accessed from PySpark using spark.table("table_name") or spark.read.table("table_name"). This interoperability is one of the key features of Databricks and Delta Lake.
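As a quick illustration of this interoperability, the sketch below (assuming a sales table exists in the session catalog, and reusing the spark session from the previous snippet) reads the same table three equivalent ways; each call returns a DataFrame over the same data.

# Three equivalent ways to read the same registered Delta table.
df_table = spark.table("sales")
df_read = spark.read.table("sales")
df_sql = spark.sql("SELECT * FROM sales")

# All three DataFrames share the same schema.
assert df_table.schema == df_read.schema == df_sql.schema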