
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL. Which of the following commands could the data engineering team use to access sales in PySpark?
A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")
Explanation:
In PySpark, there are several valid ways to access a registered Delta table (a short sketch follows this list):
spark.table("sales") - This is the correct method to access a registered table in PySpark. When a Delta table is created, it's typically registered in the Spark catalog, and spark.table() can be used to access it.
spark.sql("SELECT * FROM sales") - While option A shows SELECT * FROM sales alone, in PySpark you would need to wrap it in spark.sql() to execute SQL queries.
spark.read.table("sales") - Another valid method for reading tables.
spark.read.format("delta").table("sales") - For explicitly specifying Delta format.
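A minimal sketch of these access patterns, assuming a Delta-enabled SparkSession and a hypothetical sale_id column for the data-quality check:

    from pyspark.sql import SparkSession

    # Assumes the "sales" Delta table is already registered in the catalog.
    spark = SparkSession.builder.appName("sales-tests").getOrCreate()

    # Equivalent ways to load the registered table as a DataFrame:
    df = spark.table("sales")                              # option E: the idiomatic form
    df_sql = spark.sql("SELECT * FROM sales")              # SQL wrapped in spark.sql()
    df_read = spark.read.table("sales")                    # DataFrameReader variant
    df_delta = spark.read.format("delta").table("sales")   # explicit Delta format

    # Example data-quality test the engineering team might write
    # (sale_id is a hypothetical column name):
    assert df.filter("sale_id IS NULL").count() == 0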
Why other options are incorrect:
SELECT * FROM sales alone is not valid PySpark syntax - it needs to be wrapped in spark.sql().
spark.sql("sales") is incorrect syntax - spark.sql() expects a SQL query, not just a table name.
spark.delta.table("sales") is not valid PySpark syntax - there is no spark.delta.table() method.
The most straightforward and correct way to access a registered Delta table in PySpark is spark.table("sales") (option E).
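A short sketch of why options C and D fail at runtime (the exact exception classes vary slightly across PySpark versions):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    try:
        spark.sql("sales")  # option C: a bare identifier is not a SQL statement
    except Exception as e:  # typically raises a ParseException
        print(f"spark.sql('sales') -> {type(e).__name__}")

    try:
        spark.delta.table("sales")  # option D: SparkSession has no 'delta' attribute
    except AttributeError as e:
        print(f"spark.delta.table(...) -> {type(e).__name__}")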