
A data analyst has created a Delta table named 'sales' that is used by the entire data analysis team. To ensure data quality, the analyst asks the data engineering team to implement a series of validation tests. The data engineering team, however, prefers to write these tests in Python rather than SQL. Which of the following commands could the data engineering team use to access the 'sales' Delta table in PySpark?
A
SELECT * FROM sales
B
There is no way to share data between PySpark and SQL.
C
spark.sql("sales")
D
spark.delta.table("sales")
E
spark.table("sales")
Explanation:
The correct answer is E. spark.table() returns a table registered in the catalog, including a Delta table, as a DataFrame. Passing the table name ("sales") gives the data engineering team a DataFrame they can validate with PySpark. Option A (SELECT * FROM sales) is a SQL statement, not Python; it would run in PySpark only if passed as a string to spark.sql(). Option B is incorrect because PySpark and SQL share the same catalog, so a table created in SQL is accessible from the DataFrame API. Option C (spark.sql("sales")) fails because spark.sql() expects a complete SQL statement, and a bare table name is not one. Option D (spark.delta.table("sales")) does not exist in PySpark; the correct method is spark.table("sales").
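As a rough illustration of answer E, the sketch below wraps spark.table() in a small helper and adds one example validation check. It assumes an active SparkSession is passed in as `spark` (as in a Databricks notebook). The helper names `load_table` and `missing_columns` and the column names in the usage comments are hypothetical and do not come from the question.

```python
def load_table(spark, table_name="sales"):
    """Return a catalog table (Delta or otherwise) as a PySpark DataFrame."""
    # spark.table() resolves the name against the session catalog.
    return spark.table(table_name)


def missing_columns(df, required):
    """Return the required column names that are absent from the DataFrame."""
    present = set(df.columns)  # DataFrame.columns is a list of column names
    return [name for name in required if name not in present]


# Usage, assuming an active SparkSession named `spark` and
# hypothetical column names "order_id" and "amount":
#   sales_df = load_table(spark, "sales")
#   assert missing_columns(sales_df, ["order_id", "amount"]) == []
# The same table can also be read through SQL from Python:
#   sales_df = spark.sql("SELECT * FROM sales")
```

Keeping the session as a parameter, rather than creating it inside the helper, is one possible design choice: the same functions then work in a notebook, a job, or a unit test with a stand-in session.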