
Answer-first summary for fast verification
Answer: spark.table("sales")
## Explanation

The correct answer is **E. spark.table("sales")**.

### Why option E is correct

- `spark.table("sales")` is the standard PySpark method for accessing tables registered in the Spark catalog.
- It returns a DataFrame that can be used for data processing, transformations, and testing in PySpark.
- It works with Delta tables as well as any other table type registered in the catalog, so no Delta-specific call is needed.

### Why the other options are incorrect

- **A. SELECT * FROM sales**: This is SQL syntax, not PySpark code. You could wrap it as `spark.sql("SELECT * FROM sales")`, but the option as written is not valid Python.
- **B. There is no way to share data between PySpark and SQL**: Incorrect. PySpark and SQL share the same Spark catalog, so tables registered in one environment are visible in the other.
- **C. spark.sql("sales")**: `spark.sql()` expects a complete SQL statement; a bare table name is not a valid query.
- **D. spark.delta.table("sales")**: This is not a real PySpark API; `SparkSession` has no `delta` attribute. The Delta Lake Python API instead exposes `DeltaTable.forName(spark, "sales")`, which returns a `DeltaTable` object rather than a DataFrame.

### Key points

1. **PySpark-SQL integration**: PySpark and SQL share the same Spark catalog, allowing seamless data access between both environments.
2. **Delta table access**: Delta tables registered in the catalog can be read with standard PySpark methods.
3. **Best practice**: `spark.table("table_name")` is the most common and recommended way to access registered tables in PySpark.
Author: Keng Suppaseth
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL. Which of the following commands could the data engineering team use to access sales in PySpark?
A. SELECT * FROM sales
B. There is no way to share data between PySpark and SQL.
C. spark.sql("sales")
D. spark.delta.table("sales")
E. spark.table("sales")