
A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which command could the data engineering team use to access sales in PySpark?
A. SELECT * FROM sales
B. spark.table("sales")
C. spark.sql("sales")
D. spark.delta.table("sales")
Explanation:
The correct answer is B. spark.table("sales").
Here's why:
spark.table("sales") is the standard PySpark method to access a table registered in the Spark catalog. This method returns a DataFrame representing the table, which can then be used for data processing, transformations, and testing in Python.
Option A (SELECT * FROM sales) is incorrect because this is SQL syntax, not PySpark Python code. While you could use spark.sql("SELECT * FROM sales") to execute SQL in PySpark, the option shows only the SQL statement without the spark.sql() wrapper.
Option C (spark.sql("sales")) is incorrect because spark.sql() expects a complete SQL statement as a string, not just a table name. Passing "sales" on its own raises a parse error at runtime.
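To make the contrast with option A concrete, a short sketch (reusing the spark session above; the exact exception class varies by Spark version):

    # Valid: spark.sql() takes a complete SQL statement and returns a DataFrame
    df = spark.sql("SELECT * FROM sales")

    # Invalid: "sales" alone is not a SQL statement, so this raises a parse error
    try:
        spark.sql("sales")
    except Exception as e:
        print(f"Parse error, as expected: {e}")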
Option D (spark.delta.table("sales")) is incorrect because there is no spark.delta.table() method in PySpark. While Databricks provides Delta Lake functionality, a registered Delta table is accessed by name with spark.table("sales") or spark.read.table("sales"), or by storage path with spark.read.format("delta").load(...).
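As a sketch, several equivalent ways to obtain the same DataFrame (the storage path in the last line is hypothetical and applies only when reading by location rather than by catalog name):

    # By catalog name: both return the same DataFrame for a registered table
    df1 = spark.table("sales")
    df2 = spark.read.table("sales")

    # By storage path, naming the Delta format explicitly (hypothetical path)
    df3 = spark.read.format("delta").load("/mnt/data/sales")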
Additional context:
spark.table("table_name").