
A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.
Which of the following operations could the data engineering team use to run the query and work with the results in PySpark?
A
SELECT * FROM sales
B
spark.delta.table
C
spark.sql
D
There is no way to share data between PySpark and SQL.
E
spark.table
Explanation:
In PySpark, there are multiple ways to execute SQL queries and work with the results:
Correct options:
C. spark.sql() executes a SQL query string and returns the results as a DataFrame, so the team can run the analyst's query as-is and test the output in Python.
E. spark.table() loads a registered table or view as a DataFrame; the team can use it if the analyst's query is saved as a view (spark.read.format("delta").table() works the same way for Delta tables).
Incorrect options:
A. SELECT * FROM sales is a SQL statement, not a PySpark operation.
B. spark.delta.table is not a valid PySpark API.
D. Incorrect — SQL and PySpark share the same SparkSession, so data moves between them freely.
How to implement the solution:
# Method 1: Using spark.sql()
query = "SELECT * FROM sales WHERE amount > 1000"
df = spark.sql(query)
# Now you can run tests on the DataFrame
# Example test: check for null values
null_count = df.filter(df.amount.isNull()).count()
assert null_count == 0, f"Found {null_count} null values in amount column"
# Method 2: Using spark.table() if the query result is saved as a view
spark.sql("CREATE OR REPLACE TEMP VIEW clean_sales AS SELECT * FROM sales WHERE amount > 1000")
df = spark.table("clean_sales")
Both spark.sql() and spark.table() allow the data engineering team to work with SQL query results in PySpark for implementing data quality tests.
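Once the query results are in a DataFrame, the same assertion pattern shown above can be applied to rows collected with df.collect(), which returns Row objects supporting dict-style access. A minimal sketch of that test logic, using plain Python dicts to stand in for collected rows so no Spark session is required (the column names and values here are illustrative assumptions, not from the original question):

```python
# Sample rows as plain dicts, standing in for the Row objects
# that df.collect() would return for the cleaned sales query.
rows = [
    {"order_id": 1, "amount": 1500.0},
    {"order_id": 2, "amount": 2300.0},
]

def check_no_nulls(rows, column):
    """Count rows where `column` is None (should be 0 for clean data)."""
    return sum(1 for r in rows if r[column] is None)

def check_min_value(rows, column, minimum):
    """Return rows whose `column` value falls below `minimum`."""
    return [r for r in rows if r[column] is not None and r[column] < minimum]

# Data-quality assertions, mirroring the tests in the PySpark examples
assert check_no_nulls(rows, "amount") == 0, "found null amounts"
assert check_min_value(rows, "amount", 1000) == [], "found amounts below threshold"
```

In practice the team would express these checks directly on the DataFrame (as in the examples above) so the work stays distributed, reserving collect() for small result sets.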