
A data engineer has a Python variable table_name that they would like to use in a SQL query. They want to construct a Python code block that will run the query using table_name.
They have the following incomplete code block:
_____(f"SELECT customer_id, spend FROM {table_name}")
What can be used to fill in the blank to successfully complete the task?
A. spark.delta.sql
B. spark.sql
C. spark.table
D. dbutils.sql
Explanation:
The correct answer is B. In PySpark, spark.sql() is the SparkSession method that executes a SQL query string and returns the result as a DataFrame. The blank in _____(f"SELECT ...") needs a function that accepts the formatted SQL string and runs it, which is exactly what spark.sql() does.
Example:
df = spark.sql(f"SELECT customer_id, spend FROM {table_name}")
The other options are incorrect:
spark.delta.sql → Not a valid method.
spark.table → Used to retrieve a DataFrame from a registered table (not for executing arbitrary SQL strings).
dbutils.sql → Not a standard Spark/PySpark method; dbutils is typically used for Databricks-specific utilities like file operations or notebook commands.
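
Because the f-string places table_name directly into the SQL text, it can help to check the value before running the query. The following is a minimal sketch, assuming table names are plain identifiers with at most three dot-separated parts (for example catalog.schema.table). The helper build_query and the pattern _IDENTIFIER are hypothetical names used for illustration; they are not part of Spark or Databricks.

```python
import re

# Hypothetical pattern (an assumption, not a Spark rule): one to three
# dot-separated parts, each made of letters, digits, and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}")


def build_query(table_name: str) -> str:
    """Return the SELECT statement for table_name, rejecting unexpected input."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"Unexpected table name: {table_name!r}")
    return f"SELECT customer_id, spend FROM {table_name}"


query = build_query("sales.customers")

# In a notebook or job with an active SparkSession named spark,
# the resulting string would then be passed to spark.sql:
# df = spark.sql(query)
```

The check runs in plain Python, so it can be tested without a cluster; only the final spark.sql call needs a Spark session.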