
## Answer

**B. `spark.sql`**
## Explanation

The correct answer is **B. `spark.sql`**.

### Why this is correct:

1. **`spark.sql()`** is the standard PySpark method for executing SQL queries in Databricks.
2. It takes a SQL query string as its argument, including f-strings that interpolate Python variables.
3. The expression `spark.sql(f"SELECT customer_id, spend FROM {table_name}")` executes the query with the value of `table_name` substituted into the SQL text.

### Why the other options are incorrect:

- **A. `spark.delta.sql`**: This method does not exist in PySpark. Delta Lake operations are performed through `spark.sql()` or the `DeltaTable` API.
- **C. `spark.table`**: This method creates a DataFrame from a table name (e.g., `spark.table(table_name)`); it does not execute SQL queries.
- **D. `dbutils.sql`**: `dbutils` is the Databricks utilities module, but it has no `sql` method. SQL execution is handled through `spark.sql()`.

### Key Points:

- In Databricks notebooks, you can seamlessly mix Python and SQL using `spark.sql()`.
- The method supports Python f-strings for dynamic query construction.
- Results are returned as DataFrames that can be processed further in Python.
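
To make this concrete, here is a minimal sketch of the completed code block. It assumes a PySpark environment (in a Databricks notebook, `spark` already exists as the active session); the table name `sales.customers` and its columns are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

# In Databricks, `spark` is provided automatically; creating a session
# explicitly keeps this sketch runnable outside a notebook as well.
spark = SparkSession.builder.getOrCreate()

# Hypothetical table name; substitute a table that exists in your metastore.
table_name = "sales.customers"

# The f-string interpolates the Python variable into the SQL text before
# spark.sql() parses and executes the query. The result is a DataFrame.
df = spark.sql(f"SELECT customer_id, spend FROM {table_name}")

# Because the result is an ordinary DataFrame, it can be processed further
# in Python.
high_spenders = df.filter(df.spend > 1000)
high_spenders.show()

# For contrast, spark.table() returns a DataFrame for a table name directly,
# without executing a SQL statement.
same_table = spark.table(table_name)
```

Note that f-string interpolation splices raw text into the query, so `table_name` should come from a trusted source to avoid SQL injection.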
Author: Keng Suppaseth
## Question

A data engineer has a Python variable `table_name` that they would like to use in a SQL query. They want to construct a Python code block that will run the query using `table_name`. They have the following incomplete code block:
____(f"SELECT customer_id, spend FROM {table_name}")
____(f"SELECT customer_id, spend FROM {table_name}")
What can be used to fill in the blank to successfully complete the task?
A. `spark.delta.sql`
B. `spark.sql`
C. `spark.table`
D. `dbutils.sql`