
Answer-first summary for fast verification
Answer: `microBatchDF._jdf.sparkSession().sql(sql_query)`
In Databricks, SQL queries run through a `SparkSession`. Inside a `foreachBatch` function, the temporary view created from the micro-batch DataFrame is registered in that DataFrame's own session, so the query must run against that same session. On Databricks Runtime below 10.5, the session is reached with `microBatchDF._jdf.sparkSession().sql(sql_query)`, which executes `sql_query` against the Spark session linked to the batch being processed. Reference: [Databricks Documentation](https://docs.gcp.databricks.com/structured-streaming/delta-lake.html#language-python)
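To make the answer concrete, here is a minimal sketch of the completed function, assuming the table and view names from the question. The module-level `SQL_QUERY` constant and the commented-out `stream_df` wiring are illustrative names only, not part of the original question.

```python
# Sketch only: the MERGE statement is copied from the question;
# SQL_QUERY is a hypothetical module-level name used for illustration.
SQL_QUERY = """
MERGE INTO sales_silver a
USING sales_microbatch b
ON a.item_id = b.item_id
AND a.item_timestamp = b.item_timestamp
WHEN NOT MATCHED THEN INSERT *
"""


def upsert_data(microBatchDF, batch_id):
    # Register the micro-batch as a temp view in the DataFrame's own session.
    microBatchDF.createOrReplaceTempView("sales_microbatch")
    # Option E: go through the underlying Java DataFrame to reach the
    # session that owns the temp view, then run the MERGE there.
    microBatchDF._jdf.sparkSession().sql(SQL_QUERY)


# Hypothetical wiring (stream_df is an assumed streaming DataFrame):
# stream_df.writeStream.foreachBatch(upsert_data).start()
```

The point of the design is that the view and the query share one session. Calling the global `spark.sql(sql_query)` may fail to find `sales_microbatch` when the micro-batch DataFrame belongs to a different session.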
Author: LeetQuiz Editorial Team
A data engineer is using foreachBatch logic to upsert data into a target Delta table. The function invoked for each new micro-batch is shown below with a blank:
def upsert_data(microBatchDF, batch_id):
    microBatchDF.createOrReplaceTempView("sales_microbatch")
    sql_query = """
        MERGE INTO sales_silver a
        USING sales_microbatch b
        ON a.item_id=b.item_id
        AND a.item_timestamp=b.item_timestamp
        WHEN NOT MATCHED THEN INSERT *
    """
    ________________
Which option correctly fills in the blank to execute the SQL query in the function on a cluster with Databricks Runtime below 10.5?
A. `spark.sql(sql_query)`
B. `batch_id.sql(sql_query)`
C. `microBatchDF.sql(sql_query)`
D. `microBatchDF.sparkSession.sql(sql_query)`
E. `microBatchDF._jdf.sparkSession().sql(sql_query)`