
Explanation:
In Databricks, SQL queries are executed through a SparkSession. Inside a foreachBatch function on Databricks Runtime below 10.5, the correct approach is `microBatchDF._jdf.sparkSession().sql(sql_query)`, which runs the query on the SparkSession of the micro-batch DataFrame. This ensures that `sql_query` is executed against the Spark session linked to the batch of data being processed, which is the session in which the temporary view of the micro-batch was registered.
Reference: Databricks Documentation
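As a minimal sketch, the function from the question could look as follows once the blank is filled with the call described in the explanation. It assumes that the `sales_silver` Delta table already exists and that the function is passed to foreachBatch on a runtime below 10.5; nothing else is taken from outside the question.

```python
def upsert_data(microBatchDF, batch_id):
    # Register the micro-batch under a name the SQL statement can refer to.
    microBatchDF.createOrReplaceTempView("sales_microbatch")

    # Insert only rows whose (item_id, item_timestamp) pair is not yet
    # present in the target table.
    sql_query = """
        MERGE INTO sales_silver a
        USING sales_microbatch b
        ON a.item_id=b.item_id
            AND a.item_timestamp=b.item_timestamp
        WHEN NOT MATCHED THEN INSERT *
    """

    # Run the statement on the session that owns the micro-batch DataFrame.
    # `_jdf` is the underlying JVM DataFrame; this form is the one the
    # explanation gives for Databricks Runtime below 10.5.
    microBatchDF._jdf.sparkSession().sql(sql_query)
```

The function would typically be attached to a streaming write with something like `writeStream.foreachBatch(upsert_data)`; the source DataFrame and any checkpoint options are not given in the question and would have to be supplied by the reader.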
A data engineer is using foreachBatch logic to upsert data into a target Delta table. The function invoked for each new micro-batch is shown below with a blank:
def upsert_data(microBatchDF, batch_id):
    microBatchDF.createOrReplaceTempView("sales_microbatch")
    sql_query = """
        MERGE INTO sales_silver a
        USING sales_microbatch b
        ON a.item_id=b.item_id
            AND a.item_timestamp=b.item_timestamp
        WHEN NOT MATCHED THEN INSERT *
    """
    ________________
Which option correctly fills in the blank to execute the SQL query in the function on a cluster with Databricks Runtime below 10.5?
A. spark.sql(sql_query)
B. batch_id.sql(sql_query)
C. microBatchDF.sql(sql_query)
D. microBatchDF.sparkSession.sql(sql_query)
E. microBatchDF._jdf.sparkSession().sql(sql_query)