Reddit

Given the following query:

.filter(“recent = true“)
.dropDuplicates([“item_id“, “item_timestamp“])
.write
.mode (“overwrite“)
.table(“stream_data_stage“)

.filter(“recent = true“)
.dropDuplicates([“item_id“, “item_timestamp“])
.write
.mode (“overwrite“)
.table(“stream_data_stage“)

Which statement accurately describes the outcome of executing this query?_

Real Exam

An incremental job will overwrite the stream_sink table by those deduplicated records from stream_data_stage that have been added since the last time the job was run._

5.5%

A batch job will overwrite the stream_data_stage table by deduplicated records calculated from all 'recent' items in the stream_sink table._

54.0%

An incremental job will overwrite the stream_data_stage table by those deduplicated records from stream_sink that have been added since the last time the job was run._

15.3%

A batch job will overwrite the stream_sink table by deduplicated records calculated from all 'recent' items in the stream_data_stage table._

8.6%

A batch job will overwrite the stream_data_stage table by those deduplicated records from stream_sink that have been added since the last time the job was run._

16.6%

Databricks Certified Data Engineer - Professional

Get started today

Comments