A data engineering team is working on a Proof of Concept (POC) that involves populating a downstream table from a source table with Change Data Feed enabled. The source table has 45 columns, and the downstream table requires 47 columns, including two additional columns: typeOfChange to indicate the type of change and version to signify the version number of the change. The target table must be truncated and reloaded with new data upon each query execution. Given the following code snippet with blanks to fill, which option correctly completes the code to achieve the desired outcome?

```python
spark.read.format('delta') \
  .option('readChangeFeed', 'true') \
  .option('startingVersion', 0) \
  .table('sourceTable') \
  _______(1)________ \
  _______(2)________ \
  .write \
  .format('delta') \
  _______(3)________ \
  .saveAsTable('targetTable')
```

.drop('_commit_timestamp') \
.select('*', col('_change_type').alias('typeOfChange'), col('_commit_version').alias('version')) \
.mode('overwrite') \

14.0%

.drop('commit_timestamp') \
.select('*', col('change_type').alias('typeOfChange'), col('commit_version').alias('version')) \
.mode('overwrite') \

11.6%

.drop('_commit_timestamp') \
.withColumnRenamed('_change_type', 'typeOfChange').withColumnRenamed('_change_version', 'version') \
.mode('truncate') \

7.8%

.drop('_change_timestamp') \
.withColumnRenamed('_change_type', 'typeOfChange').withColumnRenamed('_commit_version', 'version') \
.mode('overwrite') \

8.5%

.drop('_commit_timestamp') \
.withColumnRenamed('_change_type', 'typeOfChange').withColumnRenamed('_commit_version', 'version') \
.mode('overwrite') \

58.1%
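
One way to compare the options is to count the columns each one would leave. The sketch below does that arithmetic in plain Python, with no Spark session. It assumes the Change Data Feed read returns the 45 source columns plus the three metadata columns `_change_type`, `_commit_version`, and `_commit_timestamp`. The source column names (`c1` to `c45`) and the helper functions are hypothetical stand-ins for the DataFrame methods, not Spark APIs.

```python
# Pure-Python sketch of the column arithmetic; no Spark session required.
# The 45 source column names (c1..c45) are hypothetical placeholders.
SOURCE_COLUMNS = [f"c{i}" for i in range(1, 46)]

# Metadata columns that a Change Data Feed read adds to the table's own columns.
CDF_COLUMNS = ["_change_type", "_commit_version", "_commit_timestamp"]


def drop(columns, name):
    """Stand-in for DataFrame.drop: dropping a missing column is a no-op."""
    return [c for c in columns if c != name]


def rename(columns, old, new):
    """Stand-in for DataFrame.withColumnRenamed: a missing column is a no-op."""
    return [new if c == old else c for c in columns]


def select_star_plus_aliases(columns, aliases):
    """Stand-in for select('*', col(a).alias(b), ...): keeps all columns, appends copies."""
    for old, _ in aliases:
        if old not in columns:
            raise KeyError(f"column not found: {old}")
    return columns + [new for _, new in aliases]


cdf_read = SOURCE_COLUMNS + CDF_COLUMNS  # 45 + 3 = 48 columns

# drop + withColumnRenamed with the correct metadata names
renamed = rename(
    rename(drop(cdf_read, "_commit_timestamp"), "_change_type", "typeOfChange"),
    "_commit_version", "version",
)

# drop + select('*', aliases): the original metadata columns stay next to the aliases
selected = select_star_plus_aliases(
    drop(cdf_read, "_commit_timestamp"),
    [("_change_type", "typeOfChange"), ("_commit_version", "version")],
)

# dropping a misspelled column name removes nothing
wrong_drop = rename(
    rename(drop(cdf_read, "_change_timestamp"), "_change_type", "typeOfChange"),
    "_commit_version", "version",
)

print(len(renamed), len(selected), len(wrong_drop))
```

Under these assumptions, only the drop-and-rename chain with the correct metadata names lands on 47 columns. The `select('*', ...)` variant keeps the original metadata columns as well as the aliases, and dropping a misspelled name leaves all three metadata columns in place. For the write step, `overwrite` is the save mode that replaces the table contents on each run; `truncate` is not among the save modes `DataFrameWriter.mode` accepts.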

Databricks Certified Data Engineer - Professional
