
Answer-first summary for fast verification
Answer: 1. `.drop('_commit_timestamp') \` 2. `.withColumnRenamed('_change_type', 'typeOfChange').withColumnRenamed('_commit_version', 'version') \` 3. `.mode('overwrite') \`
The correct answer (option E) drops the `_commit_timestamp` column, renames `_change_type` to `typeOfChange` and `_commit_version` to `version`, and writes with the `overwrite` mode. Reading a table through the Change Data Feed adds three metadata columns to the 45 source columns: `_change_type`, `_commit_version`, and `_commit_timestamp`. Dropping `_commit_timestamp` leaves 45 + 2 = 47 columns, and the two renames give the remaining metadata columns the names the downstream table requires. The `overwrite` mode replaces the contents of the target table on each execution, which satisfies the truncate-and-reload requirement.
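Put together, the completed snippet might look like the following sketch. The function name `reload_target_from_change_feed` and its `source` and `target` parameters are assumptions added for illustration; the Spark session is passed in as an argument, and the table names default to those used in the question.

```python
def reload_target_from_change_feed(spark, source='sourceTable', target='targetTable'):
    """Read the change feed of `source` and overwrite `target` with the result.

    `spark` is expected to be an active SparkSession with Delta Lake available
    (an assumption of this sketch).
    """
    # Reading with readChangeFeed adds _change_type, _commit_version
    # and _commit_timestamp to the source columns.
    changes = (
        spark.read.format('delta')
        .option('readChangeFeed', 'true')
        .option('startingVersion', 0)
        .table(source)
    )

    # Blank (1): drop the metadata column that the target does not need.
    # Blank (2): rename the two remaining metadata columns.
    reshaped = (
        changes
        .drop('_commit_timestamp')
        .withColumnRenamed('_change_type', 'typeOfChange')
        .withColumnRenamed('_commit_version', 'version')
    )

    # Blank (3): 'overwrite' replaces the existing contents of the target table.
    reshaped.write.format('delta').mode('overwrite').saveAsTable(target)
    return reshaped
```

Calling `reload_target_from_change_feed(spark)` in a notebook where `spark` is already defined would run the same chain as the filled-in question snippet.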
Author: LeetQuiz Editorial Team
A data engineering team is working on a Proof of Concept (POC) that involves populating a downstream table from a source table with Change Data Feed enabled. The source table has 45 columns, and the downstream table requires 47 columns, including two additional columns: typeOfChange to indicate the type of change and version to signify the version number of the change. The target table must be truncated and reloaded with new data upon each query execution. Given the following code snippet with blanks to fill, which option correctly completes the code to achieve the desired outcome?
spark.read.format('delta') \
.option('readChangeFeed', 'true') \
.option('startingVersion', 0) \
.table('sourceTable') \
_______(1)________ \
_______(2)________ \
.write \
.format('delta') \
_______(3)________ \
.saveAsTable('targetTable')
A
.drop('_commit_timestamp') \
.select('*', col('_change_type').alias('typeOfChange'), col('_commit_version').alias('version')) \
.mode('overwrite') \
B
.drop('commit_timestamp') \
.select('*', col('change_type').alias('typeOfChange'), col('commit_version').alias('version')) \
.mode('overwrite') \
C
.drop('_commit_timestamp') \
.withColumnRenamed('_change_type', 'typeOfChange').withColumnRenamed('_change_version', 'version') \
.mode('truncate') \
D
.drop('_change_timestamp') \
.withColumnRenamed('_change_type', 'typeOfChange').withColumnRenamed('_commit_version', 'version') \
.mode('overwrite') \
E
.drop('_commit_timestamp') \
.withColumnRenamed('_change_type', 'typeOfChange').withColumnRenamed('_commit_version', 'version') \
.mode('overwrite') \
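To see why only option E ends up with exactly 47 columns, the column bookkeeping for options A, D, and E can be simulated with plain Python lists. This is a sketch, not Spark code: the helper functions and the placeholder column names `c1` to `c45` are hypothetical, and it assumes that `drop` and `withColumnRenamed` silently ignore a column name that does not exist, which is the documented PySpark behavior for string column names.

```python
# The three metadata columns added by the Change Data Feed.
CDF_COLUMNS = ['_change_type', '_commit_version', '_commit_timestamp']

# 45 placeholder source columns (hypothetical names) plus the CDF columns.
source_columns = [f'c{i}' for i in range(1, 46)]
feed_columns = source_columns + CDF_COLUMNS


def drop(cols, name):
    # Mimics DataFrame.drop('name'): a missing name is a no-op.
    return [c for c in cols if c != name]


def rename(cols, old, new):
    # Mimics DataFrame.withColumnRenamed(old, new): a missing name is a no-op.
    return [new if c == old else c for c in cols]


def select_star_with_aliases(cols, aliases):
    # Mimics select('*', col(old).alias(new), ...): every existing column is
    # kept and one aliased copy is appended per (old, new) pair.
    for old, _ in aliases:
        if old not in cols:
            raise KeyError(f'column {old!r} does not exist')
    return cols + [new for _, new in aliases]


# Option A: the aliases are added on top of the original metadata columns.
option_a = select_star_with_aliases(
    drop(feed_columns, '_commit_timestamp'),
    [('_change_type', 'typeOfChange'), ('_commit_version', 'version')],
)

# Option D: '_change_timestamp' does not exist, so nothing is dropped.
option_d = rename(
    rename(drop(feed_columns, '_change_timestamp'), '_change_type', 'typeOfChange'),
    '_commit_version', 'version',
)

# Option E: one column dropped, two renamed.
option_e = rename(
    rename(drop(feed_columns, '_commit_timestamp'), '_change_type', 'typeOfChange'),
    '_commit_version', 'version',
)

print(len(option_a), len(option_d), len(option_e))
```

Under these assumptions, option A produces 49 columns and option D produces 48, while option E produces the required 47. Option B refers to column names without the leading underscore, which the change feed does not provide, and option C passes `'truncate'` to `.mode(...)`, which is not one of the save modes (`append`, `overwrite`, `ignore`, `error`/`errorifexists`).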