
Databricks Certified Data Engineer - Professional
A nightly job ingests data into a Delta Lake table using the following code:
from pyspark.sql.functions import current_timestamp, input_file_name, col
from pyspark.sql.column import Column

def ingest_daily_batch(time_col: Column, year: int, month: int, day: int):
    (spark.read
        .format("parquet")
        .load(f"/mnt/daily_batch/{year}/{month}/{day}")
        .select("*",
                time_col.alias("ingest_time"),
                input_file_name().alias("source_file"))
        .write
        .mode("append")
        .saveAsTable("bronze"))
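For reference, a nightly invocation of this function might look like the following sketch; the timestamp expression and the date values are illustrative and not part of the original question.

    # Hypothetical nightly call: stamp each ingested row with the current
    # timestamp and load the batch for the given date partition.
    ingest_daily_batch(current_timestamp(), 2024, 11, 2)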
The next step in the pipeline requires a function that returns an object that can be used to manipulate new records that have not yet been propagated to the next table in the pipeline.
Which code snippet completes this function definition?
def new_records():
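One plausible completion of the stub above, offered as a minimal sketch rather than an official answer key: reading the bronze Delta table with Structured Streaming returns a streaming DataFrame whose micro-batches contain only records not yet processed downstream, with progress tracked by the consuming writer's checkpoint.

    def new_records():
        # Sketch only: a streaming read of a Delta table surfaces just the
        # not-yet-processed records once the downstream write supplies a
        # checkpoint location.
        return spark.readStream.table("bronze")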