
Answer: C. Move the table definition into a separate function, and make calls to this function using different input parameters inside the for loop.
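The issue arises because the nested function closes over the loop variable `t` itself, not over its value at each iteration (Python closures are late-binding). The table name in `@dlt.table(name=f"{t}_dataset")` is evaluated immediately, so the pipeline does register t1_dataset, t2_dataset and t3_dataset. However, each function body looks up `t` only when the pipeline evaluates it, after the loop has finished, so all three tables read from "t3". To fix this, each table definition must capture the current value of `t`. Option C moves the table definition into a separate function that takes the source name as a parameter. Each call creates a new scope, and the nested function binds to that call's parameter, so each table reads from its intended source (t1, t2, t3). Options A, B and D change where the loop lives or how the configuration values are stored or loaded, but the function bodies would still read the shared loop variable, so they do not fix the scoping problem.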
Author: LeetQuiz Editorial Team
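The late-binding behavior is plain Python and can be reproduced without DLT. In this minimal sketch, each closure returns the name of the table it would read. The closures are called only after the loop finishes, which roughly mirrors when a pipeline evaluates table functions. The names `buggy`, `fixed`, `make_reader` and `read_source` are illustrative only.

```python
tables = ["t1", "t2", "t3"]

# Buggy: every closure refers to the same variable `t`, which is looked up
# when the function is called, not when it is defined.
buggy = []
for t in tables:
    def read_source():
        return t
    buggy.append(read_source)

# Fixed: a factory function gives each closure its own `source` parameter,
# bound to the value of `t` at the time of the call.
def make_reader(source):
    def read_source():
        return source
    return read_source

fixed = [make_reader(t) for t in tables]

# Call the closures only after the loop has finished.
print([f() for f in buggy])  # ['t3', 't3', 't3']
print([f() for f in fixed])  # ['t1', 't2', 't3']
```

Binding through a default argument (`def read_source(source=t):`) also captures the value early; the factory function is the form that option C describes.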
A data engineer is refactoring DLT code that contains multiple table definitions with similar patterns:
import dlt
@dlt.table(name="t1_dataset")
def t1_dataset():
    return spark.read.table("t1")
@dlt.table(name="t2_dataset")
def t2_dataset():
    return spark.read.table("t2")
@dlt.table(name="t3_dataset")
def t3_dataset():
    return spark.read.table("t3")
They attempt to parameterize the table creation using this loop:
tables = ["t1", "t2", "t3"]
for t in tables:
    @dlt.table(name=f"{t}_dataset")
    def new_table():
        return spark.read.table(t)
After running the pipeline with this refactored code, the DAG displays incorrect configuration values for these tables. What should the data engineer do to correct this?
A
Wrap the for loop inside another table definition, using generalized names and properties to replace with those from the inner table definition.
B
Convert the list of configuration values to a dictionary of table settings, using table names as keys.
C
Move the table definition into a separate function, and make calls to this function using different input parameters inside the for loop.
D
Load the configuration values for these tables from a separate file, located at a path provided by a pipeline parameter.
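Option C could look like the following sketch. In a real pipeline the decorator is `dlt.table` and the body calls `spark.read.table(source)`. So that the sketch runs outside Databricks, it substitutes a hypothetical `table` decorator that only records each function under its name, and the body returns the source name instead of a DataFrame. `REGISTRY`, `table` and `create_dataset` are illustrative names, not part of the DLT API.

```python
from typing import Callable, Dict

# Hypothetical stand-in for the pipeline's table registry. A DLT pipeline
# collects decorated functions first and evaluates them later, so this stub
# records each function instead of calling it.
REGISTRY: Dict[str, Callable[[], str]] = {}

def table(name: str):
    """Hypothetical stand-in for @dlt.table(name=...)."""
    def register(func: Callable[[], str]) -> Callable[[], str]:
        REGISTRY[name] = func
        return func
    return register

def create_dataset(source: str) -> None:
    # `source` is a parameter of this call, so each decorated function
    # closes over its own value rather than the shared loop variable.
    @table(name=f"{source}_dataset")
    def dataset():
        # Real pipeline: return spark.read.table(source)
        return source

for t in ["t1", "t2", "t3"]:
    create_dataset(t)

# Evaluate the registered definitions after the loop, as the pipeline would.
print({name: fn() for name, fn in REGISTRY.items()})
# {'t1_dataset': 't1', 't2_dataset': 't2', 't3_dataset': 't3'}
```

Reusing the inner function name `dataset` on every call is harmless here because the table name comes from the `name` argument, not from the function name.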