
Explanation:
The standard way to replace null values in a Spark DataFrame is DataFrame.na.fill(value, subset), where subset names the columns to fill. Option A is correct: na.fill("No Manager", "managerName") passes the subset as a column-name string and returns a new DataFrame. Options B and E call nafill, which is not a DataFrame method (the dot in na.fill is missing). Options C and D pass col("managerName"), a Column object, where the subset parameter expects column names; this rules out D even though fillna is a valid alias of na.fill.
Which of the following code blocks returns a new DataFrame where column managerName from DataFrame storesDF has had its null values replaced with the string "No Manager"?
A sample of DataFrame storesDF is below:
storeId  managerName
0        Donec Enim
1        Ultrices Fringilla
2        null
3        Magna Ac
4        null
A
storesDF.na.fill("No Manager", "managerName")
B
storesDF.nafill("No Manager", col("managerName"))
C
storesDF.na.fill("No Manager", col("managerName"))
D
storesDF.fillna("No Manager", col("managerName"))
E
storesDF.nafill("No Manager", "managerName")