
Explanation:
To replace null values in a Spark DataFrame, use na.fill(), which takes the replacement value and, optionally, the columns to apply it to. Option A (storesDF.na.fill(30000, Seq("sqft"))) is correct in Scala because it passes the column names as a sequence of strings. Option E (storesDF.na.fill(30000, "sqft")) works in Python, where the subset may be given as a single column name, but the Scala API has no overload that takes a bare string. Option B is incorrect because nafill is not a DataFrame method. Options C and D are incorrect because they pass col("sqft"), a Column object, where column names are expected; in addition, fillna in option D exists in the Python API but not in the Scala one.
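The behavior of option A can be checked with a minimal Scala sketch. The object name FillSqftExample, the helper fillSqft, and the local SparkSession settings are assumptions made for illustration; the rows mirror the sample of storesDF shown in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object FillSqftExample {
  // Hypothetical helper: replaces nulls in column "sqft" only.
  // na.fill returns a new DataFrame; the input is not modified.
  def fillSqft(df: DataFrame): DataFrame =
    df.na.fill(30000, Seq("sqft"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("fill-sqft-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Option[Int] produces nullable integer columns.
    val storesDF = Seq[(Option[Int], Option[Int])](
      (Some(43161), Some(51200)),
      (Some(51200), None),
      (None, Some(78367)),
      (Some(78367), None)
    ).toDF("storeId", "sqft")

    // Nulls in sqft become 30000; the null in storeId is left as-is
    // because storeId is not in the column list.
    fillSqft(storesDF).show()

    // Equivalent Scala forms of the same call:
    //   storesDF.na.fill(30000, Array("sqft"))
    //   storesDF.na.fill(Map("sqft" -> 30000))

    spark.stop()
  }
}
```

Passing the column list explicitly matters here: calling na.fill(30000) with no column list would also replace the null in storeId, which the question does not ask for.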
Which of the following code blocks returns a new DataFrame where column sqft from DataFrame storesDF has its null values replaced with the value 30000?
A sample of DataFrame storesDF is below:
   storeId  sqft
0  43161    51200
1  51200    null
2  null     78367
3  78367    null
A
storesDF.na.fill(30000, Seq("sqft"))
B
storesDF.nafill(30000, col("sqft"))
C
storesDF.na.fill(30000, col("sqft"))
D
storesDF.fillna(30000, col("sqft"))
E
storesDF.na.fill(30000, "sqft")