
Explanation:
The question asks for the storeCategory column to be split into two new columns at the underscore (_). The right tool is the split function from pyspark.sql.functions, which returns an array column: element 0 holds the text before the underscore and element 1 holds the text after it. Option C does exactly this, using split(col("storeCategory"), "_")[0] for storeValueCategory and split(col("storeCategory"), "_")[1] for storeSizeCategory.
Option D is also correct. split accepts either a Column or a column name passed as a string, so split("storeCategory", "_") is equivalent to split(col("storeCategory"), "_") and produces the same result as C.
Options A and E use indices 1 and 2. Array indexing is zero-based, so index 1 places the size (for example, MEDIUM) in storeValueCategory, and index 2 is past the end of the two-element array, which yields a null value or an out-of-bounds error depending on configuration. Options B and E call .split() on the Column returned by col(); Column has no split method, so this is not a valid way to split a column in PySpark.
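The indexing argument can be checked with a small plain-Python model. This is only a sketch and does not use PySpark: re.split stands in for the pattern-based split function, and the hypothetical helper item_or_none stands in for an out-of-range array index returning null. The sample values are taken from the storesDF sample.

```python
import re

# Sample storeCategory values copied from the storesDF sample.
categories = ["VALUE_MEDIUM", "MAINSTREAM_SMALL", "PREMIUM_LARGE"]


def item_or_none(parts, index):
    """Return parts[index], or None when the index is past the end.

    Illustrative helper (not a PySpark function): it models an
    out-of-range array index producing a null value.
    """
    return parts[index] if 0 <= index < len(parts) else None


def split_category(value, first, second):
    # "_" is passed as a pattern; it matches a literal underscore.
    parts = re.split("_", value)
    return item_or_none(parts, first), item_or_none(parts, second)


# Indices 0 and 1 (as in options C and D) recover both halves.
correct = [split_category(c, 0, 1) for c in categories]

# Indices 1 and 2 (as in options A and E) drop the first half
# and run past the end of the two-element list.
shifted = [split_category(c, 1, 2) for c in categories]

print(correct)
print(shifted)
```

With indices 0 and 1 every row yields both parts, while indices 1 and 2 put the size in the first position and leave the second empty.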
Which of the following code blocks correctly splits the storeCategory column from DataFrame storesDF at the underscore character, creating two new columns named storeValueCategory and storeSizeCategory?
A sample of DataFrame storesDF is shown below:
storeId  open   openDate    storeCategory
0        true   1100746394  VALUE_MEDIUM
1        true   944572255   MAINSTREAM_SMALL
2        false  925495628   PREMIUM_LARGE
3        true   1397353092  VALUE_MEDIUM
4        true   986505057   VALUE_LARGE
5        true   955988614   PREMIUM_LARGE
A
(storesDF.withColumn("storeValueCategory", split(col("storeCategory"), "_")[1])
    .withColumn("storeSizeCategory", split(col("storeCategory"), "_")[2]))
B
(storesDF.withColumn("storeValueCategory", col("storeCategory").split("_")[0])
    .withColumn("storeSizeCategory", col("storeCategory").split("_")[1]))
C
(storesDF.withColumn("storeValueCategory", split(col("storeCategory"), "_")[0])
    .withColumn("storeSizeCategory", split(col("storeCategory"), "_")[1]))
D
(storesDF.withColumn("storeValueCategory", split("storeCategory", "_")[0])
    .withColumn("storeSizeCategory", split("storeCategory", "_")[1]))
E
(storesDF.withColumn("storeValueCategory", col("storeCategory").split("_")[1])
    .withColumn("storeSizeCategory", col("storeCategory").split("_")[2]))