
Answer-first summary for fast verification
Answer: B and D
B: storesDF.withColumn("storeReview", regexp_replace(col("storeReview"), " End$", ""))
D: storesDF.withColumn("storeReview", regexp_replace("storeReview", " End$", ""))
The goal is to remove the substring " End" only when it occurs at the end of each value in the storeReview column of a PySpark DataFrame. The regex pattern " End$" does exactly that:
" End" → a space followed by the literal text End
$ → end-of-string anchor

We use the regexp_replace function from pyspark.sql.functions:
regexp_replace(str: ColumnOrName, pattern: str, replacement: str) -> Column
The first parameter can be either a Column object (e.g., col("storeReview")) or a string column name (e.g., "storeReview").

Correct Options

✅ Option B
storesDF.withColumn("storeReview", regexp_replace(col("storeReview"), " End$", ""))
Works: passes a Column object built with col(), with the correct pattern and replacement.

✅ Option D
storesDF.withColumn("storeReview", regexp_replace("storeReview", " End$", ""))
Works: passes the column name as a string, which is also valid.

Why Others Are Wrong

A ❌ col("storeReview").regexp_replace(" End$", "")
Fails: Column has no regexp_replace() method. PySpark treats the unknown attribute as a field access and returns another Column, so the call typically fails with TypeError: 'Column' object is not callable.

C ❌ Missing the replacement argument. regexp_replace requires three arguments: the column, the pattern, and the replacement.

E ❌ Uses regexp_extract, which extracts a matched group instead of replacing it. Its third argument is an integer group index, so passing "" is also invalid.

Final Answer for PySpark: B and D ✅

💡 Real-World Tip: When writing PySpark transformations, remember that functions like regexp_replace, concat, and lower live in pyspark.sql.functions. Column does not expose them as methods the way a pandas Series exposes its .str methods, which is why Option A fails. The same holds in the Scala API, where regexp_replace is defined in org.apache.spark.sql.functions rather than on Column.
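The effect of the anchored pattern " End$" can be checked without a Spark session. The sketch below uses Python's built-in re module as a stand-in; Spark evaluates the pattern with Java's regex engine, but for this simple pattern the two are assumed to behave the same way. The helper name strip_end_suffix and the sample strings other than the first are hypothetical and not part of the question.

```python
import re

# Same pattern as in options B and D: a space, the literal "End",
# then the end-of-string anchor.
PATTERN = re.compile(r" End$")


def strip_end_suffix(review: str) -> str:
    """Remove a trailing " End"; leave any other occurrence untouched."""
    return PATTERN.sub("", review)


samples = [
    "sem eleifend diam End",  # suffix present: it is removed
    "End of story End",       # only the trailing occurrence is removed
    "The End is near",        # " End" in the middle: unchanged
    "no suffix here",         # no match: unchanged
]

for s in samples:
    print(repr(strip_end_suffix(s)))
```

Without the $ anchor, the pattern " End" would also strip the occurrence in the middle of "The End is near", which is why the anchor matters for a suffix-only removal.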
Author: LeetQuiz Editorial Team
Which of the following code blocks correctly returns a new DataFrame with a modified storeReview column where the suffix "End" has been removed from each string in the storeReview column of DataFrame storesDF?
A sample of DataFrame storesDF is shown below:
storeId  storeReview
0        sem eleifend diam End
1        ...vitae odio egesta End
2        ...amet curabitur en End
3        ...tristique loborti End
4        ..condimentum facil End
A
storesDF.withColumn("storeReview", col("storeReview").regexp_replace(" End$", ""))
B
storesDF.withColumn("storeReview", regexp_replace(col("storeReview"), " End$", ""))
C
storesDF.withColumn("storeReview", regexp_replace(col("storeReview"), " End$"))
D
storesDF.withColumn("storeReview", regexp_replace("storeReview", " End$", ""))
E
storesDF.withColumn("storeReview", regexp_extract(col("storeReview"), " End$", ""))