Which of the following code blocks correctly returns a new DataFrame with a modified storeReview column where the suffix "End" has been removed from each string in the storeReview column of DataFrame storesDF?

A sample of DataFrame storesDF is shown below:

storeId  storeReview
0        sem eleifend diam End
1        ...vitae odio egesta End
2        ...amet curabitur en End
3        ...tristique loborti End
4        ..condimentum facil End
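The transformation asked for here is an anchored regular-expression replacement: a pattern ending in `$` only matches at the end of the string, and replacing the match with an empty string deletes the suffix. Below is a minimal plain-Python sketch of that idea, not Spark code. It assumes Python's `re` module treats this simple literal pattern the same way Spark's Java regex engine does. The helper `strip_end_suffix` is hypothetical. Apart from the first sample row, the inputs (`"End of the day End"`, `"no suffix here"`) are made-up values:

```python
import re

# The pattern from the options: a space followed by "End", anchored with $ so it
# only matches at the very end of the string.
END_SUFFIX = re.compile(r" End$")


def strip_end_suffix(review: str) -> str:
    """Hypothetical helper: replace each match of the pattern with "" (the per-row effect of regexp_replace)."""
    return END_SUFFIX.sub("", review)


print(repr(strip_end_suffix("sem eleifend diam End")))  # → 'sem eleifend diam'
print(repr(strip_end_suffix("End of the day End")))     # → 'End of the day' (only the trailing " End" matches)
print(repr(strip_end_suffix("no suffix here")))         # → 'no suffix here' (no match, value unchanged)
```

Because the leading space is part of the pattern, the replacement also removes the space before "End" rather than leaving a trailing blank.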

storesDF.withColumn("storeReview", col("storeReview").regexp_replace(" End$", ""))

25.4%

storesDF.withColumn("storeReview", regexp_replace(col("storeReview"), " End$", ""))

41.9%

storesDF.withColumn("storeReview", regexp_replace(col("storeReview"), " End$"))

8.1%

storesDF.withColumn("storeReview", regexp_replace("storeReview", " End$", ""))

17.6%

storesDF.withColumn("storeReview", regexp_extract(col("storeReview"), " End$", ""))

7.0%
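The options differ mainly in which function they call and which arguments they pass. `regexp_replace(col, pattern, replacement)` rewrites the matched text. `regexp_extract(col, pattern, idx)` instead returns the text captured by group `idx`, which is an integer and not a string like `""`. For that reason it would keep the suffix rather than remove it. The sketch below uses two hypothetical plain-Python stand-ins, `replace_like` and `extract_like`, that loosely imitate the two Spark functions. This is not Spark code. It shows the difference on one sample value:

```python
import re


def replace_like(value: str, pattern: str, replacement: str) -> str:
    # Hypothetical stand-in for regexp_replace: substitute every match of pattern.
    return re.sub(pattern, replacement, value)


def extract_like(value: str, pattern: str, idx: int) -> str:
    # Hypothetical stand-in for regexp_extract: return capture group idx of the
    # first match (0 = whole match), or "" if there is no match.
    match = re.search(pattern, value)
    if match is None:
        return ""
    return match.group(idx) or ""


value = "sem eleifend diam End"
print(repr(replace_like(value, " End$", "")))  # → 'sem eleifend diam' (suffix removed)
print(repr(extract_like(value, " End$", 0)))   # → ' End' (only the suffix is kept)
```

A related detail about the API: in PySpark, the first argument of `regexp_replace` may be either a `Column` or a column-name string. As a result, the `regexp_replace("storeReview", " End$", "")` form also runs in Python. The Scala `regexp_replace` expects a `Column` there.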

Databricks Certified Associate Developer for Apache Spark