
Answer-first summary for fast verification
Answer: df.withColumn('spark_occurrences', size(split(col('text'), 'Spark')) - 1)
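The correct answer is A. `split` breaks the `text` column on every match of 'Spark', so a sentence with n occurrences yields an array of n + 1 pieces, and `size` of that array minus 1 is the number of occurrences. The `- 1` must sit inside the `withColumn` call so that it applies to the column expression, not to the DataFrame returned by `withColumn`. Option B computes the same value but uses `select` without an alias, so the result is not named `spark_occurrences` and the other columns are dropped. Options C and D use `regexp_extract`, which returns only the first match rather than every occurrence.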
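The counting argument above can be checked without a Spark session. The sketch below is plain Python, not PySpark: `re.split` and `re.search` stand in for Spark's `split` and `regexp_extract`, and the helper names `count_by_split` and `first_match` and the sample sentences are made up for illustration.

```python
import re

def count_by_split(text, word="Spark"):
    # n matches cut the string into n + 1 pieces, so subtract 1.
    return len(re.split(re.escape(word), text)) - 1

def first_match(text, word="Spark"):
    # Stand-in for regexp_extract(col, pattern, 0): first match, or "" if none.
    m = re.search(re.escape(word), text)
    return m.group(0) if m else ""

# Hypothetical sample rows for the `text` column.
samples = ["Spark is fast. Spark scales.", "I like Spark", "no match here"]

counts = [count_by_split(s) for s in samples]
firsts = [first_match(s) for s in samples]

print(counts)  # [2, 1, 0]
print(firsts)  # ['Spark', 'Spark', '']
```

The first sentence contains 'Spark' twice, yet the regex-search approach still reports a single match, which is why the `regexp_extract` options cannot cover every occurrence.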
Author: LeetQuiz Editorial Team
Given a DataFrame `df` with a column `text` containing sentences, how would you count the occurrences of the word 'Spark' in each row and store the result in a new column `spark_occurrences` using Spark? Choose the correct code snippet.
A
df.withColumn('spark_occurrences', size(split(col('text'), 'Spark')) - 1)
B
df.select(size(split(col('text'), 'Spark')) - 1)
C
df.withColumn('spark_occurrences', regexp_extract(col('text'), 'Spark', 0))
D
df.select(regexp_extract(col('text'), 'Spark', 0))