
Answer-first summary for fast verification
Answer: df.withColumn('domain', regexp_extract(col('url'), 'https?://([^/]+)', 1))
The correct answer is A. `regexp_extract` applies the regular expression `https?://([^/]+)` to the `url` column and returns capture group 1, which is the text after the protocol and before the first `/`. `withColumn` then adds that value as a new `domain` column alongside the existing columns.
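To see what the pattern captures, it can be tried outside Spark. The sketch below uses Python's standard `re` module as a stand-in for `regexp_extract`. Spark evaluates patterns with Java's regex engine, but this pattern only uses syntax the two engines share. The helper name `extract_domain` and the sample URLs are illustrative assumptions, not part of the question.

```python
import re

# Same pattern as in option A: "http", an optional "s", then "://",
# then capture everything up to (not including) the next "/".
PATTERN = re.compile(r"https?://([^/]+)")


def extract_domain(url: str) -> str:
    """Hypothetical helper mimicking regexp_extract(col, pattern, 1).

    Returns capture group 1, or an empty string when the pattern does not
    match (regexp_extract also yields an empty string on no match).
    """
    match = PATTERN.search(url)
    return match.group(1) if match else ""


# Sample URLs are made up for illustration.
for url in ["https://example.com/path/page", "http://sub.example.org", "example.com/path"]:
    print(repr(extract_domain(url)))
```

Group 1 is everything between `://` and the next `/`, so a port such as `:8080` would be included in the result, and a URL without a protocol does not match at all.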
Author: LeetQuiz Editorial Team
Suppose you have a DataFrame `df` with a column `url` containing URLs. How would you extract the domain name from these URLs and create a new column `domain` using Spark? Provide the code snippet.
A
df.withColumn('domain', regexp_extract(col('url'), 'https?://([^/]+)', 1))
B
df.select(regexp_extract(col('url'), 'https?://([^/]+)', 1).alias('domain'))
C
df.withColumn('domain', split(col('url'), '/')[2])
D
df.select(split(col('url'), '/')[2].alias('domain'))
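Options C and D take element 2 of the URL after splitting it on `/`. A plain-Python sketch of that indexing, using `str.split` as a stand-in for Spark's `split` and a hypothetical helper `third_segment`, suggests how the result depends on the URL starting with a protocol:

```python
from typing import Optional


def third_segment(url: str) -> Optional[str]:
    """Hypothetical helper: the element at index 2 after splitting on "/"."""
    parts = url.split("/")
    # Return None when there is no index 2 instead of raising IndexError.
    return parts[2] if len(parts) > 2 else None


# "https://example.com/path" splits into ["https:", "", "example.com", "path"].
print(third_segment("https://example.com/path"))  # example.com
# Without a protocol, index 2 is a path segment rather than the host.
print(third_segment("example.com/a/b"))  # b
```

With a protocol present, index 2 is the host. Without one, the same index points at a path segment, whereas the regular expression in options A and B only captures text that follows `http://` or `https://`.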