
Answer-first summary for fast verification
Answer: A
from pyspark.sql.functions import year, month, day
df = df.withColumn('year', year('date')).withColumn('month', month('date')).withColumn('day', day('date'))
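Option A is correct. It imports the 'year', 'month', and 'day' functions from the 'pyspark.sql.functions' module and applies each one to the 'date' column with 'withColumn', which adds three integer columns and keeps the existing ones. Option B is incorrect because it slices the date with 'substr', which returns strings rather than integers, and its offsets are wrong: 'substr' positions are 1-based, so for '2024-03-15' the call 'substr(5, 2)' returns '-0' and 'substr(8, 2)' returns '-1'. Option C is incorrect because a Column object has no 'year()', 'month()', or 'day()' methods; the date parts come from functions in 'pyspark.sql.functions', not from methods on the column. Option D relies on 'selectExpr', which returns only the columns listed in the call, so any other columns in 'df' would be dropped; the question asks for new columns added to 'df', which is what 'withColumn' does.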
Author: LeetQuiz Editorial Team
You are given a Spark DataFrame 'df' with a date column 'date' in the format 'yyyy-MM-dd'. Write a code snippet that extracts the year, month, and day from the 'date' column and creates new columns for each component, and explain the steps involved.
A
from pyspark.sql.functions import year, month, day
df = df.withColumn('year', year('date')).withColumn('month', month('date')).withColumn('day', day('date'))
B
df = df.withColumn('year', df.date.substr(0, 4)).withColumn('month', df.date.substr(5, 2)).withColumn('day', df.date.substr(8, 2))
C
df = df.withColumn('year', df.date.year()).withColumn('month', df.date.month()).withColumn('day', df.date.day())
D
df = df.selectExpr('date', 'year(date) as year', 'month(date) as month', 'day(date) as day')