
Answer-first summary for fast verification
Answer: spark_df.filter(col('discount') < 0)
The correct approach for filtering rows of a Spark DataFrame by a condition is the DataFrame's 'filter' method (or its alias 'where'). The code segment 'spark_df.filter(col('discount') < 0)' returns a new DataFrame containing only the rows where the 'discount' column's value is less than 0; note that 'col' must be imported from 'pyspark.sql.functions'. Option A fails because Spark DataFrames have no 'find' method, and options B and C use the '.loc' indexer, which belongs to Pandas, not PySpark. Option D is a SQL statement, not a Python code segment; it would only work if passed to 'spark.sql()' after registering 'spark_df' as a temporary view.
Author: LeetQuiz Editorial Team
A data scientist is working with a Spark DataFrame named 'spark_df'. They want to create a new Spark DataFrame that includes only the rows from 'spark_df' where the 'discount' column's value is less than 0. Which of the following code segments would achieve this goal?
A
spark_df.find(spark_df('discount') < 0)
B
spark_df.loc[spark_df('discount') < 0]
C
spark_df.loc[spark_df('discount') < 0,:]
D
SELECT * FROM spark_df WHERE discount < 0
E
spark_df.filter(col('discount') < 0)