A data scientist is working with a Spark DataFrame named 'spark_df'. They want to create a new Spark DataFrame that includes only the rows from 'spark_df' where the 'discount' column's value is less than 0. Which of the following code segments correctly accomplishes this task?
Explanation:
In PySpark, the standard way to filter rows by a condition is the DataFrame's 'filter' method (for which 'where' is an equivalent alias). It takes a boolean column expression and returns a new DataFrame containing only the rows where that expression is true. In this scenario, the condition is that the 'discount' column's value must be less than 0, so the correct segment calls 'filter' with that comparison. The other options either use syntax that is invalid in PySpark or syntax that belongs to other data manipulation libraries, such as Pandas.
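For reference, here is a minimal runnable sketch of the pattern the correct answer relies on. The SparkSession setup, the sample data, and the 'item' column are illustrative assumptions added here; only the 'spark_df' name and the 'discount' column come from the question.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Session and sample data are assumptions for illustration only.
spark = SparkSession.builder.appName("discount-filter").getOrCreate()

spark_df = spark.createDataFrame(
    [("a", -0.10), ("b", 0.25), ("c", -0.05)],
    ["item", "discount"],
)

# filter() keeps only the rows where the boolean column expression
# is true; where() would behave identically.
negative_discounts = spark_df.filter(col("discount") < 0)

negative_discounts.show()
```

Equivalent forms such as spark_df.filter(spark_df.discount < 0) or spark_df.where(col("discount") < 0) produce the same result; Pandas-style boolean indexing like spark_df[spark_df["discount"] < 0] is not how row filtering is expressed on a Spark DataFrame.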