
Explanation:
The correct answer is C. Here's why:
transform(): This method calls a function on each column of a pandas-on-Spark DataFrame, passing the column to the function as a Series. The function must return a Series or DataFrame with the same length as its input, so the result keeps the index and column labels of the original DataFrame.
apply(): This method applies a function along a specified axis (rows or columns) of a pandas-on-Spark DataFrame. The function may return a value of arbitrary length, including a scalar, Series, or DataFrame, so the result might not preserve the index or column labels.
When to Use Each:
Use transform() when the function maps its input to an output of the same length, so the DataFrame keeps its structure.
Use apply() when you need flexibility in the output length, for example for aggregations, reductions, or other operations that change the DataFrame's shape or structure.
Example:
# Plain pandas is used here; pyspark.pandas exposes the same transform()/apply() calls.
import pandas as pd

# Sample DataFrame
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# transform() example (same length output)
def double_values(x):
    return x * 2

df_transformed = df.transform(double_values)  # Output: DataFrame with columns A and B doubled

# apply() example (arbitrary length output)
def sum_row(row):
    return row.sum()

df_applied = df.apply(sum_row, axis=1)  # Output: Series with the sum of each row
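To make the length requirement concrete, here is a minimal sketch that passes the same length-changing function to both methods. It uses plain pandas as a stand-in, as the example above does. The helper name first_two is hypothetical. The ValueError shown is what plain pandas raises, and pandas-on-Spark may report a violation of the same-length requirement differently.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

def first_two(col):
    # Hypothetical helper: returns fewer rows than it receives,
    # so the output length differs from the input length.
    return col.head(2)

# apply() accepts the shorter result and builds a smaller DataFrame.
shortened = df.apply(first_two)

# transform() rejects it, because the result no longer lines up
# with the original index. Plain pandas raises ValueError here.
try:
    df.transform(first_two)
    transform_error = None
except ValueError as exc:
    transform_error = exc

print(shortened.shape)
print(type(transform_error).__name__)
```

The same function is valid for apply() and invalid for transform(), which is the distinction that option C describes.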
Key Points:
transform() for same-length operations that maintain the DataFrame's structure.
apply() for more flexible operations that might change the DataFrame's shape.
What distinguishes DataFrame.transform() from DataFrame.apply() in pandas-on-Spark?
A
transform allows the function to return an output of arbitrary length, while apply requires the same length as the input.
B
Both transform and apply require the function to return the same length as the input.
C
transform requires the function to return the same length as the input, while apply allows an arbitrary length.
D
Both transform and apply allow an arbitrary length for the output.