
Explanation:
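The correct answer is D. compute.shortcut_limit. This option sets the row threshold for shortcut operations in pandas-on-Spark, which affects performance. Here's a breakdown of its functionality:
pandas-on-Spark computes the number of rows given by compute.shortcut_limit (1000 by default) and uses them to infer the schema.
When the DataFrame is longer than this limit, pandas-on-Spark uses PySpark to compute the result instead.
Example usage: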
import pyspark.pandas as ps
# Adjust the shortcut limit to 2000 rows
ps.options.compute.shortcut_limit = 2000
Because this limit decides when pandas-on-Spark takes the shortcut and when it hands the work to PySpark, choosing it to match your data sizes can noticeably affect the efficiency of data analysis tasks.
In the Pandas API on Spark, which option sets the threshold for shortcut operations, computing the specified number of rows and using their schema?
A. compute.default_index_type
B. display.max_rows
C. compute.ops_on_diff_frames
D. compute.shortcut_limit