
Explanation:
The correct answer is C. import pyspark.pandas as ps.
The pandas API on Spark, previously known as Koalas, lets users write pandas-like syntax while leveraging the distributed computing capabilities of Apache Spark. Since Apache Spark 3.2, this API has shipped as part of PySpark under the pyspark.pandas namespace.
import pandas as ps imports the standard pandas library, which lacks Spark's distributed computing features.
databricks.pandas, pandas.spark, and databricks.pyspark do not exist in standard distributions, so options B, D, and E cannot be imported.
By using import pyspark.pandas as ps, the data scientist can refactor their pandas DataFrame code for large-scale data processing, combining the ease of pandas syntax with Spark's scalability.
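As a rough illustration of why only the import line changes in this refactoring, the sketch below runs the same DataFrame code against whichever module is named. The helper category_counts and the sample rows are hypothetical and not part of the question; it builds the DataFrame from an in-memory dict instead of calling read_parquet so that it runs without a Parquet file.

```python
import importlib


def category_counts(module_name, rows):
    """Count rows per category with a pandas-style module chosen by name.

    Hypothetical helper for illustration. `module_name` is assumed to be
    either "pandas" or "pyspark.pandas"; both expose DataFrame(...) and
    Series.value_counts() with the same call shape.
    """
    ps = importlib.import_module(module_name)  # plays the role of `import ... as ps`
    df = ps.DataFrame(rows)
    return df["category"].value_counts()


# Made-up sample data standing in for the Parquet file in the question.
rows = {"category": ["a", "b", "a", "c", "a"], "value": [1, 2, 3, 4, 5]}

# Before the refactor (single machine):   category_counts("pandas", rows)
# After the refactor (option C, on Spark): category_counts("pyspark.pandas", rows)
```

The "pyspark.pandas" call needs a working Spark installation; the "pandas" call needs only pandas. In both cases the body of the function is identical, which is the point of the refactoring.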
A data scientist is transitioning their pandas DataFrame code to utilize the pandas API on Spark. They are working with the following incomplete code snippet:
________BLANK_________
df = ps.read_parquet(path)
df["category"].value_counts()
Which line of code should they use to successfully complete the refactoring with the pandas API on Spark?
A
import pandas as ps
B
import databricks.pandas as ps
C
import pyspark.pandas as ps
D
import pandas.spark as ps
E
import databricks.pyspark as ps