Databricks Certified Machine Learning - Associate


Explanation:

Correct Answer: B. Apache Arrow
Apache Arrow is the in-memory columnar data format that the Pandas API on Spark (formerly known as Koalas) uses to transfer data efficiently between the JVM (Java Virtual Machine), where Spark executes, and Python processes. Because both sides work with the same columnar layout, data moves in batches instead of being serialized row by row. This keeps conversions, such as turning a pandas-on-Spark DataFrame into a local pandas DataFrame, fast and low-latency even for large datasets.

Other Options:

A (ORC), C (Avro), and D (Parquet) are widely used file formats for persisting data on disk in big data ecosystems (Avro is row-oriented; ORC and Parquet are columnar but designed for storage). None of them is the format the Pandas API on Spark uses for in-memory data transfer between JVM and Python processes.

Apache Arrow's columnar memory format lets the Pandas API on Spark combine Spark's distributed execution with the familiar pandas interface: whenever data has to cross from the JVM to Python (or back), Arrow keeps that hand-off cheap, so the two systems can work together without a costly conversion step.



Which in-memory columnar data format is utilized by the Pandas API on Spark to efficiently transfer data between JVM and Python processes?

Real Exam

Last updated: January 15, 2026 at 14:03

A. ORC (4.5%)
B. Apache Arrow (70.9%)
C. Avro (6.4%)
D. Parquet (18.2%)