Databricks Certified Data Engineer - Professional

A data engineering team needs to ingest a 1 TB JSON dataset and convert it into Parquet format with a target part-file size of approximately 512 MB. Given that Delta Lake features like Auto-Optimize are unavailable, how can they achieve this target size with optimal performance while strictly avoiding any data shuffling?
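The figures in the question fix how many part files are needed. A small sketch of that arithmetic, assuming binary units (1 TB = 1024^4 bytes, 512 MB = 512 * 1024^2 bytes) and the hypothetical helper name `part_file_count`. It sizes by input bytes only; Parquet output is usually smaller than the JSON it came from, so the result is a rough guide.

```python
# Sizes in binary units (an assumption; the question does not say which it uses).
DATASET_BYTES = 1 * 1024**4        # 1 TB of JSON input
TARGET_FILE_BYTES = 512 * 1024**2  # 512 MB target per part file


def part_file_count(dataset_bytes: int, target_bytes: int) -> int:
    """Return the number of part files needed, rounding up to a whole file."""
    return -(-dataset_bytes // target_bytes)  # ceiling division on integers


print(part_file_count(DATASET_BYTES, TARGET_FILE_BYTES))  # 2048
```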

Last updated: January 6, 2026 at 15:41

Set spark.sql.shuffle.partitions to 2,048 before ingestion, perform narrow transformations, and apply a sort operation to organize the data before writing to Parquet.

0.0%

Configure spark.sql.files.maxPartitionBytes to 512 MB, ingest the data, perform only narrow transformations, and then write the result to Parquet.

Set spark.sql.adaptive.advisoryPartitionSizeInBytes to 512 MB, ingest the data, perform narrow transformations, and use coalesce to reach 2,048 partitions before writing.

8.3%

Ingest the data, perform narrow transformations, and use repartition(2048) to set the number of output files based on the target size before writing to Parquet.

8.3%

Configure spark.sql.shuffle.partitions to 512 before ingestion, perform narrow transformations, and write the result directly to the destination.

16.7%
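The option that configures spark.sql.files.maxPartitionBytes can be sketched in PySpark roughly as follows. This is an illustration under assumptions, not a verified solution: the function name `json_to_parquet` and both paths are hypothetical, the input is assumed to be splittable (line-delimited, uncompressed) JSON, and `spark` is an existing SparkSession passed in by the caller. Because no wide transformation is applied, each read partition is written as one part file without a shuffle.

```python
TARGET_BYTES = 512 * 1024 * 1024  # 512 MB expressed in bytes


def json_to_parquet(spark, source_path: str, target_path: str) -> None:
    """Read JSON in roughly 512 MB splits and write one Parquet file per split.

    `spark` is assumed to be an active pyspark.sql.SparkSession.
    """
    # Caps the number of bytes Spark packs into a single partition on file reads.
    spark.conf.set("spark.sql.files.maxPartitionBytes", str(TARGET_BYTES))

    df = spark.read.json(source_path)

    # Any narrow transformations (select, filter, withColumn) would go here.
    # They keep the partitioning produced by the read, so nothing is shuffled.

    df.write.parquet(target_path)
```

A call might look like `json_to_parquet(spark, "/data/raw/events", "/data/parquet/events")`, where both paths are placeholders. The setting governs input split size, so the written files will only approximate 512 MB once Parquet encoding and compression are applied.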