
Answer-first summary for fast verification
Answer: Configure `spark.sql.files.maxPartitionBytes` to 512 MB, ingest the JSON data, perform narrow transformations, and then write the resulting DataFrame to Parquet.
The most efficient way to control output file size while avoiding shuffles is to control the **input partition size**.

1. **`spark.sql.files.maxPartitionBytes`**: Setting this configuration to 512 MB instructs Spark to pack at most 512 MB of source data into each input partition when reading the JSON files.
2. **Narrow transformations**: Operations such as `select`, `filter`, or `map` are narrow transformations. They preserve the existing partitioning of the DataFrame because they do not require data to be moved across the network.
3. **Deterministic output**: When writing the DataFrame to Parquet, Spark executes one task per partition. Since each partition is already sized at ~512 MB by the initial configuration and no shuffle has occurred, the resulting Parquet files naturally align with the target size.

**Why the other options are less ideal:**

* **Repartitioning (Option A)** and **sorting (Option C)** are wide transformations. They trigger a full data shuffle across the cluster, which is computationally expensive and slow for a 1 TB dataset.
* **Coalesce (Option D)** reduces the partition count without a full shuffle, but it does not offer the same granular control over the initial data split as `maxPartitionBytes`. Additionally, `spark.sql.adaptive.advisoryPartitionSizeInBytes` is an AQE setting used to optimize shuffle partitions, not the initial scan.
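A minimal PySpark sketch of this approach, assuming a generic JSON-to-Parquet job (the bucket paths and the `payload` column are hypothetical; the config key `spark.sql.files.maxPartitionBytes` is real and takes a byte count):

```python
# Sketch only: requires pyspark at runtime; paths/columns are placeholders.
MAX_PARTITION_BYTES = 512 * 1024 * 1024  # 512 MB expressed in bytes


def convert_json_to_parquet(src: str, dst: str) -> None:
    from pyspark.sql import SparkSession  # lazy import: needs a Spark install

    spark = (
        SparkSession.builder
        .appName("json-to-parquet")
        # Pack at most 512 MB of source data into each input partition.
        .config("spark.sql.files.maxPartitionBytes", str(MAX_PARTITION_BYTES))
        .getOrCreate()
    )

    df = spark.read.json(src)
    # Narrow transformations (filter/select) preserve the ~512 MB partitioning.
    cleaned = df.filter(df["payload"].isNotNull())
    # One write task per partition -> part-files of roughly the target size.
    cleaned.write.mode("overwrite").parquet(dst)
```

Note that no `repartition`, `coalesce`, or `orderBy` appears anywhere: the file sizing falls out of the read-side configuration alone.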
Author: LeetQuiz Editorial Team
A data engineering team is tasked with converting a 1 TB JSON dataset into Parquet format. The goal is to produce part-files that are approximately 512 MB each. Given that built-in Databricks features like Auto-Optimize and Auto-Compaction are not available for this workload, which strategy provides the most efficient performance by ensuring the target file size is met without triggering a data shuffle?
**A.** Ingest the data, perform the necessary narrow transformations, and then use `df.repartition(2048)` to create 2,048 partitions (calculated as 1 TB / 512 MB) before writing to Parquet.

**B.** Configure `spark.sql.files.maxPartitionBytes` to 512 MB, ingest the JSON data, perform narrow transformations, and then write the resulting DataFrame to Parquet.

**C.** Set `spark.sql.shuffle.partitions` to 2,048, ingest the data, apply narrow transformations, and perform an `orderBy` operation to ensure data is sorted and repartitioned before writing to Parquet.

**D.** Set `spark.sql.adaptive.advisoryPartitionSizeInBytes` to 512 MB, ingest the data, apply narrow transformations, and then use `df.coalesce(2048)` to reduce the partition count before writing to Parquet.
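The 2,048 figure that recurs in these options is simply the dataset size divided by the target file size, which can be checked directly (binary units assumed):

```python
# Sanity check of the 2,048 partition/file count used in the options.
DATASET_BYTES = 1024 ** 4        # 1 TB
TARGET_BYTES = 512 * 1024 ** 2   # 512 MB
NUM_FILES = DATASET_BYTES // TARGET_BYTES
print(NUM_FILES)  # 2048
```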