
Answer-first summary for fast verification
Answer: VACUUM to remove old files no longer referenced; OPTIMIZE to create fewer files with a larger size.
The scenario has two requirements: remove files that are no longer used, and combine small files into larger ones. Each maps to one command:
1. VACUUM: Removes old files that are no longer referenced by the Delta table and are older than the retention threshold (7 days by default). This reclaims storage and reduces storage costs.
2. OPTIMIZE: Improves query performance by compacting many small files into fewer, larger ones, toward a target size of 1 GB per file. OPTIMIZE fits this requirement better than the optimizeWrite setting, which is typically enabled by default and only affects files as they are written; it does not compact small files that already exist in the table.
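The two commands can be sketched as a small notebook helper. This is a minimal illustration, not a definitive implementation: the table name `iot_readings` and the helper names `build_maintenance_sql` and `run_maintenance` are hypothetical, and `spark` is assumed to be the SparkSession that a Fabric notebook provides. The 168-hour retention simply spells out the 7-day default.

```python
from typing import List


def build_maintenance_sql(table_name: str, retain_hours: int = 168) -> List[str]:
    """Return the Delta maintenance statements for one table (hypothetical helper)."""
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return [
        # Compact many small files into fewer, larger ones.
        f"OPTIMIZE {table_name}",
        # Delete unreferenced files older than the retention window.
        f"VACUUM {table_name} RETAIN {retain_hours} HOURS",
    ]


def run_maintenance(spark, table_name: str, retain_hours: int = 168) -> List[str]:
    """Run OPTIMIZE, then VACUUM, through any object exposing spark.sql()."""
    statements = build_maintenance_sql(table_name, retain_hours)
    for statement in statements:
        spark.sql(statement)
    return statements


# In a notebook (assumes `spark` exists and the table is named iot_readings):
# run_maintenance(spark, "iot_readings")
```

Running OPTIMIZE first means the small files it replaces become unreferenced, but VACUUM only deletes them once they are older than the retention window, so a later VACUUM run is what actually reclaims that space.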
Author: LeetQuiz Editorial Team
You are managing a Fabric tenant with a lakehouse named Lakehouse1. This lakehouse receives readings from 100 IoT devices, and these readings are appended to a Delta table within Lakehouse1. Each reading set is roughly 25 KB, amounting to approximately 10 GB of data per day. All the table and SparkSession settings are currently at their defaults. Recently, you have noticed that query execution is slow. Additionally, the lakehouse storage includes obsolete data and log files.
Your task is to enhance performance by removing these unused files and consolidating smaller files into larger files, aiming for a target file size of 1 GB each. To accomplish this, identify the appropriate action for each requirement. Each action can be used once, multiple times, or not at all. NOTE: Each correct selection is worth one point.
A. Set the optimizeWrite table setting.
B. Run the OPTIMIZE command on a schedule.
C. VACUUM to remove old files no longer referenced.
D. OPTIMIZE to create fewer files with a larger size.