
Answer-first summary for fast verification
Answer: A (\Department\DataSource\YYYY\MM\DataFile_YYYYMMDD.parquet)
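The following minimal Python sketch shows how a loader might build file paths in the recommended layout and locate the current month's folder. The department and data source names (`Finance`, `Sales`) and both helper functions are hypothetical. The sketch uses forward slashes, which is how ADLS Gen2 paths appear in URLs and SDKs, where the question writes backslashes.

```python
from datetime import date


def option_a_path(department: str, data_source: str, day: date) -> str:
    """Path of one daily file in the layout
    Department/DataSource/YYYY/MM/DataFile_YYYYMMDD.parquet."""
    return f"{department}/{data_source}/{day:%Y}/{day:%m}/DataFile_{day:%Y%m%d}.parquet"


def current_month_folder(department: str, data_source: str, today: date) -> str:
    """The single folder that holds every file for the month containing `today`."""
    return f"{department}/{data_source}/{today:%Y}/{today:%m}/"


if __name__ == "__main__":
    d = date(2024, 5, 17)
    print(option_a_path("Finance", "Sales", d))         # Finance/Sales/2024/05/DataFile_20240517.parquet
    print(current_month_folder("Finance", "Sales", d))  # Finance/Sales/2024/05/
```

Because every daily file for a month shares the `YYYY/MM/` prefix, a current-month reader can go straight to one folder instead of listing the whole department tree.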
## Analysis of Folder Structure Requirements

The folder structure in Azure Data Lake Storage Gen2 must meet three key requirements.

### 1. Partition Elimination for Serverless SQL Pools

- **Partition elimination** occurs when query predicates filter data at the folder level, so files in non-matching folders are never read.
- Keeping year and month as dedicated folder levels lets time-based predicates prune whole branches of the tree.
- Serverless SQL pools do not infer partitions from folder names on their own. A query puts wildcards in the `OPENROWSET` path and filters on the `filepath()` function (for example, `WHERE r.filepath(1) = '2024'`), and only the folders that match the predicate are read.

### 2. Fast Data Retrieval for Current Month

- Current-month queries are fastest when all of a month's files sit under one folder that can be addressed directly.
- Dedicated year and month folder levels make that folder reachable with a single path, such as `2024/05`.
- A month's data should not be scattered across many sibling folders that each have to be listed.

### 3. Simplified Security Management by Department

- The hierarchical namespace in Azure Data Lake Storage Gen2 supports POSIX-style access control lists (ACLs) on both directories and files.
- Placing the department folder at the top of the hierarchy means a department's permissions are granted on one folder instead of many.
- Permissions flow downward through default ACLs: a default ACL set on a directory is copied to child items created beneath it afterwards (existing children change only if ACLs are applied recursively). Reading a file also requires Execute permission on every parent folder.

## Evaluation of Options

**Option A: `\Department\DataSource\YYYY\MM\DataFile_YYYYMMDD.parquet`**

- **✓ Excellent for security**: the department folder is at the root, so department-wide ACLs are set on a single folder per department, the fewest of any option.
- **✓ Good for partition elimination**: year and month folders support time-based filtering.
- **✓ Good for current month access**: each month is a single `YYYY\MM` folder under its data source.

**Option B: `\DataSource\Department\YYYYMM\DataFile_YYYYMMDD.parquet`**

- **✗ Poor for security**: the department folder is nested under each data source, so a department's ACLs must be repeated for every data source.
- **✓ Good for partition elimination**: the combined `YYYYMM` folder supports monthly filtering.
- **✓ Good for current month access**: each month is a single `YYYYMM` folder.

**Option C: `\DD\MM\YYYY\Department\DataSource\DataFile_DDMMYY.parquet`**

- **✗ Poor for partition elimination**: the hierarchy runs day, then month, then year, so a year or month filter cannot narrow the search at the top level and must fan out across every day folder. The file name also uses a two-digit year (`DDMMYY`).
- **✗ Poor for current month access**: one month's data is scattered across up to 31 top-level day folders.
- **✗ Poor for security**: the department folder sits below three date levels, so department ACLs must be set under every day folder, just as in Option D.

**Option D: `\YYYY\MM\DD\Department\DataSource\DataFile_YYYYMMDD.parquet`**

- **✓ Excellent for partition elimination**: the year/month/day hierarchy suits time-based queries.
- **✓ Excellent for current month access**: each month is a single `YYYY\MM` folder.
- **✗ Poor for security**: the department folder is repeated under every day folder (more than 1,000 per department for three years of data), so department ACLs must be set in each one.

## Recommended Solution: Option A

**Option A** provides the best balance across all three requirements:

- **Security**: the department folder at the root allows simple, inherited security policies.
- **Partition elimination**: year and month folders support effective time-based filtering.
- **Current month access**: each month is reached directly through its `YYYY\MM` folder.

Option D offers finer-grained partition elimination for day-level queries, but it fails the security management requirement. Option A keeps strong query performance while also meeting the security requirement, which makes it the best choice for this scenario.
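The folder-level pruning described under requirement 1 can be imitated in plain Python. This is a sketch, not the serverless SQL engine: the hypothetical `wildcard_values` function loosely mimics `filepath()` by returning the folder names matched by each whole-segment `*` in a pattern (the real function also numbers wildcards inside file names), and the hypothetical `files_to_read` keeps only the paths whose captured year and month satisfy a predicate, without opening any file. The file listing is made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import Callable, Iterable, Optional


def wildcard_values(path: str, pattern: str) -> Optional[list[str]]:
    """Return the names matched by each whole-segment '*' in `pattern`, in order
    (loosely like filepath(1), filepath(2), ...), or None if `path` does not match.
    Other segments must match literally or as fnmatch globs such as '*.parquet'."""
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    if len(path_parts) != len(pattern_parts):
        return None
    captured: list[str] = []
    for segment, pat in zip(path_parts, pattern_parts):
        if pat == "*":
            captured.append(segment)
        elif not fnmatchcase(segment, pat):
            return None
    return captured


def files_to_read(
    paths: Iterable[str], pattern: str, keep: Callable[[list[str]], bool]
) -> list[str]:
    """Partition elimination: decide from the path alone which files to open."""
    selected = []
    for path in paths:
        values = wildcard_values(path, pattern)
        if values is not None and keep(values):
            selected.append(path)
    return selected


# Hypothetical file listing in the Option A layout.
PATHS = [
    "Finance/Sales/2024/04/DataFile_20240430.parquet",
    "Finance/Sales/2024/05/DataFile_20240501.parquet",
    "Finance/Sales/2024/05/DataFile_20240502.parquet",
    "HR/Payroll/2024/05/DataFile_20240501.parquet",
]

if __name__ == "__main__":
    # Year is wildcard 1 and month is wildcard 2, like filepath(1) and filepath(2).
    may = files_to_read(PATHS, "Finance/Sales/*/*/*.parquet",
                        lambda v: v == ["2024", "05"])
    print(may)
```

With that predicate only the two May files under `Finance/Sales` are selected; the April file and the `HR` branch are skipped based on their paths alone.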
Author: LeetQuiz Editorial Team
You are designing a folder structure for files in an Azure Data Lake Storage Gen2 account. The account has a single container holding three years of data.
You need to recommend a folder structure that meets these requirements:
Which folder structure should you recommend?
A. \Department\DataSource\YYYY\MM\DataFile_YYYYMMDD.parquet
B. \DataSource\Department\YYYYMM\DataFile_YYYYMMDD.parquet
C. \DD\MM\YYYY\Department\DataSource\DataFile_DDMMYY.parquet
D. \YYYY\MM\DD\Department\DataSource\DataFile_YYYYMMDD.parquet
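To make the security comparison between the four options concrete, the sketch below counts, for one department, how many distinct folders carry that department's name in each layout, which is the number of places its ACLs would have to be set. It assumes two hypothetical data sources (`Sales`, `Payroll`), a hypothetical department (`Finance`), and one file per day from 2021 through 2023 (three years, matching the scenario). It ignores the Execute permission also needed on parent folders.

```python
from datetime import date, timedelta

# Hypothetical data sources; the real names are not given in the question.
DATA_SOURCES = ["Sales", "Payroll"]

# The four candidate layouts, with forward slashes as used by ADLS Gen2 paths.
LAYOUTS = {
    "A": "{dept}/{src}/{d:%Y}/{d:%m}/DataFile_{d:%Y%m%d}.parquet",
    "B": "{src}/{dept}/{d:%Y%m}/DataFile_{d:%Y%m%d}.parquet",
    "C": "{d:%d}/{d:%m}/{d:%Y}/{dept}/{src}/DataFile_{d:%d%m%y}.parquet",
    "D": "{d:%Y}/{d:%m}/{d:%d}/{dept}/{src}/DataFile_{d:%Y%m%d}.parquet",
}


def department_folders(layout: str, dept: str, days: list[date]) -> set[str]:
    """Distinct folders named after `dept`; each one needs the department's ACLs."""
    folders = set()
    for d in days:
        for src in DATA_SOURCES:
            parts = layout.format(dept=dept, src=src, d=d).split("/")
            # Keep the path up to and including the department segment.
            folders.add("/".join(parts[: parts.index(dept) + 1]))
    return folders


# One file per day for 2021-01-01 through 2023-12-31 (three years, no leap day).
START = date(2021, 1, 1)
DAYS = [START + timedelta(days=n) for n in range((date(2024, 1, 1) - START).days)]

if __name__ == "__main__":
    for name, layout in LAYOUTS.items():
        print(name, len(department_folders(layout, "Finance", DAYS)))
```

Under these assumptions the counts come out as 1 for A, 2 for B, and 1,095 for both C and D: the department-first layout needs the fewest ACL assignments, while the date-first layouts need one per day.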