
Answer-first summary for fast verification
Answer: /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv
## Analysis of Folder Structure Options

When designing an Azure Data Lake Storage Gen2 folder structure for query performance and security, two requirements matter:

1. **Security by Subject Area**: The scenario calls for folder-level security. ACLs set on a folder can be passed down to new items created beneath it through default ACLs, so placing the subject area at the top level simplifies security management.
2. **Query Performance for Date-Based Filtering**: Most queries filter by current year or month, so the folder structure should let the query engine skip folders outside that range (partition pruning).

### Evaluation of Each Option:

**Option A**: `/{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv`
- ❌ **Poor for date filtering**: The date components are ordered day first, so the data for a single year or month is spread across every day-level folder
- ❌ **Inefficient for common queries**: Queries filtering by year/month must traverse the day-level folders before they reach the month and year

**Option B**: `/{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv`
- ❌ **Poor security management**: Subject area sits below the date folders, so the same permissions would have to be applied under every date folder
- ❌ **Inefficient organization**: Data is split by day first, which scatters each subject area across the date folders and also hurts year/month filtering

**Option C**: `/{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv`
- ✅ **Good for date filtering**: Year and month are at the top, enabling efficient partition pruning
- ❌ **Poor security implementation**: Subject area is not at the top level, so access control has to be repeated under each date folder

**Option D**: `/{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv`
- ✅ **Optimal security**: Subject area at the top level allows straightforward folder-level security implementation
- ✅ **Excellent query performance**: Year comes before month and month before day, so a year or month filter maps to a single folder prefix
- ✅ **Balanced approach**: Supports both security requirements and performance optimization

### Why Option D is Optimal:

1. **Security Simplification**: By placing `{SubjectArea}` at the root level, you can apply ACLs directly to the subject area folders, making security management straightforward and maintainable. Azure RBAC role assignments, by contrast, are scoped at the container level or above, so folder-level control relies on ACLs.
2. **Query Performance**: The `{YYYY}/{MM}/{DD}` structure following the subject area enables:
   - **Partition Pruning**: Services like Azure Databricks and Synapse Analytics can avoid unnecessary folder scans when a query targets a specific year/month
   - **Conventional Date Hierarchy**: Year, then month, then day is the usual ordering for date-partitioned data. Note that strict Hive-style partitioning names folders as `key=value` (for example `year=2024/month=03`), which none of the options use, so pruning here depends on reading a path prefix or pattern
   - **Efficient Date Filtering**: Queries for the current year/month can navigate directly to the relevant folders
3. **Best of Both Worlds**: This structure balances the competing requirements of security management and query performance, making it the most suitable choice for the given scenario.
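The effect of folder ordering on date filtering can be illustrated with a small, hedged sketch. It uses plain Python string handling only (no Azure SDK), and the function names and the `Sales`/`CRM`/`Orders` values are hypothetical, chosen purely for illustration. It builds a path in the Option D layout and compares how many folder prefixes a one-month query has to touch under Option D versus the day-first Option A.

```python
from calendar import monthrange
from datetime import date


def option_d_path(subject_area: str, data_source: str, day: date, file_data: str) -> str:
    """Build a file path in the Option D layout (names are illustrative)."""
    return (
        f"/{subject_area}/{data_source}/"
        f"{day:%Y}/{day:%m}/{day:%d}/"
        f"{file_data}_{day:%Y}_{day:%m}_{day:%d}.csv"
    )


def month_prefixes_option_d(subject_area: str, data_source: str, year: int, month: int) -> list[str]:
    """Year-first ordering: a whole month sits under one folder prefix."""
    return [f"/{subject_area}/{data_source}/{year:04d}/{month:02d}/"]


def month_prefixes_option_a(subject_area: str, data_source: str, year: int, month: int) -> list[str]:
    """Day-first ordering (Option A): one prefix per day of the month."""
    days_in_month = monthrange(year, month)[1]
    return [
        f"/{subject_area}/{data_source}/{d:02d}/{month:02d}/{year:04d}/"
        for d in range(1, days_in_month + 1)
    ]


if __name__ == "__main__":
    print(option_d_path("Sales", "CRM", date(2024, 3, 7), "Orders"))
    print(len(month_prefixes_option_d("Sales", "CRM", 2024, 3)))  # prefixes to read for March 2024
    print(len(month_prefixes_option_a("Sales", "CRM", 2024, 3)))  # one per day of March 2024
```

Under these assumptions, a month filter against Option D resolves to a single prefix, while the day-first layout needs one prefix per day. How a given engine turns such prefixes into an actual read depends on that engine and is not shown here.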
Author: LeetQuiz Editorial Team
You are designing the folder structure for an Azure Data Lake Storage Gen2 container. Data will be queried using services like Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data must be secured by subject area, and most queries will filter for the current year or current month. Which folder structure provides the best query performance and simplifies folder-level security?
A
`/{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv`
B
`/{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv`
C
`/{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv`
D
`/{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv`