Analysis of Folder Structure Options
Key Requirements:
- Supports usage patterns: Most queries filter on either the current year or the current week
- Simplifies folder security: Data is secured by data source
- Minimizes query times: Year and week filters should prune folders rather than scan them
Evaluation of Options:
Option A (\DataSource\SubjectArea\YYYY\WW\FileData_YYYY_MM_DD.parquet) - RECOMMENDED
- Security: The data source is the top-level folder, so one ACL on each data source folder (plus a default ACL so new subfolders receive the same entries) secures all of that source's data
- Query Performance: Year and week are separate folder levels (YYYY→WW), so a current-year query reads one YYYY folder and a current-week query reads one YYYY\WW folder
- Usage Pattern Support: Organizes data logically by time granularity, supporting both year-based and week-based queries
- Best Practice: Follows Microsoft's Data Lake Storage guidance of placing the date hierarchy after the data source and subject area folders
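A minimal Python sketch of Option A's naming rule follows. It assumes WW is the ISO 8601 week number and YYYY is the ISO year that owns that week, so every week folder sits under exactly one year folder; the exam option does not define WW, and the names "Sales" and "Orders" are hypothetical placeholders for a data source and subject area.

```python
from datetime import date
from typing import Optional


def option_a_path(data_source: str, subject_area: str, d: date) -> str:
    """Build the Option A file path for date d.

    Assumption: YYYY/WW are the ISO year and ISO week from date.isocalendar(),
    while the file name keeps the calendar date.
    """
    iso_year, iso_week, _ = d.isocalendar()
    parts = [data_source, subject_area, f"{iso_year:04d}", f"{iso_week:02d}",
             f"FileData_{d:%Y_%m_%d}.parquet"]
    return "\\" + "\\".join(parts)


def query_prefix(data_source: str, subject_area: str, year: int,
                 week: Optional[int] = None) -> str:
    """Folder a current-year (week=None) or current-week query reads under Option A."""
    parts = [data_source, subject_area, f"{year:04d}"]
    if week is not None:
        parts.append(f"{week:02d}")
    return "\\" + "\\".join(parts) + "\\"


if __name__ == "__main__":
    print(option_a_path("Sales", "Orders", date(2024, 3, 15)))
    # \Sales\Orders\2024\11\FileData_2024_03_15.parquet
    print(option_a_path("Sales", "Orders", date(2024, 12, 30)))
    # \Sales\Orders\2025\01\FileData_2024_12_30.parquet
```

The second call shows the trade-off of using the ISO year: 30 December 2024 belongs to ISO week 1 of 2025, so its file lands in the 2025 folder. Using the calendar year instead would keep the file in 2024 but split that week across two year folders.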
Option B (\DataSource\SubjectArea\YYYY-WW\FileData_YYYY_MM_DD.parquet) - NOT OPTIMAL
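- Query Performance: Combining year and week in one folder name flattens the hierarchy; a current-week query still reads one folder, but a current-year query must list every YYYY-WW folder under the subject area and match on the YYYY- prefix
- Security: Same as A, because the data source is still the top-level folder
- Usage Pattern: Handles the week half of the "year or week" requirement well, but the year half poorly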
Option C (DataSource\SubjectArea\WW\YYYY\FileData_YYYY_MM_DD.parquet) - NOT OPTIMAL
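- Security: The data source is still the top-level folder, so security is as simple as in A
- Query Performance: Week-first hierarchy is inefficient for year-based queries, because each WW folder holds a YYYY subfolder for every year and a current-year query must open all 52 or 53 week folders to reach them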
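The query-performance difference between Options A, B, and C can be made concrete by counting how many folders a current-year or current-week query must address. This is a simplified model, not how any particular engine plans queries: it assumes a query can read one folder and everything beneath it but cannot prune on part of a folder name (such as the YYYY in YYYY-WW), and it uses hypothetical sizes of 5 years and 52 weeks per year.

```python
from math import prod

# Hypothetical sizes for illustration (real ISO years have 52 or 53 weeks).
TIME_CARDINALITY = {"year": 5, "week": 52}

# Time folder levels below \DataSource\SubjectArea\, outermost first.
TIME_LAYOUTS = {
    "A": [("year",), ("week",)],   # ...\YYYY\WW\
    "B": [("year", "week")],       # ...\YYYY-WW\
    "C": [("week",), ("year",)],   # ...\WW\YYYY\
}


def folders_to_read(levels, query):
    """Count folders a query must address under the model's assumptions.

    levels: list of tuples naming the time components encoded at each folder level.
    query:  dict of filtered components, e.g. {"year": 2024}.
    """
    if not levels:
        return 1  # reached a single fully-determined folder
    remaining = {c for level in levels for c in level}
    if remaining.isdisjoint(query):
        return 1  # no filter applies at or below here: read this subtree as one unit
    level, rest = levels[0], levels[1:]
    # Every unfiltered component at this level forces a fan-out over its values.
    free = [c for c in level if c not in query]
    return prod(TIME_CARDINALITY[c] for c in free) * folders_to_read(rest, query)


if __name__ == "__main__":
    for name, levels in TIME_LAYOUTS.items():
        year_q = folders_to_read(levels, {"year": 2024})
        week_q = folders_to_read(levels, {"year": 2024, "week": 11})
        print(name, year_q, week_q)
```

Running the loop prints `A 1 1`, `B 52 1`, and `C 52 1`: every layout finds the current week in one folder, but only A keeps a whole year under a single folder.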
Option D (\YYYY\WW\DataSource\SubjectArea\FileData_YYYY_MM_DD.parquet) - NOT OPTIMAL
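- Security: The data source sits below YYYY\WW, so each data source has a separate folder under every year and week, each needing its own ACL, and new ones appear every week
- Query Performance: Time-first partitioning serves year and week filters well, but at the cost of the security requirement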
Option E (WW\YYYY\SubjectArea\DataSource\FileData_YYYY_MM_DD.parquet) - NOT OPTIMAL
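- Security: Worst option for security, because the data source is the lowest folder level and is repeated under every WW\YYYY\SubjectArea combination
- Query Performance: Week-first hierarchy is inefficient for current-year queries, for the same reason as in C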
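The security comparison can be sketched the same way by counting how many folders need an ACL entry to scope a group to one data source: one per combination of folder levels above the DataSource folder. The sizes (5 years, 52 weeks, 4 subject areas) are hypothetical, and the model ignores the execute permission a group also needs on parent folders to traverse to its data.

```python
from math import prod

# Hypothetical folder counts per level, for illustration only.
FOLDER_COUNTS = {"YYYY": 5, "WW": 52, "YYYY-WW": 5 * 52, "SubjectArea": 4, "DataSource": 3}

# Folder levels for each option, outermost first.
PATH_LAYOUTS = {
    "A": ["DataSource", "SubjectArea", "YYYY", "WW"],
    "B": ["DataSource", "SubjectArea", "YYYY-WW"],
    "C": ["DataSource", "SubjectArea", "WW", "YYYY"],
    "D": ["YYYY", "WW", "DataSource", "SubjectArea"],
    "E": ["WW", "YYYY", "SubjectArea", "DataSource"],
}


def acl_targets_per_source(levels):
    """Number of separate folders named after one data source, each needing its own ACL."""
    depth = levels.index("DataSource")
    # prod() of an empty sequence is 1: a top-level DataSource folder exists exactly once.
    return prod(FOLDER_COUNTS[level] for level in levels[:depth])


if __name__ == "__main__":
    for name, levels in PATH_LAYOUTS.items():
        print(name, acl_targets_per_source(levels))
```

Under these sizes A, B, and C need 1 ACL target per data source, D needs 260, and E needs 1,040. With the data source on top, a default ACL on that one folder is copied to every new year and week folder created beneath it; under D and E, each new week adds more data source folders that must be secured individually.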
Why Option A is Optimal:
- Security Simplification: With the data source at the root, access control is set once per data source folder rather than repeated under time folders
- Query Optimization: Hierarchical time partitioning (Year→Week) enables efficient partition elimination
- Usage Pattern Alignment: Supports both year-based and week-based filtering scenarios
- Azure Best Practices: Aligns with Microsoft's Data Lake Storage guidance, which places the date hierarchy at the end of the folder path, after the folders used for access control
Option A provides the best balance of security management simplicity and query performance optimization while fully supporting the specified usage patterns.