
Explanation:
When designing an Azure Data Lake Storage Gen2 folder structure for optimal query performance and security, we need to consider two key requirements:
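Security by Subject Area: ADLS Gen2 access control lists (ACLs) are set on individual folders and files, and a folder's default ACL is applied to items created beneath it. Placing the subject area at the top level therefore lets one folder per subject area carry that subject area's permissions.
Query Performance for Date-Based Filtering: Most queries filter for the current year or current month, so the date folders should run from year to month to day. Engines can then skip (prune) every folder outside the requested period.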
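Option A: /{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv (secured by subject area, but the day-month-year order puts the year deepest, so one month's files are spread across a separate folder for every day)
Option B: /{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv (date folders come first and in reverse order, with the subject area buried beneath them, so it fails both requirements)
Option C: /{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv (the year-first date order suits date filters, but each subject area folder is repeated under every day folder, so permissions must be set again for each new day)
Option D: /{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv (subject area at the root for security, with year, month, and day nested in the order queries filter on)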
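To make the four layouts concrete, here is a minimal sketch that fills each option's template for a single file. The templates are copied from the question; the values "Sales", "CRM", and "Orders" are hypothetical examples, not part of the scenario.

```python
from datetime import date

# Path templates for Options A-D, copied from the question.
LAYOUTS = {
    "A": "/{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv",
    "B": "/{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv",
    "C": "/{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv",
    "D": "/{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv",
}


def build_path(option: str, subject_area: str, data_source: str,
               file_data: str, day: date) -> str:
    """Fill one option's template for a file that lands on `day`."""
    return LAYOUTS[option].format(
        SubjectArea=subject_area,
        DataSource=data_source,
        FileData=file_data,
        YYYY=f"{day:%Y}",  # four-digit year, e.g. 2024
        MM=f"{day:%m}",    # zero-padded month, e.g. 05
        DD=f"{day:%d}",    # zero-padded day, e.g. 07
    )


# Hypothetical example values: subject area "Sales", source "CRM", file "Orders".
for option in "ABCD":
    print(option, build_path(option, "Sales", "CRM", "Orders", date(2024, 5, 7)))
```

For 7 May 2024, Option D yields /Sales/CRM/2024/05/07/Orders_2024_05_07.csv, while Option A yields /Sales/CRM/07/05/2024/Orders_2024_05_07.csv: same subject-area root, opposite date order.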
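Security Simplification: By placing {SubjectArea} at the root level, you can set access and default ACLs once on each subject area folder, and new date folders created beneath it pick up those permissions automatically. (Azure RBAC role assignments can be scoped no lower than the container, so folder-level security in ADLS Gen2 relies on ACLs.)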
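The difference in security effort between Options C and D can be estimated by counting the folders that would need their own subject-area ACL. This is a counting sketch under a simplifying assumption, not Azure SDK code: each subject area's ACL is assumed to go on the shallowest folder that belongs to that subject area alone, and the execute (traverse) permission also needed on parent folders is ignored. The subject area names and the one-year date range are hypothetical.

```python
from datetime import date, timedelta

# Directory part of Options C and D (file name omitted).
LAYOUTS = {
    "C": "/{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}",
    "D": "/{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}",
}


def acl_targets(option: str, subject_areas: list[str], days: list[date]) -> set[str]:
    """Folders needing their own subject-area ACL under a given layout.

    Assumption: the ACL goes on the template cut just after {SubjectArea},
    i.e. the shallowest folder owned by a single subject area.
    """
    template = LAYOUTS[option]
    cut = template.index("{SubjectArea}") + len("{SubjectArea}")
    scope = template[:cut]
    return {
        # str.format ignores keyword arguments the scope does not use.
        scope.format(SubjectArea=sa, YYYY=f"{d:%Y}", MM=f"{d:%m}", DD=f"{d:%d}")
        for sa in subject_areas
        for d in days
    }


# Hypothetical inputs: three subject areas and one year (2024) of daily folders.
subject_areas = ["Sales", "HR", "Finance"]
days = [date(2024, 1, 1) + timedelta(days=n) for n in range(366)]

for option in ("C", "D"):
    print(option, len(acl_targets(option, subject_areas, days)))
```

Under these assumptions Option D needs 3 ACL targets (/Sales, /HR, /Finance), which a default ACL then covers for every future date folder. Option C needs 1,098 (one per subject area per day in 2024), and the count keeps growing as new day folders are created.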
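Query Performance: Because {YYYY}/{MM}/{DD} follows the subject area, all of a subject area's data for the current year sits under a single {YYYY} folder, and all of its data for the current month sits under a single {YYYY}/{MM} folder. Engines such as Azure Databricks and Synapse serverless SQL pools can read just that folder and skip the rest.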
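One way to see the pruning benefit is to count how many separate folders hold exactly the requested period under Options A and D. This is a simplified model that ignores how any particular engine lists or expands paths; the three years of daily Sales/CRM data are hypothetical.

```python
from datetime import date, timedelta

# Folder segments (file name omitted) for Options A and D.
LAYOUTS = {
    "A": ["{SubjectArea}", "{DataSource}", "{DD}", "{MM}", "{YYYY}"],
    "D": ["{SubjectArea}", "{DataSource}", "{YYYY}", "{MM}", "{DD}"],
}


def period_folders(option, days, subject_area, data_source, **date_filter):
    """Distinct folders that together hold exactly the files matching the filter.

    Each matching day's folder path is cut after the deepest segment the query
    fixes; everything beneath that cut belongs to the requested period.
    """
    fixed = {"SubjectArea": subject_area, "DataSource": data_source, **date_filter}
    segments = LAYOUTS[option]
    depth = 1 + max(i for i, seg in enumerate(segments) if seg[1:-1] in fixed)
    folders = set()
    for d in days:
        values = {"SubjectArea": subject_area, "DataSource": data_source,
                  "YYYY": f"{d:%Y}", "MM": f"{d:%m}", "DD": f"{d:%d}"}
        if any(values[key] != wanted for key, wanted in fixed.items()):
            continue  # this day falls outside the query's date filter
        folders.add("/" + "/".join(values[seg[1:-1]] for seg in segments[:depth]))
    return folders


# Hypothetical data: one file per day for Sales/CRM from 2022 through 2024.
start = date(2022, 1, 1)
days = [start + timedelta(days=n) for n in range((date(2024, 12, 31) - start).days + 1)]

for option in ("A", "D"):
    year = period_folders(option, days, "Sales", "CRM", YYYY="2024")
    month = period_folders(option, days, "Sales", "CRM", YYYY="2024", MM="05")
    print(option, "current year:", len(year), "current month:", len(month))
```

Under Option D a current-month query can point at one folder, /Sales/CRM/2024/05/. Under Option A the same month is spread over 31 folders (/Sales/CRM/01/05/2024/ through /Sales/CRM/31/05/2024/), and a current-year query touches 366. An engine has to reach these through a wildcard such as /Sales/CRM/*/05/2024/, which means more directory listing.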
Best of Both Worlds: This structure satisfies both the security requirement and the query-performance requirement, making Option D the most suitable choice for the given scenario.
You are designing the folder structure for an Azure Data Lake Storage Gen2 container. Data will be queried using services like Azure Databricks and Azure Synapse Analytics serverless SQL pools. The data must be secured by subject area, and most queries will filter for the current year or current month. Which folder structure provides the best query performance and simplifies folder-level security?
A
/{SubjectArea}/{DataSource}/{DD}/{MM}/{YYYY}/{FileData}_{YYYY}_{MM}_{DD}.csv
B
/{DD}/{MM}/{YYYY}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
C
/{YYYY}/{MM}/{DD}/{SubjectArea}/{DataSource}/{FileData}_{YYYY}_{MM}_{DD}.csv
D
/{SubjectArea}/{DataSource}/{YYYY}/{MM}/{DD}/{FileData}_{YYYY}_{MM}_{DD}.csv