
Answer-first summary for fast verification
Answer: Create a data lake by using AWS Lake Formation. Create an AWS Glue JDBC connection to Amazon RDS. Register the S3 bucket in Lake Formation. Use Lake Formation access controls to limit access.
## Explanation

**Correct Answer: C**

AWS Lake Formation is specifically designed to address the requirements in this scenario.

### Why Option C is Correct:

1. **Data Lake Creation**: Lake Formation helps create a secure data lake that can integrate data from multiple sources (S3 and RDS in this case).
2. **Fine-Grained Permissions**: Lake Formation provides fine-grained access controls at the table, column, and row levels, which is what the various teams need in order to access different parts of the data.
3. **Minimizes Operational Overhead**: Lake Formation automates many data lake management tasks, including data ingestion, cataloging, and security management.
4. **Integration**: The AWS Glue JDBC connection to Amazon RDS allows the relational customer data to be brought into the data lake, while registering the S3 bucket makes the purchase data available.
5. **Unified Access Control**: Lake Formation provides one central place to manage permissions on data from both sources once it is cataloged in the data lake.

### Why Other Options Are Incorrect:

**Option A**: Migrating the purchase data to RDS would not scale for the volume described (more than 50 million customers and more than 25,000 orders each day). RDS is built for transactional workloads rather than analytics at this scale, and RDS access controls do not offer the centrally managed, fine-grained permissions the analytics teams need.

**Option B**: While this solution uses Athena for querying, S3 policies alone don't provide fine-grained, table-level permissions. S3 policies work at the bucket and object level, not at the column and row level needed for analytics teams. The scheduled Lambda copy job is also extra machinery to maintain.

**Option D**: Amazon Redshift is a data warehouse, but creating a cluster and scheduling Lambda functions to copy data from S3 and RDS adds significant operational overhead for data loading and management. Redshift has its own access controls, but they apply only to data already loaded into the warehouse, so this option does not minimize operational overhead.
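To make the contrast with bucket-level S3 policies concrete, here is a minimal sketch of a column-level grant as it might be expressed for the Lake Formation `GrantPermissions` API through boto3. The role ARN, database, table, and column names are hypothetical and do not come from the question; the sketch only builds the request, and the helper that would send it is defined but not called, since it needs boto3 and AWS credentials.

```python
from typing import Any, Dict, List


def column_grant_request(principal_arn: str, database: str, table: str,
                         columns: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for the Lake Formation GrantPermissions API.

    Grants SELECT on only the listed columns of one Data Catalog table.
    """
    if not columns:
        raise ValueError("list at least one column to grant")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def apply_grant(request: Dict[str, Any]) -> None:
    """Send the grant. Requires boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the builder above runs without AWS access

    boto3.client("lakeformation").grant_permissions(**request)


# Hypothetical example: a marketing analyst role may read order totals
# but not any customer contact columns.
request = column_grant_request(
    principal_arn="arn:aws:iam::111122223333:role/MarketingAnalyst",
    database="retail_lake",
    table="purchases",
    columns=["order_id", "order_date", "order_total"],
)
```

An S3 bucket policy has no equivalent of `ColumnNames`: it can allow or deny whole objects, which is why Option B cannot restrict a team to a subset of columns.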
### Key AWS Services Mentioned:

- **AWS Lake Formation**: Centralized data lake management with fine-grained access controls
- **AWS Glue**: ETL service with crawlers for data discovery and JDBC connections
- **Amazon S3**: Storage for purchase data
- **Amazon RDS**: Relational database for customer data
- **Amazon Athena**: Serverless query service (mentioned in Option B)
- **Amazon Redshift**: Data warehouse service (mentioned in Option D)

This solution provides the scalability needed for large datasets while offering the fine-grained permissions required for multiple analytics teams, all with minimal operational overhead.
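The two setup steps in Option C (registering the S3 bucket and creating the Glue JDBC connection) can be sketched as boto3 request payloads. This is an illustration under stated assumptions, not a complete deployment: the bucket name, connection name, JDBC URL, and secret name are hypothetical, the question does not say which RDS engine is used (MySQL is assumed here), and the network settings a real RDS connection inside a VPC would need are omitted.

```python
from typing import Any, Dict


def register_bucket_request(bucket: str) -> Dict[str, Any]:
    """Keyword arguments for lakeformation.register_resource."""
    return {
        "ResourceArn": f"arn:aws:s3:::{bucket}",
        # Let Lake Formation use its service-linked role to access the bucket.
        "UseServiceLinkedRole": True,
    }


def jdbc_connection_request(name: str, jdbc_url: str,
                            secret_id: str) -> Dict[str, Any]:
    """Keyword arguments for glue.create_connection (JDBC connection to RDS).

    PhysicalConnectionRequirements (subnet, security groups) are left out
    of this sketch; a database in a VPC would need them.
    """
    return {
        "ConnectionInput": {
            "Name": name,
            "ConnectionType": "JDBC",
            "ConnectionProperties": {
                "JDBC_CONNECTION_URL": jdbc_url,
                # Credentials are read from AWS Secrets Manager rather than
                # being embedded in the connection definition.
                "SECRET_ID": secret_id,
            },
        }
    }


def apply_setup(bucket_req: Dict[str, Any], conn_req: Dict[str, Any]) -> None:
    """Send both requests. Requires boto3 and AWS credentials; not called here."""
    import boto3  # imported lazily so the builders run without AWS access

    boto3.client("lakeformation").register_resource(**bucket_req)
    boto3.client("glue").create_connection(**conn_req)


# Hypothetical values for illustration only.
bucket_req = register_bucket_request("example-purchase-data")
conn_req = jdbc_connection_request(
    name="customers-rds",
    jdbc_url="jdbc:mysql://example-host:3306/customers",
    secret_id="example/rds/credentials",
)
```

Keeping the payload builders separate from the API calls lets the requests be reviewed or unit-tested without touching an AWS account.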
Author: LeetQuiz Editorial Team
An online retail company has more than 50 million active customers and receives more than 25,000 orders each day. The company collects purchase data for customers and stores this data in Amazon S3. Additional customer data is stored in Amazon RDS.
The company wants to make all the data available to various teams so that the teams can perform analytics. The solution must provide the ability to manage fine-grained permissions for the data and must minimize operational overhead.
Which solution will meet these requirements?
A. Migrate the purchase data to write directly to Amazon RDS. Use RDS access controls to limit access.
B. Schedule an AWS Lambda function to periodically copy data from Amazon RDS to Amazon S3. Create an AWS Glue crawler. Use Amazon Athena to query the data. Use S3 policies to limit access.
C. Create a data lake by using AWS Lake Formation. Create an AWS Glue JDBC connection to Amazon RDS. Register the S3 bucket in Lake Formation. Use Lake Formation access controls to limit access.
D. Create an Amazon Redshift cluster. Schedule an AWS Lambda function to periodically copy data from Amazon S3 and Amazon RDS to Amazon Redshift. Use Amazon Redshift access controls to limit access.