
Answer-first summary for fast verification
Answer: Configure an AWS Glue crawler to crawl the data. Configure Amazon Athena to query the data.
## Explanation

**Correct Answer: B**

**Why Option B is correct:**

1. **Least operational overhead**: An AWS Glue crawler automatically discovers the schema of the data in S3 and creates or updates tables in the AWS Glue Data Catalog, so no schema has to be defined by hand.
2. **Serverless querying**: Amazon Athena is a serverless interactive query service that runs SQL directly against data in S3, with no infrastructure to manage.
3. **Quick analysis**: Once the crawler has cataloged the data, Athena can query it immediately, with pay-per-query pricing.
4. **Integration**: Athena uses the AWS Glue Data Catalog as its metadata store, so tables created by the crawler are queryable without extra setup.

**Why other options are incorrect:**

**Option A**: Creating external tables in a Spark catalog means defining the schemas manually, and querying through AWS Glue jobs means writing, scheduling, and maintaining Spark jobs. That is more operational work than running ad hoc SQL.

**Option C**: Using a Hive metastore with Amazon EMR requires provisioning and managing EMR clusters, which adds significant operational overhead compared to serverless options.

**Option D**: Kinesis Data Analytics is designed for real-time processing of streaming data, not for analyzing data at rest in S3. It would require setting up and managing a stream, so it does not fit this use case.

**Key AWS Services:**

- **AWS Glue Crawler**: Automatically discovers data and populates the AWS Glue Data Catalog
- **Amazon Athena**: Serverless interactive query service for analyzing data in S3 using standard SQL
- **AWS Glue Data Catalog**: Central metadata repository for data assets

**Use Case Fit:** For analyzing clickstream data stored in S3 with minimal operational overhead, the combination of an AWS Glue crawler (for schema discovery) and Amazon Athena (for SQL querying) is fully serverless and needs the least maintenance of the four options.
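The crawler-then-query flow described above can be sketched with boto3. This is a minimal illustration under stated assumptions, not a production setup: the crawler name, IAM role ARN, database, bucket paths, and the `campaign` table name are all hypothetical, and the table name a crawler actually creates depends on the S3 layout. The polling loop also omits timeouts and error handling.

```python
import time


def crawler_request(name, role_arn, database, s3_path):
    """Build the keyword arguments for glue.create_crawler."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def athena_query_request(sql, database, output_location):
    """Build the keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def crawl_then_query(crawler, query, poll_seconds=10):
    """Create and run the crawler, then submit one Athena query.

    Needs boto3 and AWS credentials. Returns the Athena query execution ID.
    """
    import boto3  # imported here so the request builders work without boto3

    glue = boto3.client("glue")
    athena = boto3.client("athena")

    glue.create_crawler(**crawler)
    glue.start_crawler(Name=crawler["Name"])
    # Simplified wait: poll until the crawler reports READY again.
    while glue.get_crawler(Name=crawler["Name"])["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)

    response = athena.start_query_execution(**query)
    return response["QueryExecutionId"]


# Hypothetical values for illustration only.
crawler = crawler_request(
    name="clickstream-crawler",
    role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    database="marketing",
    s3_path="s3://example-clickstream-bucket/campaign/",
)
query = athena_query_request(
    sql="SELECT COUNT(*) AS events FROM campaign",
    database="marketing",
    output_location="s3://example-athena-results/",
)
```

Calling `crawl_then_query(crawler, query)` would submit the work to AWS. Nothing in the sketch provisions a cluster or defines a schema by hand, which is the point of option B.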
Author: LeetQuiz Editorial Team
A marketing company receives a large amount of new clickstream data in Amazon S3 from a marketing campaign. The company needs to analyze the clickstream data in Amazon S3 quickly. Then the company needs to determine whether to process the data further in the data pipeline.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create external tables in a Spark catalog. Configure jobs in AWS Glue to query the data.
B. Configure an AWS Glue crawler to crawl the data. Configure Amazon Athena to query the data.
C. Create external tables in a Hive metastore. Configure Spark jobs in Amazon EMR to query the data.
D. Configure an AWS Glue crawler to crawl the data. Configure Amazon Kinesis Data Analytics to use SQL to query the data.