
A marketing company receives a large amount of new clickstream data in Amazon S3 from a marketing campaign. The company needs to analyze the clickstream data in Amazon S3 quickly. Then the company needs to determine whether to process the data further in the data pipeline.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create external tables in a Spark catalog. Configure jobs in AWS Glue to query the data.
B. Configure an AWS Glue crawler to crawl the data. Configure Amazon Athena to query the data.
C. Create external tables in a Hive metastore. Configure Spark jobs in Amazon EMR to query the data.
D. Configure an AWS Glue crawler to crawl the data. Configure Amazon Kinesis Data Analytics to use SQL to query the data.
Explanation:
Correct Answer: B
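Why Option B is correct: An AWS Glue crawler infers the schema of the clickstream data in S3 and registers it as a table in the AWS Glue Data Catalog. Amazon Athena can then query that table in place with standard SQL. Both services are serverless, so there are no clusters to provision and no jobs to write before the first query.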
Why other options are incorrect:
Option A: Creating external tables in a Spark catalog and configuring AWS Glue jobs adds operational overhead. Although AWS Glue is serverless, the Spark jobs must be written, scheduled, and maintained, which is more work than running ad hoc SQL queries.
Option C: Using a Hive metastore and Amazon EMR requires provisioning and managing EMR clusters, which carries significant operational overhead compared to serverless options.
Option D: Kinesis Data Analytics is designed for real-time streaming data processing, not for analyzing static data in S3. It's not the appropriate service for this use case and would require stream setup and management.
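Key AWS Services: AWS Glue crawler (infers the schema of data in S3 and registers tables in the AWS Glue Data Catalog) and Amazon Athena (serverless SQL queries run directly against data in S3).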
Use Case Fit: For analyzing clickstream data stored in S3 with minimal operational overhead, the combination of an AWS Glue crawler (for schema discovery) and Amazon Athena (for SQL querying) provides a fully serverless, low-maintenance solution.
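The crawl-then-query flow in option B can be sketched in Python. This is a minimal illustration, not a production implementation: it assumes a crawler has already been created, that the clients passed in behave like the boto3 Glue and Athena clients, and that the crawler name, database, SQL text, and S3 output location are all hypothetical values supplied by the caller.

```python
import time


def crawl_and_query(glue, athena, crawler_name, database, sql,
                    output_location, poll_seconds=2.0):
    """Run an existing Glue crawler, then run one Athena query.

    `glue` and `athena` are assumed to be boto3-style clients.
    Returns the first page of results as a list of rows.
    """
    # Start the crawler and wait until it returns to the READY state.
    glue.start_crawler(Name=crawler_name)
    while glue.get_crawler(Name=crawler_name)["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)

    # Submit the SQL query; Athena writes its results to the S3 location.
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} ended in state {state}")

    # First page only; the first row usually holds the column headers.
    result = athena.get_query_results(QueryExecutionId=query_id)
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

With boto3 installed and credentials configured, the function could be called with `boto3.client("glue")` and `boto3.client("athena")`, a crawler name such as `clickstream-crawler`, and an output location such as `s3://example-bucket/athena-results/` (both names are placeholders). The only moving parts are a crawler run and a SQL statement, which is why this option carries the least operational overhead.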