AWS Certified Solutions Architect - Associate

A marketing company receives a large amount of new clickstream data in Amazon S3 from a marketing campaign. The company needs to analyze the clickstream data in Amazon S3 quickly. Then the company needs to determine whether to process the data further in the data pipeline.

Which solution will meet these requirements with the LEAST operational overhead?

Last updated: February 23, 2026 at 11:39

A. Create external tables in a Spark catalog. Configure jobs in AWS Glue to query the data.

B. Configure an AWS Glue crawler to crawl the data. Configure Amazon Athena to query the data.

C. Create external tables in a Hive metastore. Configure Spark jobs in Amazon EMR to query the data.

D. Configure an AWS Glue crawler to crawl the data. Configure Amazon Kinesis Data Analytics to use SQL to query the data.

Explanation

Correct Answer: B

Why Option B is correct:

Least operational overhead: An AWS Glue crawler automatically infers the schema of the data in S3 and creates or updates tables in the AWS Glue Data Catalog, so nobody has to define the schema by hand.
Serverless querying: Amazon Athena is a serverless interactive query service that runs SQL directly against data in S3, with no infrastructure to manage.
Quick analysis: Athena can query the data as soon as the tables are cataloged, and it bills by the amount of data each query scans.
Integration: Athena uses the AWS Glue Data Catalog as its metadata store, so tables that the crawler creates are available to query without extra setup.
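
The crawler step described above can be sketched in Python. This is a minimal illustration, not a reference setup: it assumes the boto3 Glue client, whose `create_crawler` and `start_crawler` calls take the parameters shown, and every concrete name used with it (crawler name, IAM role ARN, database, bucket path) is a hypothetical placeholder. The client is passed in as an argument so the request can be inspected without touching AWS.

```python
from typing import Any, Dict


def build_crawler_request(
    name: str, role_arn: str, database: str, s3_path: str
) -> Dict[str, Any]:
    """Build the keyword arguments for the Glue create_crawler call."""
    if not s3_path.startswith("s3://"):
        raise ValueError("s3_path must be an s3:// URI")
    return {
        "Name": name,                # crawler name (hypothetical)
        "Role": role_arn,            # IAM role the crawler assumes
        "DatabaseName": database,    # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client: Any, request: Dict[str, Any]) -> None:
    """Register the crawler, then trigger one on-demand run."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])
```

With boto3 installed and credentials configured, passing `boto3.client("glue")` as `glue_client` would create the crawler and start a run. Once the run finishes, the inferred tables appear in the named Data Catalog database.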

Why other options are incorrect:

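Option A: AWS Glue jobs are serverless, so there is no Spark cluster to manage, but this option still requires defining the external tables by hand and writing, scheduling, and maintaining Spark job scripts. That is more operational overhead than running ad hoc SQL queries in Athena against tables that a crawler creates automatically.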

Option C: Using a Hive metastore with Amazon EMR requires provisioning and managing EMR clusters, defining the tables manually, and maintaining Spark jobs, which is significant operational overhead compared to serverless options.

Option D: Kinesis Data Analytics is designed for real-time streaming data processing, not for analyzing static data in S3. It's not the appropriate service for this use case and would require stream setup and management.

Key AWS Services:

AWS Glue Crawler: Automatically discovers data and populates the AWS Glue Data Catalog
Amazon Athena: Serverless interactive query service for analyzing data in S3 using standard SQL
AWS Glue Data Catalog: Central metadata repository for data assets
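
To illustrate how Athena queries a table in the Data Catalog, here is a hedged Python sketch. It assumes the boto3 Athena client and its `start_query_execution` and `get_query_execution` calls. The helper name `run_athena_query`, the polling defaults, and any SQL, database, or output location passed to it are assumptions made for illustration only.

```python
import time
from typing import Any


def run_athena_query(
    athena_client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Submit a query, wait for a terminal state, and return the execution ID."""
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        response = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = response["QueryExecution"]["Status"]["State"]
        if state == "SUCCEEDED":
            return query_id
        if state in ("FAILED", "CANCELLED"):
            raise RuntimeError(f"Athena query {query_id} ended in state {state}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING, so wait and retry

    raise TimeoutError(f"Athena query {query_id} did not finish in time")
```

A call might pass `boto3.client("athena")`, a query such as `SELECT page, COUNT(*) FROM clicks GROUP BY page` (hypothetical table and column), the crawler's database name, and an S3 prefix for results. The returned ID can then be given to the client's `get_query_results` call to read the rows.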

Use Case Fit: For analyzing clickstream data stored in S3 with minimal operational overhead, the combination of an AWS Glue crawler (for schema discovery) and Amazon Athena (for SQL querying) is fully serverless and needs the least maintenance of the four options.
