
Explanation:
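Amazon Athena runs standard SQL directly against data in S3 with no infrastructure to provision or manage, and it bills only for the data each query scans, so it is both low-ops and cost-effective for ad hoc checks. An EMR cluster has to be provisioned, scheduled, and managed. Redshift Serverless requires loading each increment before it can be validated, which adds an ingestion pipeline. Glue DataBrew is built for visual profiling and transformation, not arbitrary SQL.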
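As a rough illustration of what an ad hoc sanity check against the raw zone could look like, here is a minimal Python sketch. It assumes a table has already been defined over the S3 raw prefix (for example in the Glue Data Catalog). The table name `raw_clickstream`, the column `event_time`, the database `datalake_raw`, and the results bucket are hypothetical names chosen for the example. The client is expected to be a boto3 Athena client, and only its `start_query_execution` and `get_query_execution` calls are used.

```python
import time


def build_sanity_check_sql(table: str, event_time_column: str = "event_time") -> str:
    """Build a simple row-count and null-check query (names are hypothetical)."""
    return (
        f"SELECT COUNT(*) AS row_count, "
        f"COUNT_IF({event_time_column} IS NULL) AS null_event_times "
        f"FROM {table}"
    )


def run_athena_query(client, sql, database, output_location, poll_seconds=1.0):
    """Submit a query to Athena and poll until it reaches a terminal state.

    `client` is assumed to be boto3.client("athena") or an object that
    exposes the same two methods.
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Example usage (requires boto3 and AWS credentials; all names are assumptions):
# import boto3
# athena = boto3.client("athena")
# sql = build_sanity_check_sql("raw_clickstream")
# run_athena_query(athena, sql, "datalake_raw", "s3://example-athena-results/")
```

Because there is no cluster or warehouse to keep running, the only moving parts in this sketch are the query text and a results location in S3, which is what makes this option low on administration.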
A digital advertising analytics startup ingests web and app clickstream events into the raw area of an Amazon S3 data lake every 15 minutes. The team wants to run ad hoc SQL sanity checks directly on these raw files with minimal administration and low cost. Which AWS service should they choose?
A. Run an EMR Spark cluster on a schedule and execute SparkSQL against the new raw files every hour
B. Use Amazon Athena to query the S3 raw zone with SQL
C. Load each 15-minute increment into Amazon Redshift Serverless and validate with SQL
D. Use AWS Glue DataBrew to profile and validate data in S3