
Explanation:
The best fit is "Use Amazon Athena to query the S3 raw zone with SQL." Athena runs standard SQL directly against data in Amazon S3 with no infrastructure to provision, and you pay only for the data each query scans, which matches the minimal-administration and low-cost requirements.

"Run an EMR Spark cluster on a schedule and execute SparkSQL against the new raw files every hour" is overkill for quick sanity checks: provisioning and managing clusters and scheduled jobs adds operational effort and cost.

"Load each 15-minute increment into Amazon Redshift Serverless and validate with SQL" still requires building and maintaining an ingestion pipeline, and for ad hoc checks it can cost more than querying S3 in place.

"Use AWS Glue DataBrew to profile and validate data in S3" is not appropriate because DataBrew focuses on visual data preparation and profiling rather than running arbitrary SQL queries.

Exam tip: when you see ad hoc SQL on S3 with minimal operations and low cost, think Amazon Athena. Reduce cost further with partitioning, compression, and columnar formats such as Parquet to minimize the data scanned per query.
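To make the cost argument concrete, here is a minimal sketch of what such a sanity check might look like and how Athena's scan-based billing works. The table and column names are hypothetical, and the $5-per-TB figure reflects commonly published Athena pricing, which can vary by region:

```python
# Sketch: a partition-pruned Athena sanity-check query plus a rough
# scan-cost estimate. Table/column names (clickstream_raw, user_id, dt)
# are hypothetical; adjust to your schema.

def sanity_check_query(table: str, dt: str) -> str:
    """Count events and null join keys for one daily partition."""
    return (
        f"SELECT COUNT(*) AS events, "
        f"COUNT(*) - COUNT(user_id) AS null_user_ids "
        f"FROM {table} WHERE dt = '{dt}'"
    )

def athena_scan_cost_usd(bytes_scanned: int, price_per_tb: float = 5.0) -> float:
    """Athena bills per byte scanned, with a 10 MB minimum per query."""
    MIN_BYTES = 10 * 1024 ** 2
    tb_scanned = max(bytes_scanned, MIN_BYTES) / 1024 ** 4
    return tb_scanned * price_per_tb

print(sanity_check_query("clickstream_raw", "2024-06-01"))
print(f"${athena_scan_cost_usd(250 * 1024 ** 3):.4f}")  # ~250 GB scanned
```

Restricting the query to a single `dt` partition is what keeps the scanned bytes (and therefore the bill) small; converting the raw files to a compressed columnar format like Parquet shrinks them further.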
A digital advertising analytics startup ingests web and app clickstream events into the raw area of an Amazon S3 data lake every 15 minutes. The team wants to run ad hoc SQL sanity checks directly on these raw files with minimal administration and low cost. Which AWS service should they choose?
A
Run an EMR Spark cluster on a schedule and execute SparkSQL against the new raw files every hour
B
Use Amazon Athena to query the S3 raw zone with SQL
C
Load each 15-minute increment into Amazon Redshift Serverless and validate with SQL
D
Use AWS Glue DataBrew to profile and validate data in S3