
Answer-first summary for fast verification
Answer: A and D. For daily incoming data, use AWS Glue crawlers to scan and identify the schema. For daily and archived data, use Amazon EMR to perform data transformations.
Option A is CORRECT because AWS Glue crawlers are a cost-effective way to handle schema discovery for the daily incoming data. Crawlers automatically scan data sources, infer schemas, and create metadata in the AWS Glue Data Catalog, which subsequent ETL processes can then use.
Option D is CORRECT because Amazon EMR is cost-effective for large-scale data processing of both the daily and the archived data. Amazon EMR provides a scalable and flexible way to process large volumes of data with open-source tools such as Apache Spark and Apache Hadoop, which suit both the daily transformations and the one-time transformations of terabytes of archived data.
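The task ordering implied by options A and D can be sketched in plain Python. This is only an illustration of the dependency structure, not Amazon MWAA or Airflow code: the task names are hypothetical, and the standard-library `graphlib` module stands in for the scheduler. It assumes the daily EMR transformation should run only after the Glue crawler has refreshed the schema, and that the archived data is handled by a separate one-time EMR task.

```python
from graphlib import TopologicalSorter

# Hypothetical task names. Each key maps to the set of tasks it depends on.
DAILY_DAG = {
    "crawl_daily_schema": set(),                    # option A: Glue crawler
    "emr_transform_daily": {"crawl_daily_schema"},  # option D: EMR, daily run
}
ARCHIVE_DAG = {
    "emr_transform_archive": set(),                 # option D: EMR, one-time run
}


def run_order(dag):
    """Return task names in an order that respects the dependencies."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    print(run_order(DAILY_DAG))
    print(run_order(ARCHIVE_DAG))
```

In a real MWAA deployment each entry would presumably become an operator in a DAG file, with the daily DAG on a schedule and the archive DAG triggered once.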
Author: Ritesh Yadav
Question 29/60
A company wants to use machine learning (ML) to perform analytics on data that is in an Amazon S3 data lake. The company has two data transformation requirements that will give consumers within the company the ability to create reports.
The company must perform daily transformations on 300 GB of data that is in a variety of formats and that must arrive in Amazon S3 at a scheduled time. The company must also perform one-time transformations of terabytes of archived data that is in the S3 data lake. The company uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) Directed Acyclic Graphs (DAGs) to orchestrate processing.
Which combination of tasks should the company schedule in the Amazon MWAA DAGs to meet these requirements MOST cost-effectively? (Choose two.)
A
For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
B
For daily incoming data, use Amazon Athena to scan and identify the schema.
C
For daily incoming data, use Amazon Redshift to perform transformations.
D
For daily and archived data, use Amazon EMR to perform data transformations.
E
For archived data, use Amazon SageMaker to perform data transformations.