
Answer-first summary for fast verification
Answer: Create an AWS Glue crawler to crawl the databases. Use the FindMatches transform to find duplicate records in the data. Evaluate and tune the transform by evaluating the performance and results.
Option B is CORRECT because using AWS Glue with the FindMatches transform allows for an easy and managed solution to find duplicate records or match records across databases with inconsistent field names. The AWS Glue crawler can automate the process of crawling the databases to create a data catalog, and the FindMatches transform can handle matching records even when fields do not match exactly. This solution involves minimal operational overhead since AWS Glue is serverless, and the FindMatches transform is specifically designed for this type of record matching without requiring custom model development.
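FindMatches is a managed ML transform, so no custom code is needed for the actual matching. To make the underlying idea concrete, here is a toy sketch (plain Python, not the Glue API) of the two problems it solves: folding inconsistently named fields such as place_id and location_id into one canonical schema, and fuzzy-matching records whose values do not agree exactly. All field names and the alias map are hypothetical, and the hand-written similarity threshold stands in for what FindMatches learns from labeled examples.

```python
from difflib import SequenceMatcher

# Hypothetical alias map: place_id and location_id fold into one canonical
# field name, mirroring the inconsistent schemas in the question.
FIELD_ALIASES = {
    "place_id": "place_id",
    "location_id": "place_id",
    "cust_name": "name",
    "customer_name": "name",
}

def normalize(record):
    """Rename known alias fields to their canonical names."""
    return {FIELD_ALIASES.get(k, k): v for k, v in record.items()}

def similarity(a, b):
    """Crude string similarity in [0, 1]; FindMatches learns this instead."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def is_match(rec_a, rec_b, threshold=0.85):
    """Treat two records as the same customer if keys agree and names are close."""
    a, b = normalize(rec_a), normalize(rec_b)
    return (a["place_id"] == b["place_id"]
            and similarity(a["name"], b["name"]) >= threshold)

db1_record = {"location_id": "42", "customer_name": "Jon Smith"}
db2_record = {"place_id": "42", "cust_name": "John Smith"}
print(is_match(db1_record, db2_record))  # → True
```

In a real Glue job, the crawler produces the catalog tables, and FindMatches replaces the hand-tuned `similarity`/`threshold` logic with a model trained from labeled match examples, which is exactly why it carries less operational overhead than the EMR or SageMaker options.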
Author: Ritesh Yadav
Question 22/58
A company reads data from customer databases that run on Amazon RDS. The databases contain many inconsistent fields. For example, a customer record field that is named place_id in one database is named location_id in another database. The company needs to link customer records across different databases, even when customer record fields do not match.
Which solution will meet these requirements with the LEAST operational overhead?
A
Create a provisioned Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook. Use the FindMatches transform to find duplicate records in the data.
B
Create an AWS Glue crawler to crawl the databases. Use the FindMatches transform to find duplicate records in the data. Evaluate and tune the transform by evaluating the performance and results.
C
Create an AWS Glue crawler to crawl the databases. Use Amazon SageMaker to construct Apache Spark ML pipelines to find duplicate records in the data.
D
Create a provisioned Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook. Use an Apache Spark ML model to find duplicate records in the data. Evaluate and tune the model by evaluating the performance and results.