
Answer-first summary for fast verification
Answer: Train and use the AWS Lake Formation FindMatches transform in the ETL job.
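Option D is correct. The AWS Lake Formation FindMatches transform is designed to identify matching records in data that lacks a common unique identifier. It uses machine learning. You label a sample of records, grouping the ones that refer to the same entity. FindMatches learns from those labels, and the trained transform then flags records that likely match even when their field values differ slightly (for example, typos or inconsistent formatting). Adding the trained transform to the AWS Glue ETL job lets the company find matching or duplicate records as the data is written to the data lake, where Athena can query the results. The other options do not meet the requirement. Amazon Macie discovers sensitive data such as PII; it does not match records. The Glue Filter class applies a fixed predicate and cannot be trained. Partitioning on a unique identifier assumes the records share an identifier, which they do not.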
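To show what "matching without a common identifier" means, the sketch below compares records field by field using string similarity from Python's standard library. It is a hand-tuned heuristic, not FindMatches. FindMatches learns its matching rules from the records you label, whereas the fields compared, the `threshold` value, and the sample records here are hypothetical choices for illustration. In an actual AWS Glue job, a trained FindMatches transform is applied to a DynamicFrame with `FindMatches.apply(frame=..., transformId=...)` from the `awsglueml.transforms` module.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(value: str) -> str:
    # Lowercase and collapse whitespace so formatting differences are ignored.
    return " ".join(value.lower().split())


def similarity(a: dict, b: dict, fields: list) -> float:
    # Average per-field string similarity, each score in [0, 1].
    scores = [
        SequenceMatcher(None, normalize(a[f]), normalize(b[f])).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def find_candidate_matches(records: list, fields: list, threshold: float = 0.85) -> list:
    # Compare every pair of records (O(n^2), fine only for a toy example)
    # and keep pairs whose average similarity meets the hypothetical threshold.
    matches = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = similarity(a, b, fields)
        if score >= threshold:
            matches.append((i, j, round(score, 3)))
    return matches


# Hypothetical records from two sources; no field is a shared unique ID.
customers = [
    {"name": "Jane Doe", "email": "jane.doe@example.com", "city": "Seattle"},
    {"name": "jane  doe", "email": "janedoe@example.com", "city": "Seattle "},
    {"name": "John Smith", "email": "jsmith@example.org", "city": "Austin"},
]

if __name__ == "__main__":
    print(find_candidate_matches(customers, ["name", "email", "city"]))
```

Records 0 and 1 are paired even though their emails and formatting differ, while record 2 is left alone. A fixed threshold like this is brittle across datasets, which is the gap a trained transform such as FindMatches is meant to close.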
Author: Ritesh Yadav
Question 7/58
A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3-based data lake. The company uses Amazon Athena to query the data in the data lake.
The company needs to identify matching records even when the records do not have a common unique identifier.
Which solution will meet this requirement?
A. Use Amazon Macie pattern matching as part of the ETL job.
B. Train and use the AWS Glue PySpark Filter class in the ETL job.
C. Partition tables and use the ETL job to partition the data on a unique identifier.
D. Train and use the AWS Lake Formation FindMatches transform in the ETL job.