
Answer-first summary for fast verification
Answer: Use the FindMatches feature of AWS Glue to remove duplicate records.
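The FindMatches transform in AWS Glue is a built-in machine learning feature designed to identify duplicate records even when they do not match exactly (for example, records that differ by common misspellings, casing, or formatting). Because it runs as a managed transform inside AWS Glue, it deduplicates the data with the LEAST operational overhead. The other options do not fit: Athena queries without window functions only catch exact duplicates, Neptune ML targets graph data and adds infrastructure to manage, and DynamoDB global tables replicate data across Regions rather than deduplicate it.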
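To make "duplicates that do not match exactly" concrete, here is a minimal sketch of fuzzy deduplication using only the Python standard library. It is not FindMatches and does not reflect how AWS Glue implements it: FindMatches learns from labeled examples, while this sketch substitutes a plain string-similarity ratio from difflib. The record layout, the `name` key, the helper names, and the 0.85 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so casing/spacing differences vanish."""
    return " ".join(value.lower().split())


def is_fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Return True when two strings are similar enough to be treated as duplicates.

    The threshold is an illustrative assumption, not a value used by AWS Glue.
    """
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def deduplicate(records, key="name", threshold=0.85):
    """Keep the first record of each fuzzy-duplicate group (simple O(n^2) scan)."""
    kept = []
    for record in records:
        if not any(is_fuzzy_match(record[key], k[key], threshold) for k in kept):
            kept.append(record)
    return kept


# Hypothetical sample data: records 2 and 3 are near-duplicates of record 1.
records = [
    {"id": 1, "name": "Acme Investments"},
    {"id": 2, "name": "ACME  investments"},  # casing and spacing differ
    {"id": 3, "name": "Acme Investmets"},    # misspelling
    {"id": 4, "name": "Globex Capital"},
]

unique = deduplicate(records)
print([r["id"] for r in unique])  # prints [1, 4]
```

An exact-match approach such as SELECT DISTINCT would keep all four records here. Catching records 2 and 3 is the kind of matching the question asks for, and FindMatches provides it as a managed feature without hand-tuned thresholds like the one above.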
Author: Ritesh Yadav
Question 15
An investment company needs to manage and extract insights from a volume of semi-structured data that grows continuously. A data engineer needs to deduplicate the semi-structured data, remove records that are duplicates, and remove common misspellings of duplicates. Which solution will meet these requirements with the LEAST operational overhead?
A
Use the FindMatches feature of AWS Glue to remove duplicate records.
B
Use non-Window functions in Amazon Athena to remove duplicate records.
C
Use Amazon Neptune ML and an Apache Gremlin script to remove duplicate records.
D
Use the global tables feature of Amazon DynamoDB to prevent duplicate data.