AWS Certified Cloud Practitioner

Get started today

Ultimate access to all questions.

Quick Answer

Answer-first summary for fast verification

Answer: Use the FindMatches feature of AWS Glue to remove duplicate records.

The FindMatches transform in AWS Glue is a built-in machine learning feature specifically designed to help identify duplicate records even when they do not match exactly (e.g., handling common misspellings, differing casing, or slightly varied formats). This completely fulfills the need to deduplicate data with the LEAST operational overhead.

Author: Ritesh Yadav

Quick Answer

Answer-first summary for fast verification

Answer: Use the FindMatches feature of AWS Glue to remove duplicate records.

Author: Ritesh Yadav

Comments (0)

No comments yet.

Question 15
An investment company needs to manage and extract insights from a volume of semi-structured data that grows continuously. A data engineer needs to deduplicate the semi-structured data, remove records that are duplicates, and remove common misspellings of duplicates. Which solution will meet these requirements with the LEAST operational overhead?

Real Exam

Community

RRitesh

Last updated: April 29, 2026 at 05:44

Use the FindMatches feature of AWS Glue to remove duplicate records.

Use non-Window functions in Amazon Athena to remove duplicate records.

Use Amazon Neptune ML and an Apache Gremlin script to remove duplicate records.

Use the global tables feature of Amazon DynamoDB to prevent duplicate data.