
Explanation:
The correct choice is Detect PII. AWS Glue (via DataBrew) provides a Detect PII transform that scans columns for known sensitive data types and enables actions such as masking, hashing, or redaction so that sensitive values are protected before the data is written to the warehouse. Filter transform is only for row-level filtering based on predicates and cannot locate or protect personally identifiable information. FindMatches transform is designed for entity resolution and deduplication, not for detecting or handling PII. Convert file format to Parquet changes how data is stored and optimized for analytics but offers no capability to classify or obfuscate sensitive fields. When the requirement is to automatically discover and mask sensitive fields in AWS Glue workflows, look for Detect PII. Do not confuse it with transforms for deduplication (FindMatches), row filtering (Filter), or file format conversion.
Ultimate access to all questions.
An insurtech firm processes about 20 million customer records per day that include personally identifiable data such as full names, national ID numbers, and payment card details. They need the ETL pipeline to automatically discover and protect sensitive fields before loading curated data into their analytics warehouse. Which AWS Glue transformation should they use?
A
Filter transform
B
Detect PII
C
FindMatches transform
D
Convert file format to Parquet
No comments yet.