
Explanation:
AWS Glue Studio includes a native Detect PII transform in its visual job editor. The transform uses machine learning to identify personally identifiable information (PII) at the column or cell level, and it provides built-in actions to obfuscate the detected values, such as redacting or hashing them. Option A requires writing and maintaining a custom AWS Lambda function that calls an AWS SDK, and Option D adds an unnecessary hop through Amazon DynamoDB plus custom Lambda code. Option C uses the same transform to detect the PII, but AWS Glue Data Quality is designed for data validation rules, not for obfuscating data. Option B therefore meets the requirement with the least operational effort: the transform both identifies and obfuscates the PII, and AWS Step Functions orchestrates the ingestion into the S3 data lake.
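To make the two obfuscation actions concrete, here is a minimal plain-Python sketch of cell-level redaction and hashing. It is not the Glue API and not how the Detect PII transform is implemented: the regex patterns, the mask string, and the names `PII_PATTERNS`, `redact`, `hash_pii`, and `obfuscate_rows` are all assumptions made up for illustration.

```python
import hashlib
import re

# Hypothetical detection patterns, for illustration only.
# The Glue transform ships its own managed entity types.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "USA_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text, mask="*******"):
    """Replace every detected PII match with a fixed mask string."""
    for pattern in PII_PATTERNS.values():
        text = pattern.sub(mask, text)
    return text


def hash_pii(text):
    """Replace every detected PII match with its SHA-256 hex digest."""
    def _digest(match):
        return hashlib.sha256(match.group(0).encode("utf-8")).hexdigest()

    for pattern in PII_PATTERNS.values():
        text = pattern.sub(_digest, text)
    return text


def obfuscate_rows(rows, action=redact):
    """Apply the chosen action to every string cell; leave other cells as-is."""
    return [
        {col: action(val) if isinstance(val, str) else val
         for col, val in row.items()}
        for row in rows
    ]


if __name__ == "__main__":
    sample = [{"id": 1, "note": "Contact jane@example.com, SSN 123-45-6789"}]
    print(obfuscate_rows(sample))            # redaction
    print(obfuscate_rows(sample, hash_pii))  # hashing
```

Redaction discards the value entirely, while hashing keeps a stable token so that equal values still match after obfuscation. Writing and maintaining logic like this by hand is the operational effort that Options A and D carry and that the built-in transform avoids.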
Question 38
A data engineer must use AWS services to ingest a dataset into an Amazon S3 data lake. The data engineer profiles the dataset and discovers that the dataset contains personally identifiable information (PII). The data engineer must implement a solution to profile the dataset and obfuscate the PII. Which solution will meet this requirement with the LEAST operational effort?
A
Use an Amazon Kinesis Data Firehose delivery stream to process the dataset. Create an AWS Lambda transform function to identify the PII. Use an AWS SDK to obfuscate the PII. Set the S3 data lake as the target for the delivery stream.
B
Use the Detect PII transform in AWS Glue Studio to identify the PII. Obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
C
Use the Detect PII transform in AWS Glue Studio to identify the PII. Create a rule in AWS Glue Data Quality to obfuscate the PII. Use an AWS Step Functions state machine to orchestrate a data pipeline to ingest the data into the S3 data lake.
D
Ingest the dataset into Amazon DynamoDB. Create an AWS Lambda function to identify and obfuscate the PII in the DynamoDB table and to transform the data. Use the same Lambda function to ingest the data into the S3 data lake.