
Answer-first summary for fast verification
Answer: Use Azure Data Factory with a Copy Data activity to ingest the CSV file, and then use a Data Flow activity to remove duplicates and standardize email addresses.
Option A is the most suitable approach for this scenario. Azure Data Factory provides a comprehensive solution for ingesting, cleansing, and transforming data from many sources, including CSV files in Azure Blob Storage. A Copy Data activity ingests the CSV file, and a mapping Data Flow then handles the cleansing: duplicates can be removed with an Aggregate transformation grouped on the customer_id column, and email addresses can be standardized with the trim() expression function, which removes leading and trailing spaces. This approach delivers efficient, flexible data cleansing and transformation without custom scripts or external services.
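To make the two cleansing steps concrete, here is a minimal local sketch of the same logic in pandas: trim whitespace from email_address, then keep one row per customer_id. The sample rows are hypothetical; in the actual solution these operations would be configured as Data Flow transformations, not run as a script.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample resembling the customer CSV (note the padded email on row 1)
csv_data = StringIO(
    "customer_id,first_name,last_name,email_address\n"
    "1,Ann,Lee, ann.lee@example.com \n"
    "1,Ann,Lee,ann.lee@example.com\n"
    "2,Bob,Ray,bob.ray@example.com\n"
)

df = pd.read_csv(csv_data)

# Standardize emails: strip leading/trailing spaces (the 'Trim' step)
df["email_address"] = df["email_address"].str.strip()

# Deduplicate on customer_id, keeping the first occurrence
# (mirrors an Aggregate transformation grouped on customer_id)
df = df.drop_duplicates(subset="customer_id", keep="first").reset_index(drop=True)

print(df)
```

The order matters: trimming before deduplicating ensures that rows differing only in email whitespace are treated as true duplicates.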
Author: LeetQuiz Editorial Team
You are tasked with designing a solution to cleanse and transform data from a CSV file stored in Azure Blob Storage. The CSV file contains customer information, including customer_id, first_name, last_name, and email_address. You need to remove any duplicate records and standardize the email addresses by removing any leading or trailing spaces. Which of the following Azure services and techniques would you use to achieve this?
A. Use Azure Data Factory with a Copy Data activity to ingest the CSV file, and then use a Data Flow activity to remove duplicates and standardize email addresses.
B. Use Azure Stream Analytics to ingest the CSV file as a stream, and then apply a deduplication and email standardization logic.
C. Use Azure Data Lake Storage Gen2 to store the CSV file, and then use Azure Databricks to run a custom script for deduplication and email standardization.
D. Use Azure Logic Apps to trigger a workflow when the CSV file is uploaded to Azure Blob Storage, and then use a custom connector to perform deduplication and email standardization.