
Answer-first summary for fast verification
Answer: Use Data Fusion to transform the data before loading it into BigQuery.
The correct answer is A. Data Fusion provides a visual interface with built-in transformations (through its Wrangler plugin) for exactly these data quality problems: coercing columns that mix STRING and INT64 values to a single type and standardizing inconsistently formatted phone numbers and addresses before the data reaches BigQuery. It handles large datasets efficiently and integrates natively with both Cloud Storage and BigQuery. Option B converts the files to a self-describing format such as Avro, which fixes how the schema is represented but not the underlying quality issues in the values themselves. Options C and D defer the cleansing to SQL after loading, which means landing the dirty data first: with mixed-type columns, a load into a table with the desired schema is likely to fail, forcing an all-STRING staging schema, and option D's in-place transformations are harder to audit and rerun. None of these alternatives address the data quality issues as directly as transforming before load with Data Fusion.
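To make the two quality issues concrete, here is a minimal Python sketch of the kind of cleansing a Data Fusion pipeline would apply before load. The column names (`customer_id`, `phone`) and the target phone format are hypothetical, chosen only to illustrate mixed-type coercion and value normalization:

```python
import csv
import io
import re

def clean_row(row):
    """Normalize one CSV record: coerce a mixed STRING/INT64 column to
    integers and reformat phone numbers to a single convention.
    Column names here are illustrative, not from the question."""
    # Mixed STRING/INT64 column: strip whitespace and cast, keeping
    # None for values that cannot be parsed as integers.
    raw_id = str(row["customer_id"]).strip()
    row["customer_id"] = int(raw_id) if raw_id.isdigit() else None

    # Inconsistent phone formats: keep digits only, then re-format
    # 10-digit numbers as (NNN) NNN-NNNN; leave others untouched.
    digits = re.sub(r"\D", "", row["phone"])
    if len(digits) == 10:
        row["phone"] = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return row

# A small in-memory CSV standing in for a file in Cloud Storage.
sample = io.StringIO(
    "customer_id,phone\n"
    "42,303-555-0100\n"
    " 7 ,(303) 555 0199\n"
    "abc,5550100\n"
)
rows = [clean_row(r) for r in csv.DictReader(sample)]
```

In Data Fusion these steps correspond to Wrangler directives applied in the pipeline; the point is that the cleansing happens before BigQuery ever sees the rows.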
Author: LeetQuiz Editorial Team
As a data engineer, you are tasked with creating a data pipeline to load CSV files from Google Cloud Storage to BigQuery. However, these CSV files are known to have multiple data quality issues. Specifically, the files contain columns with mismatched data types, such as a mixture of STRING and INT64 values, and inconsistencies in the formatting of certain values, including phone numbers and addresses. Your objective is to ensure the data maintains high quality by performing the necessary cleansing and transformations during the loading process. What steps should you take to achieve this?
A
Use Data Fusion to transform the data before loading it into BigQuery.
B
Use Data Fusion to convert the CSV files to a self-describing data format, such as AVRO, before loading the data to BigQuery.
C
Load the CSV files into a staging table with the desired schema, perform the transformations with SQL, and then write the results to the final destination table.
D
Create a table with the desired schema, load the CSV files into the table, and perform the transformations in place using SQL.
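For contrast, option C is workable when every fix is expressible in SQL. A hedged sketch of its staging-to-final query follows, held in a Python string; the project, dataset, table, and column names are all hypothetical:

```python
# Hypothetical ELT query for option C: the CSV is first loaded into a
# staging table with all-STRING columns, then SQL cleans and casts the
# values on the way to the final table. Names are illustrative only.
STAGING_TO_FINAL_SQL = """
INSERT INTO `project.dataset.customers`
SELECT
  SAFE_CAST(TRIM(customer_id) AS INT64) AS customer_id,  -- NULL on bad values
  REGEXP_REPLACE(phone, r'[^0-9]', '') AS phone          -- keep digits only
FROM `project.dataset.customers_staging`
"""
```

This pattern works, but it stores the raw, dirty data in BigQuery first and limits the cleansing to what SQL can express, which is why the question favors transforming upstream with Data Fusion.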