
Answer-first summary for fast verification
Answer: Use Data Fusion to transform the data before loading it into BigQuery.
Using Data Fusion to transform the data before loading it into BigQuery (Option D) is the correct approach because:

1. Data Fusion is a fully managed, cloud-native data integration service designed for ETL tasks.
2. It provides a visual interface for designing ETL pipelines, which simplifies handling data quality issues and transformations.
3. Transformation logic to resolve issues like mismatched data types and inconsistent formatting can be defined without custom code.
4. Preprocessing and cleansing the data with Data Fusion before loading ensures it arrives in BigQuery in a consistent, analysis-ready format.

The other options are less suitable:

- Option A (converting to AVRO) changes the file format but does not by itself resolve the underlying data quality issues.
- Options B and C (staging tables with SQL, or in-place SQL transformations) can work, but SQL-based cleanup is harder to maintain and scale for complex data quality issues than a managed ETL pipeline.
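To make the data quality issues concrete, here is a minimal Python sketch of the kind of cleansing logic a Data Fusion pipeline (e.g., its Wrangler transformations) would encode: normalizing inconsistently formatted phone numbers and coercing a mixed-type column to a single type. The function names and sample rows are illustrative, not part of any Data Fusion API.

```python
import re

def normalize_phone(raw):
    """Normalize a US phone number to a +1-prefixed digit string.

    Returns None for values that cannot be parsed, so they can be
    flagged for manual review instead of silently loaded.
    """
    digits = re.sub(r"\D", "", str(raw))
    if len(digits) == 10:
        return "+1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    return None

def coerce_int(raw):
    """Coerce a mixed-type column value to int where possible, else None."""
    try:
        return int(float(str(raw).strip()))
    except (ValueError, TypeError):
        return None

# Sample rows with the issues described in the question:
# mixed types in "id" and inconsistent phone formatting.
rows = [
    {"id": "001", "phone": "(415) 555-0199"},
    {"id": 2, "phone": "1-415-555-0199"},
    {"id": "three", "phone": "555-0199"},
]
cleaned = [
    {"id": coerce_int(r["id"]), "phone": normalize_phone(r["phone"])}
    for r in rows
]
```

The key design point applies whether the cleansing runs in Data Fusion or in SQL: enforce one canonical representation per column before load, and route unparseable values to review rather than letting them pollute the destination table.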
Author: LeetQuiz Editorial Team
When loading CSV files from Cloud Storage to BigQuery, you encounter data quality issues such as mixed data types in the same column and inconsistent formatting of values like phone numbers or addresses. What is the best approach to ensure data quality and perform the necessary cleansing and transformation in your data pipeline?
A
Convert the CSV files to a self-describing data format, such as AVRO, before loading the data to BigQuery using Data Fusion.
B
Load the CSV files into a staging table with the desired schema, perform the transformations with SQL, and then write the results to the final destination table.
C
Create a table with the desired schema, load the CSV files into the table, and perform the transformations in place using SQL.
D
Use Data Fusion to transform the data before loading it into BigQuery.