
Answer-first summary for fast verification
Answer: Use Data Fusion to transform the data before loading it into BigQuery.
The correct answer is A. Data Fusion provides a visual interface with built-in transformations (through its Wrangler plugin) for exactly these data quality problems: coercing columns that mix STRING and INT64 values to a single type and standardizing inconsistently formatted phone numbers and addresses before the data reaches BigQuery. It handles large datasets efficiently and integrates natively with both Cloud Storage and BigQuery. Option B converts the files to a self-describing format such as Avro, which fixes how the schema is represented but not the underlying quality issues in the values themselves. Options C and D defer the cleansing to SQL after loading, which means landing the dirty data first: with mixed-type columns, a load into a table with the desired schema is likely to fail, forcing an all-STRING staging schema, and option D's in-place transformations are harder to audit and rerun. None of these alternatives address the data quality issues as directly as transforming before load with Data Fusion.
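To make the two quality issues concrete, here is a minimal Python sketch of the kind of cleansing a Data Fusion pipeline would apply before load. The column names (`customer_id`, `phone`) and the target phone format are hypothetical, chosen only to illustrate mixed-type coercion and value normalization:

```python
import csv
import io
import re

def clean_row(row):
    """Normalize one CSV record: coerce a mixed STRING/INT64 column to
    integers and reformat phone numbers to a single convention.
    Column names here are illustrative, not from the question."""
    # Mixed STRING/INT64 column: strip whitespace and cast, keeping
    # None for values that cannot be parsed as integers.
    raw_id = str(row["customer_id"]).strip()
    row["customer_id"] = int(raw_id) if raw_id.isdigit() else None

    # Inconsistent phone formats: keep digits only, then re-format
    # 10-digit numbers as (NNN) NNN-NNNN; leave others untouched.
    digits = re.sub(r"\D", "", row["phone"])
    if len(digits) == 10:
        row["phone"] = f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return row

# A small in-memory CSV standing in for a file in Cloud Storage.
sample = io.StringIO(
    "customer_id,phone\n"
    "42,303-555-0100\n"
    " 7 ,(303) 555 0199\n"
    "abc,5550100\n"
)
rows = [clean_row(r) for r in csv.DictReader(sample)]
```

In Data Fusion these steps correspond to Wrangler directives applied in the pipeline; the point is that the cleansing happens before BigQuery ever sees the rows.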
Author: LeetQuiz Editorial Team
As a data engineer, you are tasked with creating a data pipeline to load CSV files from Google Cloud Storage to BigQuery. However, these CSV files are known to have multiple data quality issues. Specifically, the files contain columns with mismatched data types, such as a mixture of STRING and INT64 values, and inconsistencies in the formatting of certain values, including phone numbers and addresses. Your objective is to ensure the data maintains high quality by performing the necessary cleansing and transformations during the loading process. What steps should you take to achieve this?
A
Use Data Fusion to transform the data before loading it into BigQuery.
B
Use Data Fusion to convert the CSV files to a self-describing data format, such as AVRO, before loading the data to BigQuery.
C
Load the CSV files into a staging table with the desired schema, perform the transformations with SQL, and then write the results to the final destination table.
D
Create a table with the desired schema, load the CSV files into the table, and perform the transformations in place using SQL.
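For contrast, option C is workable when every fix is expressible in SQL. A hedged sketch of its staging-to-final query follows, held in a Python string; the project, dataset, table, and column names are all hypothetical:

```python
# Hypothetical ELT query for option C: the CSV is first loaded into a
# staging table with all-STRING columns, then SQL cleans and casts the
# values on the way to the final table. Names are illustrative only.
STAGING_TO_FINAL_SQL = """
INSERT INTO `project.dataset.customers`
SELECT
  SAFE_CAST(TRIM(customer_id) AS INT64) AS customer_id,  -- NULL on bad values
  REGEXP_REPLACE(phone, r'[^0-9]', '') AS phone          -- keep digits only
FROM `project.dataset.customers_staging`
"""
```

This pattern works, but it stores the raw, dirty data in BigQuery first and limits the cleansing to what SQL can express, which is why the question favors transforming upstream with Data Fusion.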