
NO.3 Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are generated during this step, and the team wants to analyze them. Because of the dynamic nature of the campaign, the data is growing exponentially every hour. The data scientists have written the following code to read the data for a new key feature in the logs:

BigQueryIO.Read
    .named("ReadLogData")
    .from("clouddataflow-readonly:samples.log_data")

You want to improve the performance of this data read. What should you do?
Explanation:
The correct answer is D because:
- Representing the rows as TableRow objects directly in the PCollection gives better read performance than the other approaches.
- With one TableRow per element, Dataflow can process the rows efficiently in parallel.
- BigQueryIO.Read.named("ReadLogData").from("clouddataflow-readonly:samples.log_data") already reads the entire table; using TableRow objects optimizes how that data is then processed in the pipeline.
The key insight is that in Dataflow pipelines, representing BigQuery data as TableRow objects in PCollections provides the most efficient processing model for large-scale data operations.
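
As a rough illustration, here is a minimal sketch of what such a pipeline can look like with the classic Cloud Dataflow Java SDK 1.x API that the question's snippet uses (in current Apache Beam the equivalent entry point is BigQueryIO.readTableRows()). The class name ReadLogData and the field name "event_type" are assumptions for the example, not part of the original question:

import com.google.api.services.bigquery.model.TableRow;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.io.BigQueryIO;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class ReadLogData {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Each element of the PCollection is a single TableRow, so downstream
    // transforms can fan out across workers row by row.
    PCollection<TableRow> rows = p.apply(
        BigQueryIO.Read
            .named("ReadLogData")
            .from("clouddataflow-readonly:samples.log_data"));

    // Example downstream step: pull one field out of each row.
    // The field name "event_type" is illustrative only.
    PCollection<String> features = rows.apply(
        ParDo.named("ExtractFeature").of(new DoFn<TableRow, String>() {
          @Override
          public void processElement(ProcessContext c) {
            TableRow row = c.element();
            c.output((String) row.get("event_type"));
          }
        }));

    p.run();
  }
}

Because each element of the PCollection is one TableRow, the ExtractFeature step can be executed on many workers in parallel without any extra coordination, which is the behavior the explanation above relies on.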