
Answer-first summary for fast verification
Answer: Perform preprocessing in BigQuery by using SQL. Use the BigQueryClient in TensorFlow to read the data directly from BigQuery.
The correct answer is Option C: Perform preprocessing in BigQuery by using SQL. Use the BigQueryClient in TensorFlow to read the data directly from BigQuery. This is the most cost-effective and efficient approach: the preprocessing runs as SQL inside BigQuery, where the data already lives, so no data has to be exported or moved. The BigQueryClient (from the tensorflow-io package) then streams the preprocessed rows directly into the TensorFlow training job, and keeping the data in BigQuery also fits the stated goal of running batch inference there later (for example, by importing the trained model with BigQuery ML). Options A and D add a separate Spark or Beam pipeline plus intermediate CSV files in Cloud Storage, which increases cost and complexity without benefit here, and Option B risks out-of-memory failures because a pandas DataFrame must fit the entire large table in memory on a single machine.
Author: LeetQuiz Editorial Team
You are tasked with developing a fraud detection model using Keras and TensorFlow. The records of customer transactions, which serve as your dataset, are stored in a large table in BigQuery. Before training your model, you need to preprocess these records in a way that is both cost-effective and efficient. Additionally, the ultimate goal is to use the trained model for batch inference directly in BigQuery. Considering these requirements, how should you implement the preprocessing workflow?
A
Implement a preprocessing pipeline by using Apache Spark, and run the pipeline on Dataproc. Save the preprocessed data as CSV files in a Cloud Storage bucket.
B
Load the data into a pandas DataFrame. Implement the preprocessing steps using pandas transformations, and train the model directly on the DataFrame.
C
Perform preprocessing in BigQuery by using SQL. Use the BigQueryClient in TensorFlow to read the data directly from BigQuery.
D
Implement a preprocessing pipeline by using Apache Beam, and run the pipeline on Dataflow. Save the preprocessed data as CSV files in a Cloud Storage bucket.
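As an illustration of the approach in Option C, the preprocessing step can be expressed as a single SQL statement that materializes a training-ready table inside BigQuery. This is only a sketch: the project, dataset, table, and column names below are hypothetical, and the actual feature transformations would depend on the schema of the transaction records.

```sql
-- Hypothetical preprocessing query for Option C: all transformation happens
-- inside BigQuery, and the result is materialized as a new table that the
-- tensorflow-io BigQueryClient can read directly during training.
CREATE OR REPLACE TABLE `my_project.fraud.transactions_preprocessed` AS
SELECT
  transaction_id,
  -- Log-scale the transaction amount to reduce skew
  SAFE.LOG(amount + 1) AS log_amount,
  -- Simple time-based features derived from the transaction timestamp
  EXTRACT(HOUR FROM transaction_ts) AS hour_of_day,
  EXTRACT(DAYOFWEEK FROM transaction_ts) AS day_of_week,
  -- Label column for the fraud detection model
  CAST(is_fraud AS INT64) AS label
FROM `my_project.fraud.transactions_raw`
WHERE amount IS NOT NULL;
```

Because the preprocessed table stays in BigQuery, the same table can later serve as the input for batch inference there, avoiding any round trip through Cloud Storage or a local DataFrame.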