
You are tasked with developing a fraud detection model using Keras and TensorFlow. Your dataset, a large table of customer transaction records, is stored in BigQuery. Before training the model, you need to preprocess these records in a way that is both cost-effective and efficient. The trained model will ultimately be used for batch inference directly in BigQuery. Given these requirements, how should you implement the preprocessing workflow?
A
Implement a preprocessing pipeline by using Apache Spark, and run the pipeline on Dataproc. Save the preprocessed data as CSV files in a Cloud Storage bucket.
B
Load the data into a pandas DataFrame. Implement the preprocessing steps using pandas transformations, and train the model directly on the DataFrame.
C
Perform preprocessing in BigQuery by using SQL. Use the BigQueryClient in TensorFlow to read the data directly from BigQuery.
D
Implement a preprocessing pipeline by using Apache Beam, and run the pipeline on Dataflow. Save the preprocessed data as CSV files in a Cloud Storage bucket.
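To make option C concrete, here is a minimal sketch of preprocessing inside BigQuery with SQL before handing the data to TensorFlow. All project, dataset, table, and column names (`my-project`, `fraud`, `transactions`, `amount`, `is_fraud`, `transaction_ts`) are hypothetical placeholders, and the reader shown in the comments assumes the `tensorflow-io` package's BigQuery integration:

```python
# Sketch of option C: do feature engineering in BigQuery SQL so the raw
# data never leaves the warehouse, then read the result into tf.data.
# All identifiers below are illustrative placeholders, not real resources.

def build_preprocessing_sql(project: str, dataset: str) -> str:
    """Build a SQL statement that materializes a preprocessed feature table.

    Standardizing `amount` and extracting time features happen inside
    BigQuery, so preprocessing scales with the warehouse, not the client.
    """
    return f"""
    CREATE OR REPLACE TABLE `{project}.{dataset}.transactions_features` AS
    SELECT
      SAFE_CAST(is_fraud AS INT64) AS label,
      (amount - AVG(amount) OVER ()) / STDDEV(amount) OVER () AS amount_z,
      EXTRACT(HOUR FROM transaction_ts) AS hour_of_day
    FROM `{project}.{dataset}.transactions`
    """

sql = build_preprocessing_sql("my-project", "fraud")

# Reading the preprocessed table straight into a tf.data pipeline would
# use the BigQuery reader from tensorflow-io, roughly (requires GCP
# credentials, so shown as a comment only):
#
#   from tensorflow_io.bigquery import BigQueryClient
#   client = BigQueryClient()
#   session = client.read_session(
#       "projects/my-project", "my-project",
#       "transactions_features", "fraud",
#       selected_fields=["label", "amount_z", "hour_of_day"],
#       output_types=[tf.int64, tf.float64, tf.int64])
#   dataset = session.parallel_read_rows()
```

Keeping both preprocessing and (later) batch inference in BigQuery avoids exporting the large table at all, which is the cost argument behind this option.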