
As a data scientist at a leading bank, you are tasked with developing a machine learning model to predict loan default risk. The dataset, stored in BigQuery, consists of hundreds of millions of records that have already been cleaned and prepared for analysis. Your objective is to use TensorFlow and Vertex AI for model development and comparison while keeping the solution scalable and minimizing data ingestion bottlenecks. Which approach should you adopt? Please choose the best option.
A
Export the dataset to CSV files stored in Cloud Storage, then use tf.data.TextLineDataset() to read the data. This method is straightforward but may introduce scalability challenges.
B
Implement TensorFlow I/O's BigQuery Reader to read the data directly from BigQuery. This approach is designed for scalability and efficiency with large datasets (a reading sketch appears after the options).
C
Convert the dataset into TFRecord files stored in Cloud Storage and use tf.data.TFRecordDataset() for reading. While TFRecords are efficient for TensorFlow, the conversion step adds overhead (a reading sketch also appears after the options).
D
Use the BigQuery client library to load the dataset into a pandas dataframe, then convert it into a TensorFlow dataset using tf.data.Dataset.from_tensor_slices(). This method is simple but may not be feasible for the dataset size due to memory constraints.
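For reference, a minimal sketch of what option B could look like with the tensorflow_io BigQuery connector is shown below. The project, dataset, table, and column names (loan_amount, credit_score, default_flag), as well as the stream and batch sizes, are hypothetical placeholders; the key point is that rows stream from BigQuery storage into a tf.data pipeline with no intermediate export.

```python
import tensorflow as tf
from tensorflow.python.framework import dtypes
from tensorflow_io.bigquery import BigQueryClient

# Hypothetical identifiers and column names -- replace with your own.
PROJECT_ID = "my-bank-project"
DATASET_ID = "lending"
TABLE_ID = "loan_records"

client = BigQueryClient()

# Open a read session against the BigQuery Storage API; rows stream
# straight into the pipeline with no export to Cloud Storage.
read_session = client.read_session(
    "projects/" + PROJECT_ID,
    PROJECT_ID,
    TABLE_ID,
    DATASET_ID,
    selected_fields=["loan_amount", "credit_score", "default_flag"],
    output_types=[dtypes.float64, dtypes.int64, dtypes.int64],
    requested_streams=4,  # parallel streams raise read throughput
)

def split_label(row):
    # Each element is a dict of column name -> tensor; separate the label.
    features = dict(row)
    label = features.pop("default_flag")
    return features, label

dataset = (
    read_session.parallel_read_rows()
    .map(split_label, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(1024)
    .prefetch(tf.data.AUTOTUNE)
)
```

Because the reader pulls directly from BigQuery, there is no export job to schedule and no duplicate copy of the data to keep in sync, which is what makes this option attractive at hundreds of millions of rows.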
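For comparison, option C's read path might look like the following once the table has already been converted to TFRecord shards; the bucket path, file pattern, and feature spec are likewise hypothetical. Reading TFRecords is efficient, but this sketch assumes a separate conversion/export step has already been run and must be re-run whenever the source table changes.

```python
import tensorflow as tf

# Hypothetical path to TFRecord shards produced by a prior conversion job.
FILE_PATTERN = "gs://my-bank-bucket/loan_records/part-*.tfrecord"

# The feature spec must mirror the schema used when the records were written.
FEATURE_SPEC = {
    "loan_amount": tf.io.FixedLenFeature([], tf.float32),
    "credit_score": tf.io.FixedLenFeature([], tf.int64),
    "default_flag": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    label = parsed.pop("default_flag")
    return parsed, label

files = tf.data.Dataset.list_files(FILE_PATTERN)
dataset = (
    files.interleave(tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(1024)
    .prefetch(tf.data.AUTOTUNE)
)
```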