
You are working on a machine learning project in which you need to train a TensorFlow model on a structured dataset of 100 billion records stored across multiple CSV files. The project is constrained by tight deadlines and a limited budget, so it requires an efficient solution that minimizes cost while maximizing performance. The solution must also scale to accommodate future data growth. Given these constraints, which of the following approaches would BEST improve input/output performance when training your TensorFlow model? (Choose the two most effective options.)
A. Load the data into Cloud Bigtable, and read the data from Bigtable, leveraging its high throughput and low latency for NoSQL operations.
B. Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS), utilizing its distributed nature for large datasets.
C. Load the data into BigQuery, and read the data from BigQuery, taking advantage of its serverless architecture and SQL capabilities for structured data.
D. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage, benefiting from its high durability and integration with TensorFlow.
E. Implement a combination of converting the CSV files into TFRecords stored in Cloud Storage and using BigQuery for complex queries on the dataset.
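For context, here is a minimal sketch of the TFRecord-sharding approach described in options D and E. It is illustrative only: the bucket gs://example-bucket, the file names, the shard count, and the all-numeric CSV schema (NUM_FEATURES feature columns followed by a label column) are all assumptions, not part of the question.

```python
import tensorflow as tf

# --- Hypothetical names; substitute your own paths and schema. ---
CSV_FILES = ["data/records-00.csv", "data/records-01.csv"]  # source CSVs
TFRECORD_DIR = "gs://example-bucket/tfrecords"  # assumed GCS destination
NUM_FEATURES = 10    # assumed all-numeric schema: NUM_FEATURES columns + label
NUM_SHARDS = 256     # tune so each shard lands around 100-200 MB


def row_to_example(features, label):
    """Serialize one parsed CSV row as a tf.train.Example proto."""
    return tf.train.Example(features=tf.train.Features(feature={
        "features": tf.train.Feature(
            float_list=tf.train.FloatList(value=features)),
        "label": tf.train.Feature(
            float_list=tf.train.FloatList(value=[label])),
    })).SerializeToString()


def convert_csv_to_tfrecord_shards():
    """Distribute CSV rows round-robin across NUM_SHARDS TFRecord files."""
    writers = [
        tf.io.TFRecordWriter(f"{TFRECORD_DIR}/part-{i:05d}.tfrecord")
        for i in range(NUM_SHARDS)
    ]
    rows = tf.data.experimental.CsvDataset(
        CSV_FILES, record_defaults=[tf.float32] * (NUM_FEATURES + 1))
    for i, row in enumerate(rows):
        *features, label = [t.numpy() for t in row]
        writers[i % NUM_SHARDS].write(row_to_example(features, label))
    for w in writers:
        w.close()


def make_training_dataset(batch_size=1024):
    """tf.data pipeline: parallel reads from Cloud Storage, shuffle, batch."""
    feature_spec = {
        "features": tf.io.FixedLenFeature([NUM_FEATURES], tf.float32),
        "label": tf.io.FixedLenFeature([1], tf.float32),
    }

    def parse(serialized_batch):
        parsed = tf.io.parse_example(serialized_batch, feature_spec)
        return parsed["features"], parsed["label"]

    files = tf.data.Dataset.list_files(f"{TFRECORD_DIR}/part-*.tfrecord")
    return (
        files.interleave(                     # read many shards in parallel
            tf.data.TFRecordDataset,
            cycle_length=16,
            num_parallel_calls=tf.data.AUTOTUNE)
        .shuffle(10_000)
        .batch(batch_size)
        .map(parse, num_parallel_calls=tf.data.AUTOTUNE)  # vectorized parsing
        .prefetch(tf.data.AUTOTUNE)           # overlap I/O with training
    )
```

The point of sharding is that many moderately sized TFRecord files (roughly 100-200 MB each is a common rule of thumb) let tf.data interleave parallel reads from Cloud Storage, while prefetch overlaps input with training so the accelerator is not starved waiting on a single large file.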