
Answer-first summary for fast verification
Answer: (D) Convert the CSV files into shards of TFRecords and store the data in Cloud Storage, benefiting from its high durability and integration with TensorFlow; and (E) implement a combination of TFRecords stored in Cloud Storage and BigQuery for complex queries on the dataset.
The most effective way to improve I/O performance during training is to convert the CSV files into shards of TFRecords and store them in Cloud Storage (Option D). TFRecord is a binary format optimized for TensorFlow: sharded files can be read in parallel by the tf.data input pipeline, serialization and parsing are efficient, and Cloud Storage scales with future data growth while integrating directly with TensorFlow on Google Cloud. Combining TFRecords in Cloud Storage with BigQuery (Option E) is also effective when the project requires complex analytical queries on the dataset, though it may introduce additional cost. The remaining options are weaker: Cloud Bigtable (Option A) is designed for low-latency NoSQL key-value workloads, not bulk sequential reads of structured training data; HDFS (Option B) can hold large datasets but requires cluster management and lacks Cloud Storage's managed integration with TensorFlow on Google Cloud; and reading training data directly from BigQuery (Option C) lacks the specialized optimization for TensorFlow input pipelines.
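As a minimal sketch of the recommended approach, the snippet below converts rows of structured data into sharded TFRecord files. The function name, the round-robin sharding scheme, and the two-field schema (`feature`, `label`) are illustrative assumptions, not part of the question; in practice the output directory would be a Cloud Storage path (`gs://...`) and the conversion would run in parallel (e.g. with Dataflow) for 100 billion records.

```python
import os
import tensorflow as tf

def csv_rows_to_tfrecord_shards(rows, out_dir, num_shards=4):
    """Write (feature, label) rows into `num_shards` TFRecord files.

    Illustrative sketch: rows are assigned round-robin so shards stay
    roughly equal in size, which lets tf.data read them in parallel.
    """
    writers = [
        tf.io.TFRecordWriter(
            os.path.join(out_dir, f"data-{i:05d}-of-{num_shards:05d}.tfrecord"))
        for i in range(num_shards)
    ]
    for i, (feature_val, label) in enumerate(rows):
        # Serialize each row as a tf.train.Example protocol buffer.
        example = tf.train.Example(features=tf.train.Features(feature={
            "feature": tf.train.Feature(
                float_list=tf.train.FloatList(value=[feature_val])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label])),
        }))
        writers[i % num_shards].write(example.SerializeToString())
    for w in writers:
        w.close()
```

For a real deployment, the shard count is typically chosen so each file lands in the ~100 MB–1 GB range, balancing parallel read throughput against per-file overhead.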
Author: LeetQuiz Editorial Team
You are working on a machine learning project where you need to train a TensorFlow model on a structured dataset containing 100 billion records, stored across multiple CSV files. The project is constrained by tight deadlines and a limited budget, requiring an efficient solution that minimizes costs while maximizing performance. Additionally, the solution must be scalable to accommodate future data growth. Given these constraints, which of the following approaches would BEST improve the input/output performance for training your TensorFlow model? Choose the two most effective options.
A
Load the data into Cloud Bigtable, and read the data from Bigtable, leveraging its high throughput and low latency for NoSQL operations.
B
Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS), utilizing its distributed nature for large datasets.
C
Load the data into BigQuery, and read the data from BigQuery, taking advantage of its serverless architecture and SQL capabilities for structured data.
D
Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage, benefiting from its high durability and integration with TensorFlow.
E
Implement a combination of both converting the CSV files into TFRecords stored in Cloud Storage and utilizing BigQuery for complex queries on the dataset.