
Answer-first summary for fast verification
Answer: Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.
The correct answer is C. Converting the CSV files into sharded TFRecord files and storing them in Cloud Storage optimizes input/output performance. TFRecord is TensorFlow's native binary record format, so reading it avoids the per-row text parsing that CSV requires. Sharding lets the input pipeline read many files in parallel, which further improves throughput. Cloud Storage scales to datasets of this size and provides high aggregate read throughput, making it suitable for large-scale training. BigQuery (A) and Bigtable (B) place a query or serving layer between the training job and the data and are not optimized for TensorFlow's sequential bulk reads. HDFS (D) can hold the same TFRecord shards, but it requires running and maintaining a Hadoop cluster, whereas Cloud Storage is a managed service.
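The conversion and the parallel read described above can be sketched as follows. This is a minimal illustration, not a production pipeline: the helper names (`shard_filename`, `csv_to_tfrecord_shards`, `make_dataset`) and the `gs://example-bucket/...` paths are hypothetical, every CSV column is stored as a bytes feature for simplicity, and writing to `gs://` paths assumes a TensorFlow build with Cloud Storage filesystem support.

```python
import csv
import os


def shard_filename(prefix: str, index: int, num_shards: int) -> str:
    """Build a conventional shard name, e.g. train-00003-of-00010.tfrecord."""
    return f"{prefix}-{index:05d}-of-{num_shards:05d}.tfrecord"


def csv_to_tfrecord_shards(csv_path, out_dir, prefix, num_shards):
    """Distribute the rows of one CSV file round-robin across TFRecord shards."""
    # Imported here so the naming helper above works without TensorFlow installed.
    import tensorflow as tf

    paths = [
        os.path.join(out_dir, shard_filename(prefix, i, num_shards))
        for i in range(num_shards)
    ]
    writers = [tf.io.TFRecordWriter(p) for p in paths]
    try:
        with open(csv_path, newline="") as f:
            for row_idx, row in enumerate(csv.DictReader(f)):
                # Assumption: every column is kept as a UTF-8 bytes feature.
                feature = {
                    name: tf.train.Feature(
                        bytes_list=tf.train.BytesList(value=[value.encode("utf-8")])
                    )
                    for name, value in row.items()
                }
                example = tf.train.Example(
                    features=tf.train.Features(feature=feature)
                )
                writers[row_idx % num_shards].write(example.SerializeToString())
    finally:
        for writer in writers:
            writer.close()
    return paths


def make_dataset(file_pattern, batch_size=1024):
    """Read shards in parallel; yields batches of serialized Examples."""
    import tensorflow as tf

    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    # interleave opens several shard files at once instead of one after another.
    records = files.interleave(
        tf.data.TFRecordDataset,
        num_parallel_calls=tf.data.AUTOTUNE,
    )
    return records.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

A call such as `csv_to_tfrecord_shards("part-0.csv", "gs://example-bucket/train", "train", 10)` followed by `make_dataset("gs://example-bucket/train/train-*.tfrecord")` would exercise both halves. A single-process loop like this is only illustrative; at 100 billion records the conversion itself would need to be distributed across many workers.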
Author: LeetQuiz Editorial Team
You are working on a large-scale machine learning project, using TensorFlow to train a model on a structured dataset containing 100 billion records. These records are currently stored in multiple CSV files. To optimize input/output performance and ensure efficient data processing and training, what should you do?
A. Load the data into BigQuery, and read the data from BigQuery.
B. Load the data into Cloud Bigtable, and read the data from Bigtable.
C. Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.
D. Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS).