
Answer-first summary for fast verification
Answer: Encode the images into TFRecords format, store them in Cloud Storage, and use the tf.data API to stream the data during training, ensuring efficient data retrieval and minimal memory usage.
Encoding images as TFRecords and storing them in Cloud Storage is Google's recommended practice for datasets too large to fit in memory. TFRecord files use a compact binary format that is efficient to store and to read sequentially. The tf.data API then streams records during training, keeping memory usage low and allowing the pipeline to scale. Because Cloud Storage is accessible from any machine, the same dataset can feed training jobs running on different workers or accelerators without copying the data locally.
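A minimal sketch of this pattern, using a tiny synthetic image and a local temp file in place of real high-resolution images and a `gs://` Cloud Storage path (both of those, and the feature names `image`/`label`, are illustrative assumptions):

```python
import os
import tempfile

import tensorflow as tf

# A hypothetical 8x8 image stands in for a real high-resolution JPEG.
image_bytes = tf.io.encode_jpeg(tf.zeros([8, 8, 3], dtype=tf.uint8)).numpy()

# In production this path would be a gs:// URI in Cloud Storage.
record_path = os.path.join(tempfile.mkdtemp(), "images.tfrecord")

# 1) Encode each image as a tf.train.Example and write it to a TFRecord file.
with tf.io.TFRecordWriter(record_path) as writer:
    for label in range(4):
        example = tf.train.Example(features=tf.train.Features(feature={
            "image": tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[image_bytes])),
            "label": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[label])),
        }))
        writer.write(example.SerializeToString())

# 2) Stream the records back with the tf.data API; only the prefetched
#    batches are held in memory, never the whole dataset.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse(record):
    parsed = tf.io.parse_single_example(record, feature_spec)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    return image, parsed["label"]

dataset = (tf.data.TFRecordDataset(record_path)
           .map(parse, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(2)
           .prefetch(tf.data.AUTOTUNE))

for images, labels in dataset:
    print(images.shape, labels.numpy())
```

The same reading code works unchanged against Cloud Storage by passing a `gs://bucket/path` string to `tf.data.TFRecordDataset`; `prefetch` overlaps data loading with model execution, which is what makes the streaming approach fast as well as memory-safe.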
Author: LeetQuiz Editorial Team
You are designing an input pipeline for a machine learning model that processes high-resolution images from multiple sources at high speed. The dataset is too large to fit into memory, and you need to ensure the pipeline is scalable, cost-effective, and complies with Google's recommended practices for handling large datasets. Which of the following approaches should you implement? (Choose one correct option)
A
Transform the images into tf.Tensor objects and use tf.data.Dataset.from_tensors() to create the dataset, ensuring all data is loaded into memory at once for quick access.
B
Apply the tf.data.Dataset.prefetch transformation to the dataset after loading all images into memory, aiming to improve data loading efficiency by overlapping the preprocessing and model execution.
C
Convert the images into tf.Tensor objects and utilize Dataset.from_tensor_slices() to create the dataset, assuming the slices will manage memory usage effectively.
D
Encode the images into TFRecords format, store them in Cloud Storage, and use the tf.data API to stream the data during training, ensuring efficient data retrieval and minimal memory usage.