
Google Professional Machine Learning Engineer
You are tasked with developing an input pipeline for a machine learning model that must process a large volume of images from various sources with low latency. Upon initial analysis, you realize that the input data exceeds the memory capacity of your current setup. Considering Google's best practices for handling large datasets that do not fit in memory, which of the following approaches should you implement to optimize dataset creation and training? Choose the best option.
Explanation:
The recommended approach is to convert the images into TFRecord files, store them in Cloud Storage, and read them with the tf.data API during training. This follows Google's best practices for datasets that do not fit in memory: batching many small files into a large container format such as TFRecord maximizes read and write throughput, which matters especially for unstructured data like images, and the tf.data API streams records from Cloud Storage so the full dataset never has to be held in memory. This keeps the pipeline scalable and efficient as data volume grows. For more details, refer to Google's best practices documentation: ML on GCP Best Practices.
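As a minimal sketch of this pattern, assuming TensorFlow with its built-in Cloud Storage filesystem support (the bucket paths, image size, and helper names below are hypothetical placeholders, not values from the referenced documentation):

import tensorflow as tf

# Hypothetical paths -- replace with your own bucket and shard layout.
GCS_PATTERN = "gs://your-bucket/tfrecords/images-*.tfrecord"
OUTPUT_PATH = "gs://your-bucket/tfrecords/images-00000.tfrecord"

def image_to_example(image_path, label):
    """Wrap one encoded image and its integer label in a tf.train.Example."""
    image_bytes = tf.io.read_file(image_path).numpy()
    features = tf.train.Features(feature={
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    })
    return tf.train.Example(features=features)

def write_tfrecords(image_paths, labels, output_path):
    """Serialize images into one TFRecord file (sharding omitted for brevity).
    TFRecordWriter can write directly to a gs:// path."""
    with tf.io.TFRecordWriter(output_path) as writer:
        for path, label in zip(image_paths, labels):
            writer.write(image_to_example(path, label).SerializeToString())

def parse_example(serialized):
    """Decode one serialized Example back into an (image, label) pair."""
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0  # assumed model input size
    return image, parsed["label"]

def make_dataset(pattern, batch_size=32):
    """Stream TFRecords from Cloud Storage; the dataset is never fully in memory."""
    files = tf.data.Dataset.list_files(pattern, shuffle=True)
    dataset = files.interleave(
        tf.data.TFRecordDataset,
        num_parallel_calls=tf.data.AUTOTUNE)  # read shards in parallel
    return (dataset
            .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
            .shuffle(buffer_size=1_000)
            .batch(batch_size)
            .prefetch(tf.data.AUTOTUNE))  # overlap I/O with training steps

# Usage during training, e.g.: model.fit(make_dataset(GCS_PATTERN), epochs=5)

The parallel interleave and prefetch(tf.data.AUTOTUNE) calls let the runtime overlap Cloud Storage reads with training computation, which is what keeps input latency low even though the data never fits in memory.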