
Explanation:
The correct approach is to use tf.data.Dataset, which is specifically designed for handling large datasets that exceed memory capacity. It enables efficient data pipeline creation and supports streaming data processing, making it ideal for large-scale datasets. Options A and C are incorrect as they rely on in-memory data structures, which do not solve the memory issue. Option D, while feasible, introduces unnecessary complexity compared to the straightforward solution provided by tf.data.Dataset. For more details, refer to TensorFlow's documentation on tf.data.Dataset and introductory guides on data input pipelines.
Ultimate access to all questions.
No comments yet.
You are a Machine Learning Engineer at a leading digital publishing platform that hosts a wide range of content from both established authors and emerging voices. Your platform offers users a limited number of free articles each month, after which a subscription is required. To enhance user experience and engagement, you are tasked with developing a machine learning model that analyzes reading habits and preferences to predict future trends and favored topics. During the training phase of your Deep Neural Network (DNN) using TensorFlow, you encounter a significant challenge: the dataset is too large to fit into RAM. Considering the need for a solution that is both efficient and straightforward, which of the following approaches should you adopt? (Choose one correct option)
A
Use a pandas.DataFrame for handling the dataset, leveraging its in-memory data structures.
B
Implement tf.data.Dataset to efficiently manage and stream the dataset, avoiding the need to load it entirely into RAM.
C
Store the dataset in a NumPy array, utilizing its efficient storage for numerical data.
D
Employ a queue with tf.train.shuffle_batch for processing the dataset in batches, introducing additional complexity for shuffling and batching.