You are a Machine Learning Engineer at a biotech startup, where your team is developing deep learning models inspired by biological organisms. The work requires custom TensorFlow operations written in C++ and training with exceptionally large batch sizes. Each example in your dataset is approximately 1 MB, the average network size (including weights and embeddings) is 20 GB, and a typical batch size is 1024 examples. Which hardware setup would best support your models while remaining cost-efficient and scalable for future model expansions? Choose the best option.
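The sizing figures in the question can be combined into a rough lower bound on the memory a single training step needs. The sketch below is only back-of-the-envelope arithmetic: the function names are hypothetical, decimal units (1 GB = 1000 MB) are an assumption, and gradients, optimizer state, and activations are deliberately ignored, so the real requirement would be higher.

```python
# Rough memory estimate from the figures stated in the question.
# Assumptions (not from the question): decimal units, and no accounting for
# gradients, optimizer state, or activations.

EXAMPLE_SIZE_MB = 1    # approximate size of one example
BATCH_SIZE = 1024      # typical number of examples per batch
MODEL_SIZE_GB = 20     # weights and embeddings


def batch_size_gb(example_mb: float, batch: int) -> float:
    """Memory needed to hold one input batch, in GB."""
    return example_mb * batch / 1000


def min_footprint_gb(model_gb: float, example_mb: float, batch: int) -> float:
    """Lower bound: model parameters plus one input batch."""
    return model_gb + batch_size_gb(example_mb, batch)


batch_gb = batch_size_gb(EXAMPLE_SIZE_MB, BATCH_SIZE)
total_gb = min_footprint_gb(MODEL_SIZE_GB, EXAMPLE_SIZE_MB, BATCH_SIZE)

print(f"One batch: {batch_gb:.3f} GB")
print(f"Model plus one batch: {total_gb:.3f} GB")
```

Under these assumptions one batch is about 1 GB and the model plus one batch is about 21 GB, which is the figure to compare against the memory offered by each candidate hardware setup.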