
You work for a biotech startup that develops cutting-edge deep learning models inspired by the properties of biological organisms. Your team frequently runs early-stage experiments with novel ML model architectures and often writes custom TensorFlow operations in C++ (a sketch of such an op follows the answer options). The models are trained on extensive datasets with substantial batch sizes: a typical batch contains 1024 examples, each approximately 1 MB in size, and an entire network, including all weights and embeddings, averages around 20 GB. Given these requirements, which hardware configuration would be the most suitable for your models?
A. A cluster with 2 n1-highcpu-64 machines, each with 8 NVIDIA Tesla V100 GPUs (128 GB GPU memory in total), and an n1-highcpu-64 machine with 64 vCPUs and 58 GB RAM
B. A cluster with 2 a2-megagpu-16g machines, each with 16 NVIDIA Tesla A100 GPUs (640 GB GPU memory in total), 96 vCPUs, and 1.4 TB RAM
C. A cluster with an n1-highcpu-64 machine with a v2-8 TPU and 64 GB RAM
D. A cluster with 4 n1-highcpu-96 machines, each with 96 vCPUs and 86 GB RAM
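Two constraints from the scenario are worth making concrete. First, the memory arithmetic: a batch of 1024 examples at roughly 1 MB each is about 1 GB of input per training step, and the 20 GB network must also fit in accelerator memory alongside activations and optimizer state, so total GPU memory capacity matters. Second, the custom TensorFlow operations written in C++: custom C++ kernels are registered for CPU (and, with a separate CUDA build, GPU) devices and are not supported on Cloud TPUs. Below is a minimal, hedged sketch of what such a kernel looks like; the op name ScaleByTwo and its element-wise logic are hypothetical, and the registration API shown is TensorFlow's standard custom-op interface, whose details (e.g. Status::OK() vs. OkStatus()) vary slightly across TensorFlow versions.

```cpp
// Minimal sketch of a custom TensorFlow op in C++, the kind of kernel the
// scenario describes. The op name and its logic are hypothetical.
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"

using namespace tensorflow;

// Declare the op's interface: one float tensor in, one float tensor out.
REGISTER_OP("ScaleByTwo")
    .Input("x: float")
    .Output("y: float")
    .SetShapeFn([](shape_inference::InferenceContext* c) {
      c->set_output(0, c->input(0));  // output shape matches input shape
      return Status::OK();
    });

// CPU kernel. A GPU variant requires a separate CUDA kernel; no TPU
// variant is possible, since custom C++ ops do not run on Cloud TPUs.
class ScaleByTwoOp : public OpKernel {
 public:
  explicit ScaleByTwoOp(OpKernelConstruction* ctx) : OpKernel(ctx) {}

  void Compute(OpKernelContext* ctx) override {
    const Tensor& input = ctx->input(0);
    Tensor* output = nullptr;
    OP_REQUIRES_OK(ctx, ctx->allocate_output(0, input.shape(), &output));

    auto in = input.flat<float>();
    auto out = output->flat<float>();
    for (int64_t i = 0; i < in.size(); ++i) {
      out(i) = 2.0f * in(i);  // hypothetical element-wise computation
    }
  }
};

REGISTER_KERNEL_BUILDER(Name("ScaleByTwo").Device(DEVICE_CPU), ScaleByTwoOp);
```

This dependency on hand-written C++ kernels is what makes the accelerator choice consequential: a kernel like the one above can target DEVICE_CPU or DEVICE_GPU, but a TPU-backed configuration could not execute it.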