
You are optimizing the training performance of a TensorFlow model that processes a large dataset stored as a single 5 TB CSV file on Cloud Storage. The current input data pipeline is inefficient, leading to prolonged training times. Considering the need for scalability, cost-effectiveness, and data processing best practices, what initial steps should you take to improve the pipeline's performance? (Choose two.)
A
Set reshuffle_each_iteration=True in the tf.data.Dataset.shuffle method to improve data shuffling efficiency (see the first sketch after the options).
B
Convert the input CSV file to the TFRecord format to leverage TensorFlow's optimized binary format for faster read times and better compression (see the second sketch after the options).
C
Use a randomly selected 10 gigabyte subset of the data for training your model to reduce the dataset size and training time.
D
Divide the dataset into multiple CSV files and apply a parallel interleave transformation to increase data loading parallelism (see the third sketch after the options).
E
Both convert the CSV file to TFRecord format and divide the dataset into multiple files for parallel processing, to maximize efficiency.
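Option A refers to the reshuffle_each_iteration argument of tf.data.Dataset.shuffle. A minimal sketch follows (the toy dataset and buffer size are illustrative assumptions). Note that reshuffle_each_iteration already defaults to True, and shuffle only reorders elements that have already been read into the buffer, so it does not address I/O throughput:

```python
import tensorflow as tf

# Stand-in for the real input pipeline; the range and buffer size are assumed.
dataset = tf.data.Dataset.range(100_000)

# reshuffle_each_iteration defaults to True: element order is re-randomized
# on every pass over the data. Setting it explicitly changes shuffling
# behavior between epochs, not how fast records are read from storage.
dataset = dataset.shuffle(buffer_size=10_000, reshuffle_each_iteration=True)
```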
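Option B describes a one-off conversion from CSV to TFRecord. A minimal sketch, assuming a hypothetical output path on Cloud Storage and a CSV layout of float feature columns followed by a float label (both assumptions, not part of the question):

```python
import csv
import tensorflow as tf

def row_to_example(features, label):
    # Pack one parsed CSV row into a tf.train.Example protocol buffer.
    return tf.train.Example(features=tf.train.Features(feature={
        "features": tf.train.Feature(
            float_list=tf.train.FloatList(value=features)),
        "label": tf.train.Feature(
            float_list=tf.train.FloatList(value=[label])),
    }))

# Output path, GZIP compression, and input shard name are assumptions.
options = tf.io.TFRecordOptions(compression_type="GZIP")
with tf.io.TFRecordWriter("gs://my-bucket/train-00000.tfrecord",
                          options=options) as writer:
    with open("train_shard.csv") as f:
        for row in csv.reader(f):
            *features, label = map(float, row)
            writer.write(row_to_example(features, label).SerializeToString())
```

In practice this conversion would run as a distributed job (for example Dataflow) rather than a single loop over a 5 TB file, writing many sharded TFRecord files.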
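Options D and E rely on sharding the data plus a parallel interleave. A minimal sketch, assuming the 5 TB file has already been split into shards matching a hypothetical gs:// glob and that every column holds a float (the shard pattern, column count, cycle length, and batch size are all assumptions):

```python
import tensorflow as tf

# Assumed shard naming; list_files shuffles shard order by default.
files = tf.data.Dataset.list_files("gs://my-bucket/train-*.csv")

# Read several shards concurrently instead of scanning one giant file.
dataset = files.interleave(
    lambda path: tf.data.experimental.CsvDataset(
        path, record_defaults=[tf.float32] * 10),  # assumed 10 float columns
    cycle_length=16,                      # number of shards read at once
    num_parallel_calls=tf.data.AUTOTUNE)

# Overlap input processing with training on the accelerator.
dataset = dataset.batch(1024).prefetch(tf.data.AUTOTUNE)
```

The same interleave pattern combines naturally with option B: once the shards are TFRecord files, the lambda maps each path to a tf.data.TFRecordDataset instead of a CsvDataset.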