
Answer-first summary for fast verification
Answer: Add parallel interleave to the pipeline
The question describes a naive synchronous input pipeline in which training data is split across multiple files, and the goal is to reduce input pipeline execution time. Option D (Add parallel interleave to the pipeline) is correct: parallel interleave reads and preprocesses multiple files concurrently, overlapping I/O with computation so the accelerator is not left idle waiting for data. This directly addresses the bottleneck of a synchronous implementation, which opens and reads one file at a time, and it applies exactly when data is distributed across multiple files. The community discussion supports this unanimously (100% consensus on D), citing the TensorFlow documentation's recommendation to parallelize data extraction when data is sharded across files.

The other options are less suitable: A (Increase the CPU load) adds work without removing the serial bottleneck; B (Add caching to the pipeline) speeds up repeated passes over the data but not the first pass through many files; and C (Increase the network bandwidth) is irrelevant, because the problem is pipeline serialization, not network throughput.
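As a concrete sketch of the recommended fix, the snippet below uses TensorFlow's `tf.data.Dataset.interleave` with `num_parallel_calls=tf.data.AUTOTUNE` so several shards are read concurrently. The small in-memory `make_shard` datasets are an assumption standing in for real per-file readers such as `tf.data.TextLineDataset` or `tf.data.TFRecordDataset`:

```python
import tensorflow as tf

# Hypothetical stand-in for one training-data file: in practice this
# would be e.g. tf.data.TFRecordDataset(path) for a real shard.
def make_shard(i):
    return tf.data.Dataset.range(i * 10, i * 10 + 3)

shard_ids = tf.data.Dataset.range(4)

# Parallel interleave: up to cycle_length shards are opened and read
# concurrently, and AUTOTUNE lets tf.data pick the parallelism level,
# overlapping I/O with preprocessing instead of reading files serially.
dataset = shard_ids.interleave(
    make_shard,
    cycle_length=4,
    num_parallel_calls=tf.data.AUTOTUNE,
)

result = list(dataset.as_numpy_iterator())
print(result)
```

With the default `deterministic=True`, the elements come out in a round-robin order across the four shards, so parallelism does not change the observed element set.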
Author: LeetQuiz Editorial Team
You have a naive synchronous implementation for model training and observe low GPU utilization. Your training data is split across multiple files. To reduce input pipeline execution time, what should you do?
A
Increase the CPU load
B
Add caching to the pipeline
C
Increase the network bandwidth
D
Add parallel interleave to the pipeline