Google Professional Machine Learning Engineer



As an ML engineer at a mobile gaming company, you're tasked with deploying a TensorFlow model into a mobile app to enhance user experience by reducing game loading times. The current model's inference latency is 200 ms, twice the production limit of 100 ms. Management has approved a decrease in accuracy of up to 2% to reach the target latency. Given these constraints, and without the option to retrain the model, which optimization technique should you prioritize to meet the latency goal? Choose the best option.




Explanation:

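Dynamic range quantization is the most suitable option because it is a post-training technique: it is applied when the already-trained model is converted for on-device use, so no retraining is required. It converts the model's weights from 32-bit floating point to 8-bit integers, which shrinks the model to roughly a quarter of its original size and speeds up inference, usually with only a small impact on accuracy. Meeting the target means cutting latency from 200 ms to 100 ms, a 2x speedup, and TensorFlow's post-training quantization documentation cites speedups in the 2x to 3x range for this technique on CPU. The accuracy of the converted model should still be measured on a validation set to confirm the drop stays within the approved 2%.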
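
To make the weight conversion described above concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Python. It is an illustration of the arithmetic only, not TensorFlow's implementation: the helper names `quantize_weights` and `dequantize`, the per-tensor scale, and the random weights are all assumptions made for this example.

```python
import random
import struct


def quantize_weights(weights):
    """Map floats to integers in [-127, 127] using one scale for the whole tensor.

    Hypothetical helper for illustration; real converters may use per-channel scales.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [scale * q for q in quantized]


# Stand-in for a trained layer's weights (assumed values, not from a real model).
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]

q_weights, scale = quantize_weights(weights)
restored = dequantize(q_weights, scale)

# Storage cost: 4 bytes per float32 weight versus 1 byte per int8 weight.
float_bytes = len(struct.pack(f"{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"{len(q_weights)}b", *q_weights))

# Rounding to the nearest step bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"float32: {float_bytes} bytes, int8: {int8_bytes} bytes")
print(f"max round-trip error: {max_error:.6f}, half step: {scale / 2:.6f}")
```

The bounded rounding error is why the accuracy impact tends to be small, and the 4x smaller weights are one source of the latency gain. In TensorFlow Lite itself, dynamic range quantization is requested by setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]` on a `tf.lite.TFLiteConverter` without supplying a representative dataset.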
