
Answer-first summary for fast verification
Answer: Dynamic range quantization: Reduces the precision of the model's weights from floating point to integer, significantly decreasing latency with minimal accuracy loss, without the need for retraining.
Dynamic range quantization is the most suitable option because it directly addresses the need to reduce inference latency without retraining the model. It converts the model's weights from 32-bit floating point to 8-bit integers at conversion time, while activations are quantized dynamically at runtime, so no calibration dataset or retraining is needed. This typically shrinks the model about 4x and significantly reduces latency with only a minimal impact on accuracy, well within the approved 2% budget. The technique is documented in TensorFlow's model optimization resources for mobile deployment.
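A minimal sketch of how this looks with the TensorFlow Lite converter. The tiny Keras model here is a stand-in assumption; in practice you would load the game's existing trained model instead:

```python
import tensorflow as tf

# Stand-in model; the real deployed model would be loaded here instead
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Enabling Optimize.DEFAULT with no representative dataset applies
# dynamic range quantization: weights stored as int8, activations
# quantized on the fly at inference time -- no retraining required
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()  # serialized .tflite flatbuffer (bytes)
```

The resulting bytes are written to a `.tflite` file and bundled with the mobile app; the same conversion also works from a SavedModel via `TFLiteConverter.from_saved_model`.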
Author: LeetQuiz Editorial Team
As an ML engineer at a mobile gaming company, you're tasked with deploying a TensorFlow model into a mobile app to enhance user experience by reducing game loading times. The current model's inference latency is 200ms, which is above the production standard of 100ms. The management has approved a slight decrease in accuracy, up to 2%, to achieve the target latency. Given these constraints and without the option to retrain the model, which optimization technique should you prioritize to meet the latency reduction goal? Choose the best option.
A
Model distillation: Training a smaller, faster model to mimic the behavior of the original model, which requires retraining and thus is not applicable here.
B
Dynamic range quantization: Reduces the precision of the model's weights from floating point to integer, significantly decreasing latency with minimal accuracy loss, without the need for retraining.
C
Weight pruning: Eliminates unnecessary weights in the model to reduce size and latency, but may require fine-tuning to maintain accuracy, which involves retraining.
D
Dimensionality reduction: Reduces the number of input features, which does not apply here because changing the input representation would require retraining the model and does not directly optimize inference latency.