As an ML engineer at a mobile gaming company, you are tasked with deploying a TensorFlow model in a mobile app to improve the user experience by reducing game loading times. The model's current inference latency is 200 ms, twice the production target of 100 ms. Management has approved a reduction in accuracy of up to 2% to reach the target latency. Given these constraints, and with no option to retrain the model, which optimization technique should you prioritize to meet the latency goal? Choose the best option.