
Ultimate access to all questions.
As the lead data scientist for an international corporation, you are tasked with developing a predictive maintenance solution to preemptively identify server failures across global data centers. The servers generate unlabeled CPU and memory usage data. Given the constraints of limited labeled data and the need for a scalable, cost-effective solution, which of the following approaches would you prioritize as your first step? Choose the two best options.
A
Develop a simple heuristic, such as one based on z-score, to categorize the historical performance data of the machines. Test this heuristic in a live environment to validate its effectiveness.
B
Employ a team of expert analysts to manually review and categorize the historical performance data of the machines. Use this labeled dataset to build a supervised learning model.
C
Implement a time-series model to predict the machines' performance metrics. Set up alerts for significant deviations between actual and predicted values, assuming the data's temporal nature is the most critical feature.
D
Create a simple heuristic, like one based on z-score, to classify the machines' historical performance data. Then, develop a model to detect anomalies based on this classified data, iteratively improving the heuristic and model based on performance insights.
E
Both A and D are viable first steps, with A providing a quick initial assessment and D offering a more structured approach towards building a predictive model.