
Answer-first summary for fast verification
Answer: Decision trees
## Detailed Explanation

### Requirements Analysis

The question specifies two key requirements:

1. **Multi-class classification**: The algorithm must classify genes into 20 distinct categories.
2. **Model interpretability**: The company needs to document how the inner mechanism of the model affects the output, which requires transparency in decision-making.

### Evaluation of Options

**A: Decision trees** - **OPTIMAL CHOICE**

- **Multi-class capability**: Decision trees handle multi-class classification natively through recursive partitioning of the feature space.
- **Interpretability**: Decision trees are highly transparent. The model consists of internal nodes representing feature splits and leaves representing class predictions. Each decision path from root to leaf shows explicitly which features were tested and which thresholds were applied, so the way input characteristics lead to the final classification can be documented in full.
- **Feature importance**: Decision trees can quantify feature importance, showing which gene characteristics have the greatest impact on classification decisions.
- **Visual representation**: The tree structure can be visualized and explained to stakeholders without requiring deep technical knowledge.

**B: Linear regression** - **NOT SUITABLE**

- Linear regression is designed to predict continuous values, not class labels. It can be adapted for classification, for example by regressing on one-hot encoded class indicators, but it remains fundamentally a regression algorithm.
- Its coefficients are interpretable, but that does not address the multi-class classification requirement.

**C: Logistic regression** - **LESS SUITABLE**

- Logistic regression is a binary classifier in its basic form. It can be extended to multi-class problems through one-vs-rest or multinomial (softmax) logistic regression.
- Interpretability is moderate through coefficient analysis, but the decision boundaries are less transparent than those of a decision tree. The "inner mechanism" is a weighted sum passed through a sigmoid (or, in the multinomial case, softmax) transformation, which is less intuitive for documenting how specific gene characteristics directly affect classification.

**D: Neural networks** - **NOT SUITABLE**

- Neural networks handle multi-class classification effectively, typically through a softmax output layer.
- However, neural networks are typically "black box" models with limited interpretability. Techniques such as SHAP values or LIME can provide post-hoc explanations, but the inner mechanisms (hidden-layer activations, weight matrices) are not inherently transparent or easy to document for stakeholders.
- The question specifically requires documenting "how the inner mechanism of the model affects the output," which neural networks cannot provide natively.

### Why Decision Trees Are Optimal

Among the four options, decision trees are the only one that satisfies both requirements at once:

1. **Direct multi-class handling**: The algorithm partitions the feature space into regions corresponding to different classes without requiring special adaptations.
2. **Inherent transparency**: Every decision is explicitly represented in the tree structure, making it possible to trace exactly how each gene characteristic contributes to the final classification.
3. **Documentation capability**: The tree can be serialized, visualized, and explained in plain language, meeting the company's need to document the model's inner workings.

Other algorithms might achieve comparable classification accuracy, but among the listed options only decision trees combine multi-class capability with the inherent interpretability the question requires.
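
To make the root-to-leaf traceability described above concrete, here is a minimal sketch of a decision tree built by recursive partitioning with Gini impurity, written in plain Python. It is a simplified, CART-style illustration and not the implementation of any particular library. The feature names (`gc_percent`, `length_kb`), the class labels, and the eight data rows are hypothetical, and only four classes are used to keep the example short; the same code handles 20 classes unchanged.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels (0.0 means pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini most,
    or None if no split improves on the current node."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best


def build(rows, labels):
    """Recursively partition the data until no split reduces impurity."""
    split = best_split(rows, labels)
    if split is None:
        # Leaf: predict the majority class of the samples that reached it.
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build(*zip(*left)),
        "right": build(*zip(*right)),
    }


def explain(tree, row, names):
    """Return (predicted_class, list of readable tests along the path)."""
    path, node = [], tree
    while "leaf" not in node:
        f, t = node["feature"], node["threshold"]
        if row[f] <= t:
            path.append(f"{names[f]} <= {t}")
            node = node["left"]
        else:
            path.append(f"{names[f]} > {t}")
            node = node["right"]
    return node["leaf"], path


# Hypothetical toy data: (gc_percent, length_kb) per gene, four classes.
names = ["gc_percent", "length_kb"]
rows = [(30, 1), (35, 2), (30, 9), (35, 8), (70, 1), (65, 2), (70, 9), (65, 8)]
labels = ["class_0", "class_0", "class_1", "class_1",
          "class_2", "class_2", "class_3", "class_3"]

tree = build(rows, labels)
pred, path = explain(tree, (68, 7), names)
print(pred)                 # class_3
print(" AND ".join(path))   # gc_percent > 50.0 AND length_kb > 5.0
```

The printed path is exactly the kind of artifact the interpretability requirement asks for: a plain-language list of the feature tests that produced a given prediction. In practice one would more likely use an established library (for example scikit-learn's `DecisionTreeClassifier`) than hand-rolled code like this.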
Author: LeetQuiz Editorial Team
Which machine learning algorithm can classify human genes into 20 categories based on their characteristics and also provide an interpretable explanation of how the model's internal mechanisms influence its predictions?
- A: Decision trees
- B: Linear regression
- C: Logistic regression
- D: Neural networks