Explanation
K-means clustering is the most suitable machine learning model for this task because:
- Clustering Purpose: K-means is designed specifically to partition unlabeled data points into distinct clusters, assigning each point to the cluster whose centroid (mean) is nearest to it
- Fixed Number of Groups: The requirement to split customers into exactly 20 distinct groups aligns perfectly with K-means, where you specify the number of clusters (k=20)
- Customer Segmentation: This is a classic use case for clustering algorithms in customer analytics
- Scalability: Each k-means iteration scales linearly with the number of data points, so 2,000 customers can be clustered quickly
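To make the grouping step concrete, here is a minimal sketch of Lloyd's algorithm, the usual way k-means is fitted, written in plain Python. The two customer features and the randomly generated data are hypothetical stand-ins, since the task does not say which attributes describe each customer. Each pass alternates an assignment step (every customer joins its nearest centroid) and an update step (every centroid moves to the mean of its members), stopping once assignments no longer change. The assignment step computes n × k distances, here 2,000 × 20 = 40,000 per iteration, which is why this data size is small for k-means.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, k, max_iter=100, seed=0):
    """Minimal Lloyd's-algorithm k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points (plain random init, not k-means++).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical data: 2,000 customers described by two standardised features
# (e.g. annual spend and visit frequency); real data would come from the task.
data_rng = random.Random(42)
customers = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(2000)]

labels, centroids = kmeans(customers, k=20)
print(len(centroids), "segments; customer 0 is in segment", labels[0])
```

In practice one would normally call a library implementation such as scikit-learn's `sklearn.cluster.KMeans(n_clusters=20)`, which defaults to k-means++ initialisation and can run several restarts to reduce sensitivity to the starting centroids.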
Why other options are less suitable:
- Linear Regression: Used for predicting continuous outcomes, not grouping
- Logistic Regression: A supervised method for classification (binary in its basic form) that needs labeled outcomes; it does not discover groups on its own
- Decision Tree: A supervised method for classification or regression; some variants can be adapted for clustering, but that is not its typical use
- Support Vector Machine: Mainly a supervised classification method; support vector clustering exists but is far less common
- Neural Network: Can be used for clustering but is more complex and computationally intensive than K-means
K-means clustering is the standard and most appropriate choice for this customer segmentation task.