
You are preparing a machine learning model for a financial services company and must select the statistical concept best suited to understanding the probability distribution of customer transaction data. The dataset is large and skewed, and the goal is to improve fraud detection. Which of the following concepts is most important for this task? Choose the best option.
A
Principal component analysis, as it reduces the dimensionality of the dataset, making it easier to visualize and understand the distribution.
B
Central limit theorem, because it allows the distribution of sample means to be approximated by a normal distribution, facilitating inference about the population from sample data.
C
Linear regression, for modeling the relationship between transaction amounts and the likelihood of fraud.
D
K-means clustering, to group similar transactions together based on their features.
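
As a rough illustration of the idea described in option B, the sketch below draws repeated samples from a skewed dataset and compares the skewness of the raw values with the skewness of the sample means. The log-normal "transaction amounts", the sample sizes, and the helper names `sample_means` and `skewness` are all assumptions made for this example. They are not part of the question and do not come from any real transaction data.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, rng):
    """Draw n_samples random samples (with replacement) and return each sample's mean."""
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


def skewness(values):
    """Third standardized moment; values near 0 indicate a roughly symmetric distribution."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical, synthetic transaction amounts: log-normal, so heavily right-skewed.
amounts = [rng.lognormvariate(3.0, 1.0) for _ in range(20_000)]

# Means of 1,000 samples of 200 transactions each.
means = sample_means(amounts, sample_size=200, n_samples=1_000, rng=rng)

raw_skew = skewness(amounts)
mean_skew = skewness(means)

print(f"skewness of raw amounts:  {raw_skew:.2f}")
print(f"skewness of sample means: {mean_skew:.2f}")
print(f"population mean:          {statistics.fmean(amounts):.2f}")
print(f"mean of sample means:     {statistics.fmean(means):.2f}")
```

Under these assumptions, the sample means should be far less skewed than the raw amounts and should cluster around the population mean, which is the behavior option B refers to. The sketch says nothing about the individual transactions themselves, which remain skewed.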