
Financial Risk Manager Part 1
Consider the following data set:
| Observation | Y | X |
|---|---|---|
| 1 | 3.67 | 1.85 |
| 2 | 1.88 | 0.65 |
| 3 | 1.35 | 0.63 |
| 4 | 0.34 | 1.24 |
| 5 | 0.89 | 2.45 |
A regression on the entire data set gave the estimated equation: [ \hat{Y} = 1.4110 + 0.1512X ]
A second regression, using only the first four observations, gave the estimated equation: [ \hat{Y} = 0.3169 + 1.3667X ]
What is Cook's distance for the 5th observation?
Explanation:
Cook's Distance Calculation
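Cook's distance measures the influence of a single observation on the regression: it sums the squared changes in all fitted values when that observation is dropped and scales the sum by the residual variance. For observation i:
[ D_i = \frac{\sum_{j=1}^{n} (\hat{Y}_j - \hat{Y}_{j(i)})^2}{p \cdot MSE} ]
For exact OLS estimates this is equivalent to the leverage form:
[ D_i = \frac{e_i^2}{p \cdot MSE} \cdot \frac{h_{ii}}{(1 - h_{ii})^2} ]
Where:
- (\hat{Y}_j) is the predicted value for observation j using all observations
- (\hat{Y}_{j(i)}) is the predicted value for observation j when observation i is excluded
- (e_i = Y_i - \hat{Y}_i) is the residual of observation i in the full regression
- p is the number of parameters (2 for simple linear regression)
- MSE is the mean squared error
- (h_{ii}) is the leverage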
Step 1: Calculate predicted values
Using full dataset (5 observations): [\hat{Y} = 1.4110 + 0.1512X] For observation 5 (X = 2.45): [\hat{Y}_5 = 1.4110 + 0.1512 \times 2.45 = 1.4110 + 0.37044 = 1.78144]
Using first 4 observations: [\hat{Y} = 0.3169 + 1.3667X] For observation 5 (X = 2.45): [\hat{Y}_{5(5)} = 0.3169 + 1.3667 \times 2.45 = 0.3169 + 3.348415 = 3.665315]
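The predicted values for every observation, not only the 5th, can be tabulated with a short Python sketch. The helper name `predict` is illustrative; the coefficients are the ones quoted in the question.

```python
# X values for observations 1-5, taken from the question's table.
x = [1.85, 0.65, 0.63, 1.24, 2.45]

def predict(intercept, slope, xs):
    """Return the fitted values intercept + slope * x for each x."""
    return [intercept + slope * xi for xi in xs]

fitted_full = predict(1.4110, 0.1512, x)  # regression on all five observations
fitted_loo = predict(0.3169, 1.3667, x)   # regression without observation 5

for i, (full, loo) in enumerate(zip(fitted_full, fitted_loo), start=1):
    print(f"obs {i}: full={full:.4f}  without obs 5={loo:.4f}  diff={full - loo:+.4f}")
```

The last row corresponds to the two values computed by hand for observation 5.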
Step 2: Calculate leverage (h_{55})
Leverage for simple linear regression: [h_{ii} = \frac{1}{n} + \frac{(X_i - \bar{X})^2}{\sum(X_j - \bar{X})^2}]
From the full dataset:
- n = 5
- Mean of X: (\bar{X} = \frac{1.85 + 0.65 + 0.63 + 1.24 + 2.45}{5} = \frac{6.82}{5} = 1.364)
- (X_5 - \bar{X} = 2.45 - 1.364 = 1.086)
- (\sum(X_j - \bar{X})^2 = (1.85-1.364)^2 + (0.65-1.364)^2 + (0.63-1.364)^2 + (1.24-1.364)^2 + (2.45-1.364)^2)
[= 0.2362 + 0.5098 + 0.5388 + 0.0154 + 1.1794 \approx 2.4795]
[h_{55} = \frac{1}{5} + \frac{(1.086)^2}{2.4795} = 0.2 + \frac{1.1794}{2.4795} = 0.2 + 0.4757 = 0.6757 \approx 0.676]
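The leverage formula for simple linear regression can be evaluated for all five observations at once. This is a minimal sketch in plain Python; the variable names are illustrative.

```python
# X values for observations 1-5, taken from the question's table.
x = [1.85, 0.65, 0.63, 1.24, 2.45]

n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)  # sum of squared deviations of X

# h_ii = 1/n + (X_i - mean)^2 / Sxx for each observation
leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]

for i, h in enumerate(leverage, start=1):
    print(f"h_{i}{i} = {h:.4f}")
print(f"sum of leverages = {sum(leverage):.4f}")
```

A useful sanity check is that the leverages sum to the number of parameters, which is 2 here. Observation 5 carries roughly a third of that total because its X value lies farthest from the mean.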
Step 3: Calculate MSE
Using full dataset:
- Residuals: (e_i = Y_i - \hat{Y}_i)
- (e_1 = 3.67 - (1.4110 + 0.1512\times1.85) = 3.67 - 1.69072 = 1.97928)
- (e_2 = 1.88 - (1.4110 + 0.1512\times0.65) = 1.88 - 1.50928 = 0.37072)
- (e_3 = 1.35 - (1.4110 + 0.1512\times0.63) = 1.35 - 1.506256 = -0.156256)
- (e_4 = 0.34 - (1.4110 + 0.1512\times1.24) = 0.34 - 1.598488 = -1.258488)
- (e_5 = 0.89 - (1.4110 + 0.1512\times2.45) = 0.89 - 1.78144 = -0.89144)
[SSE = \sum e_i^2 = 3.917 + 0.137 + 0.024 + 1.584 + 0.795 = 6.457] [MSE = \frac{SSE}{n-p} = \frac{6.457}{5-2} = \frac{6.457}{3} = 2.1523]
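The residuals, SSE, and MSE from this step can be reproduced with the sketch below, which assumes the full-sample coefficients quoted in the question and divides SSE by n - p as in the text.

```python
# Data and full-sample coefficients as quoted in the question.
x = [1.85, 0.65, 0.63, 1.24, 2.45]
y = [3.67, 1.88, 1.35, 0.34, 0.89]
intercept, slope = 1.4110, 0.1512

# Residual = observed Y minus fitted Y from the full regression.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

sse = sum(e * e for e in residuals)
p = 2                      # parameters: intercept and slope
mse = sse / (len(x) - p)   # residual variance with n - p degrees of freedom

print([round(e, 5) for e in residuals])
print(f"SSE = {sse:.4f}, MSE = {mse:.4f}")
```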
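Step 4: Calculate Cook's distance
Cook's distance sums the squared change in the fitted value of every observation, not only the deleted one. Applying both regression equations to all five X values:
| Observation | X | (\hat{Y}_j) (all data) | (\hat{Y}_{j(5)}) (without obs. 5) | Squared difference |
|---|---|---|---|---|
| 1 | 1.85 | 1.6907 | 2.8453 | 1.3330 |
| 2 | 0.65 | 1.5093 | 1.2053 | 0.0924 |
| 3 | 0.63 | 1.5063 | 1.1779 | 0.1078 |
| 4 | 1.24 | 1.5985 | 2.0116 | 0.1707 |
| 5 | 2.45 | 1.7814 | 3.6653 | 3.5490 |
[\sum_{j=1}^{5} (\hat{Y}_j - \hat{Y}_{j(5)})^2 = 5.2529]
[D_5 = \frac{5.2529}{p \cdot MSE} = \frac{5.2529}{2 \times 2.1523} = \frac{5.2529}{4.3046} = 1.220]
As a check, the leverage form gives a similar value:
[D_5 = \frac{e_5^2}{p \cdot MSE} \cdot \frac{h_{55}}{(1 - h_{55})^2} = \frac{(-0.89144)^2}{2 \times 2.1523} \cdot \frac{0.676}{(1 - 0.676)^2} = \frac{0.795}{4.3046} \cdot \frac{0.676}{0.105} = 0.1847 \times 6.438 = 1.189]
The two forms agree exactly only for exact OLS estimates. The quoted full-sample intercept, 1.4110, differs slightly from the exact value (\bar{Y} - 0.1512\bar{X} = 1.626 - 0.1512 \times 1.364 \approx 1.4198), which, together with rounding, accounts for the small gap between 1.220 and 1.189.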
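Cook's distance for observation 5 can be checked numerically in two ways: from its definition as a scaled sum of squared fitted-value changes, and from the leverage form. The sketch below assumes the coefficients quoted in the question and MSE = SSE/(n - p); the variable names are illustrative.

```python
# Data and coefficients as quoted in the question.
x = [1.85, 0.65, 0.63, 1.24, 2.45]
y = [3.67, 1.88, 1.35, 0.34, 0.89]
full = (1.4110, 0.1512)   # (intercept, slope) using all observations
loo = (0.3169, 1.3667)    # (intercept, slope) without observation 5

n, p = len(x), 2
fit_full = [full[0] + full[1] * xi for xi in x]
fit_loo = [loo[0] + loo[1] * xi for xi in x]

# Residual variance from the full regression.
residuals = [yi - fi for yi, fi in zip(y, fit_full)]
mse = sum(e * e for e in residuals) / (n - p)

# Definition: squared changes in ALL fitted values, scaled by p * MSE.
sum_sq_change = sum((a - b) ** 2 for a, b in zip(fit_full, fit_loo))
d_definition = sum_sq_change / (p * mse)

# Leverage form: uses only the residual and leverage of observation 5.
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
h55 = 1 / n + (x[4] - x_bar) ** 2 / sxx
d_leverage = residuals[4] ** 2 / (p * mse) * h55 / (1 - h55) ** 2

print(f"definition form: {d_definition:.4f}")
print(f"leverage form:   {d_leverage:.4f}")
```

The two printed values are close but not identical, because the quoted coefficients are rounded and the identity between the two forms holds exactly only for exact OLS estimates. Small differences from hand calculations come from intermediate rounding.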
The definition of Cook's distance can also be written in terms of the change in the coefficient estimates:
[D_i = \frac{(\hat{\beta} - \hat{\beta}_{(i)})'X'X(\hat{\beta} - \hat{\beta}_{(i)})}{p \cdot MSE}]
Given the regression coefficients:
- Full dataset: (\hat{\beta}_0 = 1.4110), (\hat{\beta}_1 = 0.1512)
- Without observation 5: (\hat{\beta}_0 = 0.3169), (\hat{\beta}_1 = 1.3667)
Difference: (\Delta\beta_0 = 1.4110 - 0.3169 = 1.0941), (\Delta\beta_1 = 0.1512 - 1.3667 = -1.2155)
[X'X = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix} = \begin{bmatrix} 5 & 6.82 \\ 6.82 & 11.782 \end{bmatrix}]
[(\hat{\beta} - \hat{\beta}_{(5)})'X'X(\hat{\beta} - \hat{\beta}_{(5)}) = \begin{bmatrix} 1.0941 & -1.2155 \end{bmatrix} \begin{bmatrix} 5 & 6.82 \\ 6.82 & 11.782 \end{bmatrix} \begin{bmatrix} 1.0941 \\ -1.2155 \end{bmatrix}]
First compute: (X'X(\hat{\beta} - \hat{\beta}_{(5)}) = \begin{bmatrix} 5\times1.0941 + 6.82\times(-1.2155) \\ 6.82\times1.0941 + 11.782\times(-1.2155) \end{bmatrix} = \begin{bmatrix} 5.47050 - 8.28971 \\ 7.46176 - 14.32102 \end{bmatrix} = \begin{bmatrix} -2.81921 \\ -6.85926 \end{bmatrix})
Then: (\begin{bmatrix} 1.0941 & -1.2155 \end{bmatrix} \begin{bmatrix} -2.81921 \\ -6.85926 \end{bmatrix} = 1.0941\times(-2.81921) + (-1.2155)\times(-6.85926) = -3.08450 + 8.33743 = 5.2529)
This quadratic form equals (\sum_j (\hat{Y}_j - \hat{Y}_{j(5)})^2), the sum of squared changes in the five fitted values.
[D_5 = \frac{5.2529}{2 \times 2.1523} = \frac{5.2529}{4.3046} = 1.220]
With MSE = SSE/(n - p), Cook's distance for the 5th observation is therefore about 1.22. The listed answer, 1.6268 (option B), is reproduced when the residual variance is instead computed as SSE/(n - 1), using the unrounded SSE of 6.4579:
[s^2 = \frac{6.4579}{4} = 1.6145, \qquad D_5 = \frac{5.2529}{2 \times 1.6145} = 1.6268]
so the answer key appears to use that convention.
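The coefficient form of Cook's distance, and its sensitivity to how the residual variance is defined, can be checked with the sketch below. It assumes the coefficients quoted in the question. The SSE/(n - 1) variant is included only because it reproduces 1.6268; that the answer key uses it is an inference, not something the question states.

```python
# Data and coefficients as quoted in the question.
x = [1.85, 0.65, 0.63, 1.24, 2.45]
y = [3.67, 1.88, 1.35, 0.34, 0.89]
b0_full, b1_full = 1.4110, 0.1512   # all observations
b0_loo, b1_loo = 0.3169, 1.3667     # observation 5 excluded

n, p = len(x), 2
sum_x = sum(x)
sum_x2 = sum(xi * xi for xi in x)   # lower-right entry of X'X

# Quadratic form d' (X'X) d with X'X = [[n, sum_x], [sum_x, sum_x2]].
d0 = b0_full - b0_loo
d1 = b1_full - b1_loo
quad_form = n * d0 * d0 + 2 * sum_x * d0 * d1 + sum_x2 * d1 * d1

# SSE from the full regression.
sse = sum((yi - (b0_full + b1_full * xi)) ** 2 for xi, yi in zip(x, y))

d_n_minus_p = quad_form / (p * sse / (n - p))  # variance = SSE / (n - p)
d_n_minus_1 = quad_form / (p * sse / (n - 1))  # variance = SSE / (n - 1), assumed convention

print(f"sum of X^2        = {sum_x2:.4f}")
print(f"quadratic form    = {quad_form:.4f}")
print(f"D_5, SSE/(n - p)  = {d_n_minus_p:.4f}")
print(f"D_5, SSE/(n - 1)  = {d_n_minus_1:.4f}")
```

Only the denominator differs between the two results, so the choice of degrees of freedom alone moves the answer from about 1.22 to about 1.63.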