Explanation
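This scenario describes data snooping: repeatedly testing strategies on the same data and keeping whichever looks best, without a prior reason to expect it to work.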
Key Characteristics:
- Multiple strategies/models are tested
- Selection is based purely on statistical significance (lowest p-value)
- No consideration of economic rationale or prior theory
- High risk of finding spurious relationships due to multiple testing
Why this is data snooping:
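- Multiple testing problem: Testing 5 factors gives five chances to find a statistically significant result by chance. At a 5% significance level with independent tests, the probability of at least one false positive is 1 - 0.95^5, or about 23%
- P-hacking: The factor was selected for having the lowest p-value, with no theoretical justification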
- Overfitting: The strategy may work well in-sample but fail out-of-sample
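The multiple testing problem above can be checked numerically. The following is a minimal sketch, assuming the five factor tests are independent and that none of the factors has any real predictive power (so each p-value is uniform on [0, 1]). The function names, the 5% significance level, and the trial count are illustrative choices, not part of the original scenario.

```python
import random


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_best_of_n(num_tests: int, alpha: float = 0.05,
                       trials: int = 100_000, seed: int = 0) -> float:
    """Estimate how often the lowest of num_tests null p-values beats alpha.

    Assumption: under the null hypothesis each p-value is an independent
    uniform draw on [0, 1], so every "significant" result here is spurious.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lowest_p = min(rng.random() for _ in range(num_tests))
        if lowest_p < alpha:
            hits += 1
    return hits / trials


analytic = familywise_error_rate(5)
simulated = simulate_best_of_n(5)
print(f"analytic:  {analytic:.3f}")   # analytic:  0.226
print(f"simulated: {simulated:.3f}")
```

Under these assumptions, picking the factor with the lowest p-value yields a "significant" result in roughly one run out of four or five even though no factor works, which is why selection on p-value alone is unreliable.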
Why not other biases:
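- Survivorship bias: Not applicable - survivorship bias arises from analyzing only surviving entities (for example, funds or companies still in existence), which is not what the scenario describes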
- Look-ahead bias: Not applicable - the analyst used point-in-time data, which avoids this bias
Best Practice:
- Use economic theory to guide factor selection
- Apply out-of-sample testing
- Apply multiple testing corrections (for example, Bonferroni or Holm adjustments)
- Consider economic significance, not just statistical significance
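As an illustration of the multiple testing correction mentioned above, here is a minimal sketch of two common adjustments, Bonferroni and Holm. The scenario does not say which correction to use, and the p-values below are hypothetical numbers chosen for illustration, not results from the scenario.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: compare sorted p-values to rising thresholds."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest p-value faces alpha/m, the next alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values for five candidate factors (assumed for illustration).
p_values = [0.03, 0.20, 0.45, 0.08, 0.61]

naive = [p <= 0.05 for p in p_values]
print(naive)                        # [True, False, False, False, False]
print(bonferroni_reject(p_values))  # [False, False, False, False, False]
print(holm_reject(p_values))        # [False, False, False, False, False]
```

In this hypothetical, the factor with p = 0.03 looks significant when judged alone but does not survive either correction, since the adjusted threshold for the best of five tests is 0.01.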
Conclusion: This approach represents classic data snooping behavior.