
Ultimate access to all questions.
In the context of preparing datasets for machine learning models, data duplication is a critical issue that needs to be addressed. Considering a scenario where a dataset contains multiple identical or near-identical records due to data entry errors, integration from multiple sources, or during data migration processes, which of the following best describes the importance of addressing data duplication? Choose the two most correct options.
A
Scaling data to a common range to ensure uniformity across features
B
Removing duplicate data entries to enhance the accuracy and reliability of machine learning models
C
Transforming data into a different format to meet the requirements of specific algorithms
D
Encrypting data to protect sensitive information from unauthorized access
E
Improving storage efficiency and reducing costs by eliminating unnecessary data redundancy