
A data engineer preparing a dataset for machine learning is tasked with improving its quality and efficiency. Among the various preprocessing steps, identifying and removing duplicate data is considered crucial. Given the constraints of storage cost, computational efficiency, and model accuracy, which of the following best describes the significance of removing duplicate data?
A. Scaling data to a common range to ensure uniformity across features
B. Removing duplicate data entries to enhance storage efficiency and ensure the dataset's integrity
C. Transforming data into a different format to facilitate specific machine learning algorithms
D. Encrypting data to protect sensitive information from unauthorized access
E. Both B and C are correct
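
For context, here is a minimal sketch of what duplicate removal might look like in practice, assuming a tabular dataset loaded with pandas; the column names and values are purely illustrative:

```python
import pandas as pd

# Illustrative dataset containing exact duplicate rows (hypothetical values).
df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3, 3],
    "feature": [0.5, 1.2, 1.2, 0.9, 0.9, 0.9],
    "label":   [0, 1, 1, 0, 0, 0],
})

before = len(df)

# Drop exact duplicate rows, keeping the first occurrence of each.
deduped = df.drop_duplicates(keep="first").reset_index(drop=True)

# Fewer rows means lower storage cost and less compute during training,
# and repeated examples no longer skew the model's view of the data.
print(f"Removed {before - len(deduped)} duplicate rows; {len(deduped)} remain.")
```

The sketch is only meant to connect the question's concerns about storage cost, computational efficiency, and dataset integrity to a concrete deduplication step; it does not indicate which answer choice is correct.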