
Answer-first summary for fast verification
Answer: (B) Removing duplicate data entries to enhance storage efficiency and ensure the dataset's integrity; (C) Transforming data into a different format to facilitate specific machine learning algorithms
**Correct Options: B. Removing duplicate data entries to enhance storage efficiency and ensure the dataset's integrity, and C. Transforming data into a different format to facilitate specific machine learning algorithms**

**Explanation:**

Removing duplicate data is essential for several reasons. It directly improves storage efficiency by reducing the dataset's size, which can yield significant cost savings, especially with large datasets. It also improves dataset quality: redundant records can skew data analysis and bias machine learning models, for example by over-representing duplicated examples during training. Transforming data into a different format, while not directly related to removing duplicates, is another critical preprocessing step; it can improve the performance of machine learning algorithms by making the data more suitable for specific types of analysis.

**Why the other options are not correct:**

- **A. Scaling data to a common range:** This refers to normalization or standardization, which is important for ensuring that features contribute equally to a model, but it is unrelated to the problem of duplicate data.
- **D. Encrypting data:** While important for data security, encryption does not address dataset quality or the storage and computational efficiency concerns raised by duplicate data.
- **E. Both B and C are correct:** Although B and C are indeed the correct statements, the question asks you to choose the two most correct options. Selecting B and C directly satisfies that instruction, whereas E is a single meta-option and does not count as two selections.
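The deduplication step described in option B can be sketched in a few lines. This is a minimal illustration in pure Python using hypothetical record data; in practice, a data engineer would more likely use a library routine such as pandas' `drop_duplicates`.

```python
def dedupe_records(records):
    """Remove exact duplicate records while preserving first-seen order."""
    seen = set()
    unique = []
    for rec in records:
        # Build a hashable fingerprint of the record; sorting makes the
        # fingerprint independent of key order within each dict.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical example rows, one of which is an exact duplicate.
rows = [
    {"id": 1, "label": "cat"},
    {"id": 2, "label": "dog"},
    {"id": 1, "label": "cat"},  # duplicate of the first row
]

deduped = dedupe_records(rows)
print(deduped)  # the duplicate row is dropped; two records remain
```

Dropping the duplicate both shrinks the stored dataset and prevents the repeated record from being over-weighted by a model trained on it.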
Author: LeetQuiz Editorial Team
In the context of preparing a dataset for machine learning, a data engineer is tasked with improving the dataset's quality and efficiency. Among the various preprocessing steps, identifying and removing duplicate data is considered crucial. Considering the constraints of storage costs, computational efficiency, and the accuracy of machine learning models, which of the following best describes the significance of removing duplicate data? Choose the two most correct options.
A. Scaling data to a common range to ensure uniformity across features
B. Removing duplicate data entries to enhance storage efficiency and ensure the dataset's integrity
C. Transforming data into a different format to facilitate specific machine learning algorithms
D. Encrypting data to protect sensitive information from unauthorized access
E. Both B and C are correct