
As a junior Data Scientist in a consulting company, your first task involves preparing datasets for machine learning models. The datasets are stored in various file formats and require cleaning and correction. The solution must be cost-effective and scalable, and it must require minimal coding. Which Google Cloud Platform (GCP) service is most suitable for this task? Choose the best option.
A
Dataproc: A managed service for running Apache Spark and Apache Hadoop clusters, suitable for large-scale data processing but requires coding and is not specifically designed for data preparation.
B
BigQuery: A serverless, highly scalable data warehouse that requires SQL knowledge for data preprocessing, making it less straightforward for data cleaning tasks without additional tools.
C
Cloud Composer: A workflow orchestration service that manages workflows across clouds and on-premises data centers, not directly used for data preparation.
D
Dataprep: A serverless service designed for exploring, cleaning, and preparing both structured and unstructured data for machine learning without the need for coding, offering a user-friendly interface and scalability.