Google Professional Machine Learning Engineer



You are a data scientist working on a project that involves exploratory data analysis (EDA) on a dataset that is expected to grow beyond 100 TB. The project requires interactive querying capabilities to quickly iterate on hypotheses and visualize data insights. The solution must be cost-effective, scalable, and support standard SQL for ease of use. Which Google Cloud service is the BEST choice for this scenario? Choose one correct option.

A. Cloud Spanner
B. Cloud Storage
C. BigQuery
D. Cloud Functions

Explanation:

Correct Option: C. BigQuery

BigQuery is the best choice here. It is a serverless data warehouse, so you can run interactive SQL queries over very large datasets without provisioning or managing infrastructure. It is designed for large-scale analytics and offers:

  • Scalability: Storage and compute scale independently, so the dataset can grow past 100 TB without resizing clusters or migrating data.
  • Cost-effectiveness: With on-demand pricing, each query is billed for the bytes it scans (storage is billed separately); capacity-based (slot) pricing is an alternative for steady, predictable workloads.
  • Interactive analysis: Queries run in parallel across many workers, so even complex queries over large tables return quickly enough for rapid hypothesis testing.
  • SQL support: Uses GoogleSQL, an ANSI-compliant SQL dialect, so analysts who already know SQL can query and transform data without learning a new language.
  • Integration with visualization tools: Connects directly to Looker Studio (formerly Data Studio) and Looker for data exploration and visualization.
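
The cost-effectiveness point depends on how on-demand queries are billed: BigQuery stores tables in columnar form and charges for the bytes read from the columns a query references, so trimming the column list matters far more than limiting rows (a LIMIT clause does not reduce the bytes read by a SELECT * on a non-clustered table). The sketch below illustrates that arithmetic under stated assumptions: the table, its per-column sizes, the helper names estimate_scan_bytes and estimate_cost, and the per-TiB rate are all hypothetical, and partition pruning, clustering, and caching are ignored.

```python
# Illustrative sketch only: the column sizes, helper names, and per-TiB
# rate are hypothetical placeholders, not quoted BigQuery prices.
TIB = 2 ** 40


def estimate_scan_bytes(column_bytes: dict, selected: list) -> int:
    """Bytes a columnar engine reads when a query references `selected`.

    Ignores partition pruning, clustering, and the query cache.
    """
    missing = [c for c in selected if c not in column_bytes]
    if missing:
        raise KeyError(f"unknown columns: {missing}")
    # Each referenced column is read once, however often it appears.
    return sum(column_bytes[c] for c in set(selected))


def estimate_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """On-demand style cost: bytes scanned (in TiB) times a per-TiB rate."""
    return bytes_scanned / TIB * price_per_tib


# Hypothetical 120 TiB events table, with storage split by column.
columns = {
    "event_ts": 10 * TIB,
    "user_id": 8 * TIB,
    "country": 2 * TIB,
    "payload": 100 * TIB,
}

# SELECT country, COUNT(*) FROM events GROUP BY country
narrow = estimate_scan_bytes(columns, ["country"])
# SELECT * FROM events
wide = estimate_scan_bytes(columns, list(columns))

print(estimate_cost(narrow, price_per_tib=5.0))  # 10.0
print(estimate_cost(wide, price_per_tib=5.0))    # 600.0
```

For a real query, the google-cloud-bigquery Python client can submit it with QueryJobConfig(dry_run=True, use_query_cache=False) and read the job's total_bytes_processed, which reports the bytes a query would scan without running or billing it.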

Why other options are less suitable:

  • A. Cloud Spanner: A globally consistent relational database built for transactional (OLTP) workloads; it is not designed for ad hoc analytical scans over 100+ TB and would be considerably more expensive at this scale.
  • B. Cloud Storage: Object storage for files; it cannot run SQL queries by itself, so another service would still be needed for interactive analysis.
  • D. Cloud Functions: A service for running short, event-driven code; it is not a query engine and provides no interactive SQL analysis of large datasets.