Google Professional Machine Learning Engineer

Get started today

Ultimate access to all questions.

Explanation:

A data lake is particularly advantageous for scenarios involving the storage of large volumes of raw, unstructured data (Option B) and for minimizing storage costs while maintaining the flexibility to handle various data types over time (Option D). These capabilities are essential for scalable and cost-effective machine learning pipelines. Options A, C, and E describe scenarios better suited for other storage solutions, such as transactional databases or data warehouses, which are optimized for real-time processing, quick retrieval of structured data, and high-speed access, respectively.

Explanation:

Comments (0)

No comments yet.

In the context of building a scalable and cost-effective machine learning pipeline for a multinational corporation, which of the following scenarios best illustrates the advantage of using a data lake for storing large datasets? Choose the two most correct options.

Real Exam

The need to process and analyze real-time transactional data from global sales with millisecond latency.

9.0%

The requirement to store petabytes of raw, unstructured social media data for future sentiment analysis projects.

32.1%

The necessity to quickly retrieve small subsets of structured data for daily operational reports.

5.1%

The objective to minimize storage costs while retaining the flexibility to store and analyze both structured and unstructured data over long periods.

47.4%

The strategy to leverage high-speed data retrieval for interactive dashboard visualizations.

6.4%