Ultimate access to all questions.
In the context of building a scalable and cost-effective machine learning pipeline for a multinational corporation, which of the following scenarios best illustrates the advantage of using a data lake for storing large datasets? Choose the two most correct options.
Explanation:
A data lake is particularly advantageous for scenarios involving the storage of large volumes of raw, unstructured data (Option B) and for minimizing storage costs while maintaining the flexibility to handle various data types over time (Option D). These capabilities are essential for scalable and cost-effective machine learning pipelines. Options A, C, and E describe scenarios better suited for other storage solutions, such as transactional databases or data warehouses, which are optimized for real-time processing, quick retrieval of structured data, and high-speed access, respectively.