
Explanation:
Apache Hadoop (A) is a batch processing framework and is not designed for real-time data processing. AWS Lambda (C) is a serverless compute service that can handle real-time data, but it has limitations in terms of processing large volumes of data. SQL Server Integration Services (D) is an ETL tool for SQL Server and is not designed for distributed real-time data processing. Apache Spark (B) is a fast and general-purpose distributed computing framework that can handle large volumes of data in real-time, making it the most suitable choice for this task.
Ultimate access to all questions.
No comments yet.
You are tasked with designing a data pipeline that needs to handle a large volume of data in real-time. Which distributed computing framework would be most suitable for this task, and why?
A
Apache Hadoop
B
Apache Spark
C
AWS Lambda
D
SQL Server Integration Services