
You are working with a media company that needs to process and analyze large volumes of video and audio content for content recommendation systems. The data is highly unstructured and requires real-time processing. Describe how you would set up a data processing pipeline to handle this task, including the technologies you would use and how you would ensure the pipeline is scalable and efficient.
A. Use a batch processing approach with SQL databases and ignore real-time requirements.
B. Leverage Apache Kafka for real-time data ingestion, use AWS Lambda for real-time processing, and Amazon Elasticsearch Service for content analysis (see the sketch after the options).
C. Store all data in a single database and process it using scheduled batch jobs.
D. Manually process each data source separately without integrating them.
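To make option B concrete, here is a minimal sketch of the processing stage: an AWS Lambda handler consuming a Kafka (MSK) event batch, extracting lightweight media metadata, and indexing it into Elasticsearch via its REST API. The endpoint, index name, and payload fields are hypothetical placeholders, and the `requests` library is assumed to be packaged with the function; this is an illustration of the architecture, not a definitive implementation.

```python
import base64
import json
import os

import requests  # assumed to be bundled with the Lambda deployment package

# Placeholder configuration -- the domain endpoint and index name are assumptions.
ES_ENDPOINT = os.environ.get("ES_ENDPOINT", "https://search-media-demo.us-east-1.es.amazonaws.com")
ES_INDEX = os.environ.get("ES_INDEX", "media-metadata")


def handler(event, context):
    """Lambda handler for an MSK/Kafka event source.

    The event delivers base64-encoded Kafka record values; each one is
    decoded, reduced to recommendation-relevant metadata, and indexed
    for downstream content analysis.
    """
    indexed = 0
    for records in event.get("records", {}).values():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))

            # Keep only the fields the recommendation system needs
            # (field names here are illustrative assumptions).
            doc = {
                "media_id": payload.get("media_id"),
                "media_type": payload.get("media_type"),      # e.g. "video" or "audio"
                "duration_sec": payload.get("duration_sec"),
                "tags": payload.get("tags", []),
                "ingested_at": record.get("timestamp"),
            }

            # Index the document through the Elasticsearch REST API.
            resp = requests.post(f"{ES_ENDPOINT}/{ES_INDEX}/_doc", json=doc, timeout=5)
            resp.raise_for_status()
            indexed += 1

    return {"indexed": indexed}
```

Scalability in this design comes from partitioning the Kafka topic (Lambda scales out per partition) and from letting the Elasticsearch domain handle query-time analysis, so no single component becomes a batch bottleneck.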