Explanation
In stream processing pipelines, sources are systems or feeds that continuously generate data in real time. Let's analyze each option:
A. change data capture (CDC) feed - ✅ CORRECT
- CDC feeds capture database changes (inserts, updates, deletes) in real time
- They provide a continuous stream of data changes
- Commonly used as sources for streaming pipelines
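The idea behind a CDC feed can be sketched in plain Python: each event describes one row-level change, and a downstream consumer replays events in order to reconstruct the table's current state. The event shape (`op`/`key`/`row`) is a simplified assumption for illustration, not any specific CDC product's wire format.

```python
def apply_cdc_event(table: dict, event: dict) -> None:
    """Apply one change event (insert/update/delete) to an in-memory table."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        table[key] = event["row"]      # upsert the changed row
    elif op == "delete":
        table.pop(key, None)           # remove the deleted row

# A tiny feed of ordered change events, as a CDC source would emit them
cdc_feed = [
    {"op": "insert", "key": 1, "row": {"name": "Ada"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob"}},
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "key": 2, "row": None},
]

table_state = {}
for event in cdc_feed:
    apply_cdc_event(table_state, event)

print(table_state)  # {1: {'name': 'Ada L.'}}
```

Because the feed is ordered and continuous, a streaming pipeline can consume it incrementally instead of re-scanning the source database.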
B. Kafka - ✅ CORRECT
- Kafka is a distributed streaming platform
- It acts as a message broker that can continuously stream data
- Widely used as a source for stream processing pipelines
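The broker pattern Kafka implements can be illustrated with a minimal sketch: producers append records to a topic (an ordered, append-only log) and consumers read independently, each tracking its own offset. This is plain Python for illustration, not the Kafka client API.

```python
class Topic:
    """A toy stand-in for a Kafka topic: an append-only record log."""

    def __init__(self):
        self.log = []

    def produce(self, record):
        self.log.append(record)        # producers only ever append

    def consume(self, offset):
        """Return all records at or after `offset`; each consumer keeps its own offset."""
        return self.log[offset:]

topic = Topic()
topic.produce({"sensor": "t1", "temp": 21.5})
topic.produce({"sensor": "t2", "temp": 19.8})

# Two consumers at different offsets read the same durable stream independently
print(topic.consume(0))  # both records
print(topic.consume(1))  # only the second record
```

The key property for stream processing is that the log persists and is replayable, so a pipeline can treat the topic as a continuous source and resume from its last offset after a restart.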
C. Delta Lake - ❌ INCORRECT
- Delta Lake is a storage layer, not a streaming source
- While it supports streaming reads, it's primarily a data lake storage format
- Better classified as a sink or storage destination
D. IoT devices - ❌ INCORRECT
- IoT devices generate streaming data, but they are rarely connected to a pipeline directly
- Their readings usually flow through a message broker (such as Kafka) or an IoT platform first
- The devices themselves are data producers, not pipeline sources
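The distinction above can be sketched in a few lines: the device is only a *producer* pushing readings somewhere, and the pipeline's *source* is the broker those readings land in. The names here (`readings_broker`, `device_emit`, `pipeline_source`) are illustrative, not a real IoT SDK.

```python
import queue

readings_broker = queue.Queue()   # stands in for Kafka or an IoT platform

def device_emit(broker, reading):
    """An IoT device pushes a reading to the broker, not to the pipeline."""
    broker.put(reading)

def pipeline_source(broker):
    """The stream pipeline reads from the broker, which is its actual source."""
    while not broker.empty():
        yield broker.get()

device_emit(readings_broker, {"device": "thermo-1", "temp": 22.1})
device_emit(readings_broker, {"device": "thermo-2", "temp": 18.4})

readings = list(pipeline_source(readings_broker))
print(len(readings))  # 2
```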
Key Points:
- Stream processing sources must provide continuous, real-time data
- CDC feeds and Kafka are established streaming data sources
- Delta Lake is storage, and IoT devices are data producers whose data reaches the pipeline through intermediate systems