
Answer-first summary for fast verification
Answer: Utilize a single wide table to store all IoT data, partitioned by device ID, leveraging Delta Lake's optimization for concurrent writes.
1. **Single wide table**: Storing all IoT data in a single wide table simplifies the data model and reduces query complexity, improving performance and manageability.
2. **Partitioned by device ID**: This allows for efficient querying based on specific devices, optimizing performance and reducing data scanned per query.
3. **Delta Lake's optimization for concurrent writes**: Ensures data consistency and reliability with high-volume writes through ACID transactions, scalable metadata handling, and data versioning.
4. **Efficient ingestion**: A single wide table partitioned by device ID optimizes ingestion for millions of devices reporting every minute.
5. **Near real-time querying**: This setup maintains the ability to query data in near real-time, enabling quick analysis and timely decisions.

Thus, a single wide table partitioned by device ID, leveraging Delta Lake's optimizations, is the most suitable approach for a write-heavy IoT workload in a lakehouse architecture.
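To make the "reducing data scanned per query" point concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not Delta Lake's API: the class `PartitionedTable`, its methods, and the field names `device_id`, `ts`, and `temp_c` are all hypothetical and chosen only for illustration. It does not model the transaction log or concurrent writers.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PartitionedTable:
    """Toy stand-in for one wide table partitioned by device ID (hypothetical)."""

    # Maps device_id -> list of rows stored in that partition.
    partitions: dict = field(default_factory=lambda: defaultdict(list))

    def append(self, rows):
        """Append a batch of rows, routing each row to its device's partition."""
        for row in rows:
            self.partitions[row["device_id"]].append(row)

    def query_device(self, device_id):
        """Return (rows, partitions_scanned) for a filter on device_id.

        Only the matching partition is read; all others are skipped.
        """
        if device_id not in self.partitions:
            return [], 0
        return list(self.partitions[device_id]), 1

    def query_all(self, predicate):
        """Return (rows, partitions_scanned) for a filter on another column.

        Without a device_id filter, every partition has to be read.
        """
        hits = [r for rows in self.partitions.values() for r in rows if predicate(r)]
        return hits, len(self.partitions)


# Hypothetical readings from two devices reporting once a minute.
table = PartitionedTable()
table.append([
    {"device_id": "dev-1", "ts": 0, "temp_c": 21.5},
    {"device_id": "dev-2", "ts": 0, "temp_c": 19.0},
    {"device_id": "dev-1", "ts": 60, "temp_c": 21.7},
])

rows, scanned = table.query_device("dev-1")
warm, scanned_all = table.query_all(lambda r: r["temp_c"] > 20)
```

In this model, a query that filters on `device_id` touches one partition, while a query on any other column touches all of them. Each distinct device ID also becomes its own partition, so the partition count grows with the number of devices.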
Author: LeetQuiz Editorial Team
How should you model your lakehouse architecture to efficiently handle a high-volume, write-heavy IoT workload with millions of devices reporting every minute, while ensuring the ability to query data in near real-time?
A. Store raw data in a NoSQL database for write efficiency, periodically ETLing processed data into the lakehouse for analytical queries.
B. Implement a sharded approach, creating separate tables for subsets of devices, and use a metastore to track shards for querying.
C. Utilize a single wide table to store all IoT data, partitioned by device ID, leveraging Delta Lake's optimization for concurrent writes.
D. Apply a micro-batching technique that combines streaming ingestion with periodic optimization (compaction and indexing) of the stored data for analysis.