
You are tasked with designing a real-time data processing pipeline for a financial services company that requires processing high volumes of transaction data with minimal latency. The solution must ensure data integrity, support scalability to handle peak loads, and comply with financial regulatory standards. Considering these requirements, which of the following approaches would BEST meet the company's needs? Choose one option.
A. Implement a batch processing system using Databricks notebooks and scheduled jobs to process transactions at predefined intervals, ensuring compliance through manual audits.
B. Deploy Apache Kafka for high-throughput data ingestion, use Databricks Structured Streaming for real-time processing with built-in fault tolerance, and store the results in Delta Lake for ACID transactions and regulatory compliance (see the sketch after the options).
C. Develop a custom Spark Streaming application for data ingestion and processing, and use a NoSQL database for storage to achieve low latency, relying on external tools for compliance monitoring.
D. Adopt a third-party cloud-based real-time analytics platform for end-to-end data processing and storage, using Databricks for ad-hoc analysis and reporting, with compliance ensured by the platform's certifications.
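
For illustration, the architecture described in option B can be sketched in a few lines of PySpark: ingest transaction events from Kafka with Structured Streaming and append them to a Delta table. This is a minimal sketch under stated assumptions; the broker address, topic name, transaction schema, and storage paths are hypothetical and are not part of the question.

```python
# Minimal sketch of option B: Kafka ingestion -> Structured Streaming -> Delta Lake.
# Requires the spark-sql-kafka connector and Delta Lake libraries on the cluster.
# Broker, topic, schema fields, and paths below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("txn-pipeline").getOrCreate()

# Assumed schema for the JSON transaction payload.
txn_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Ingest from Kafka (hypothetical broker and topic).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
)

# Parse the Kafka value bytes into typed columns.
txns = (
    raw.select(from_json(col("value").cast("string"), txn_schema).alias("t"))
    .select("t.*")
)

# Append to a Delta table; the checkpoint makes the sink fault tolerant,
# and Delta's transaction log provides ACID guarantees for auditability.
query = (
    txns.writeStream
    .format("delta")
    .option("checkpointLocation", "/chk/transactions")
    .outputMode("append")
    .start("/delta/transactions")
)

query.awaitTermination()
```

In this sketch, the checkpoint location is what lets the stream recover from failures with exactly-once output to the Delta sink, while Delta Lake's ACID transactions and table history support the integrity and regulatory requirements named in the question.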