Databricks Certified Data Engineer - Professional

For integrating real-time data into a lakehouse, which method best reduces latency and maintains data freshness for analytics?

Explanation:

Streaming data directly into the lakehouse, for example by reading from Apache Kafka with Spark Structured Streaming and writing continuously to Delta tables, is the most effective approach for real-time data integration. Records are ingested as soon as they are produced rather than waiting for periodic batch loads, which minimizes latency and keeps downstream analytics working on the freshest data. Kafka's scalability, fault tolerance, and flexibility as a streaming backbone further suit it to this role, making continuous streaming ingestion the optimal choice for maintaining up-to-date, reliable data within the lakehouse environment.
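
As an illustration of that pattern, below is a minimal PySpark Structured Streaming sketch that reads from a Kafka topic and appends to a Delta table. The broker address (kafka-broker:9092), topic name (events), checkpoint path, and table name (bronze_events) are hypothetical placeholders, and the snippet assumes a runtime (such as Databricks) where the Kafka connector and Delta Lake are available.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-to-lakehouse").getOrCreate()

# Read the Kafka topic as an unbounded streaming DataFrame.
raw_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka-broker:9092")  # hypothetical broker
    .option("subscribe", "events")                            # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast to strings before downstream parsing.
events = raw_stream.select(
    col("key").cast("string"),
    col("value").cast("string"),
    col("timestamp"),
)

# Continuously append micro-batches to a Delta table so analytics queries
# always see fresh data; the checkpoint enables fault-tolerant, exactly-once writes.
query = (
    events.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events")  # hypothetical path
    .toTable("bronze_events")                                  # hypothetical table
)

query.awaitTermination()
```

The checkpoint location is what gives the stream its fault tolerance and exactly-once delivery into the table, which is why it appears even in this minimal form.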