
You are a data engineer responsible for deploying a critical nightly batch job in a production environment on Azure Databricks. The job processes large volumes of data and must complete within a strict 6-hour window to meet business SLAs. Given the importance of the job, you need to minimize the risk of failures, especially those caused by node failures. Which of the following strategies would BEST ensure the job's success by providing comprehensive monitoring and proactive failure detection, while also accounting for cost efficiency and scalability? Choose one option.
A
Deploy the job without any monitoring or logging, relying solely on manual checks the next morning to identify any failures, as this approach minimizes Azure costs.
B
Implement basic logging to capture errors but skip comprehensive monitoring to save on Azure Monitor costs, assuming that node failures are rare and can be addressed manually if they occur.
C
Deploy the job with comprehensive monitoring and logging, including performance metrics, system health, and error logs, and configure Azure Monitor alerts to notify you of potential node failures in real time, enabling a quick response to issues.
D
Only monitor the job's completion time and ignore other metrics, as this is the only SLA requirement, to reduce complexity and monitoring costs.
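For context on option C, the proactive SLA monitoring it describes can be sketched in a platform-agnostic way. The snippet below is a minimal illustration, not Azure-specific: the threshold values, function name, and logging behavior are assumptions, and a real deployment would configure Azure Monitor alert rules rather than rely on in-process checks like these.

```python
import time
import logging

SLA_WINDOW_SECONDS = 6 * 60 * 60  # 6-hour SLA window from the scenario
ALERT_THRESHOLD = 0.8             # warn at 80% of the window (illustrative value)

def run_with_sla_monitoring(job):
    """Run `job` (a callable) and log alerts if it fails or nears the SLA window."""
    start = time.monotonic()
    try:
        result = job()
    except Exception:
        # In a real setup, a failure like this would trigger an Azure Monitor alert.
        logging.exception("Batch job failed")
        raise
    elapsed = time.monotonic() - start
    if elapsed > SLA_WINDOW_SECONDS:
        logging.error("SLA breached: job took %.0f s (window is %d s)",
                      elapsed, SLA_WINDOW_SECONDS)
    elif elapsed > ALERT_THRESHOLD * SLA_WINDOW_SECONDS:
        logging.warning("SLA at risk: job consumed %.0f%% of the window",
                        100 * elapsed / SLA_WINDOW_SECONDS)
    return result, elapsed
```

The key idea matches option C: capture errors as they happen and compare runtime against the SLA continuously, rather than discovering a breach only after the window has closed.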