
Answer-first summary for fast verification
Answer: C. An automated workflow needs to be run every 30 minutes; D. A Databricks SQL query needs to be scheduled for upward reporting.
## Explanation

**Job clusters** are designed for automated, scheduled workloads, while **all-purpose clusters** are intended for interactive development and ad-hoc analysis. Let's analyze each option:

**A. An ad-hoc analytics report needs to be developed while minimizing compute costs.**
- ❌ **Incorrect** - Developing an ad-hoc report is interactive work, which is what all-purpose clusters are for. Cost matters, but a job cluster exists only for the duration of a job run and cannot host an interactive development session.

**B. A data team needs to collaborate on the development of a machine learning model.**
- ❌ **Incorrect** - Collaborative development requires interactive sessions that several users can attach to at once, which is the purpose of all-purpose clusters.

**C. An automated workflow needs to be run every 30 minutes.**
- ✅ **Correct** - Job clusters are specifically designed for automated, scheduled workflows. They start when a job runs and terminate when the job completes, making them cost-effective for scheduled tasks.

**D. A Databricks SQL query needs to be scheduled for upward reporting.**
- ✅ **Correct** - A scheduled reporting query is an automated, unattended workload, which is the category job clusters serve. Note that a query scheduled from within Databricks SQL itself executes on a SQL warehouse rather than on a cluster, so this option is the weaker of the two.

**E. A data engineer needs to manually investigate a production error.**
- ❌ **Incorrect** - Manual investigation requires interactive exploration and debugging, which is best done on an all-purpose cluster.

### Key Differences:
- **Job Clusters**: Created when a job starts, terminated when the job completes. Best for automated, scheduled, production workloads.
- **All-Purpose Clusters**: Long-running clusters for interactive development, collaboration, and ad-hoc analysis.

### Best Practices:
1. Use **job clusters** for production pipelines, scheduled reports, and automated workflows
2. Use **all-purpose clusters** for development, testing, debugging, and collaborative work
3. Job clusters are more cost-effective for scheduled tasks because they only run when needed
4. All-purpose clusters provide persistent environments for ongoing work
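
To make option C concrete, here is a minimal sketch of a job definition that runs every 30 minutes on a job cluster. It only builds the request payload in the shape used by the Databricks Jobs API 2.1 create-job call and does not send it anywhere. The job name, notebook path, runtime version, node type, and worker count are hypothetical placeholders, not values from the question.

```python
import json


def every_n_minutes_cron(n: int) -> str:
    """Return a Quartz cron expression that fires every n minutes."""
    if not (0 < n < 60) or 60 % n != 0:
        raise ValueError("n must be a divisor of 60 between 1 and 30")
    # Quartz field order: seconds minutes hours day-of-month month day-of-week
    return f"0 0/{n} * * * ?"


def build_job_spec(name: str, notebook_path: str, interval_minutes: int) -> dict:
    """Build a create-job payload whose single task runs on a job cluster."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": notebook_path},
                # "new_cluster" requests a job cluster: it is created for the
                # run and terminated when the run finishes. An all-purpose
                # cluster would be referenced with "existing_cluster_id".
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # placeholder value
                    "node_type_id": "i3.xlarge",  # placeholder value
                    "num_workers": 2,  # placeholder value
                },
            }
        ],
        "schedule": {
            "quartz_cron_expression": every_n_minutes_cron(interval_minutes),
            "timezone_id": "UTC",
            "pause_status": "UNPAUSED",
        },
    }


# Hypothetical job name and notebook path, for illustration only.
spec = build_job_spec("half_hourly_refresh", "/Workspace/Shared/refresh", 30)
print(json.dumps(spec, indent=2))
```

The choice between `new_cluster` and `existing_cluster_id` in the task is what separates the two cluster types in a job definition: the former gives each run its own short-lived cluster, the latter attaches the run to a long-running all-purpose cluster.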
Author: Keng Suppaseth
Which of the following describes a scenario in which a data engineer will want to use a Job cluster instead of an all-purpose cluster?
A. An ad-hoc analytics report needs to be developed while minimizing compute costs.
B. A data team needs to collaborate on the development of a machine learning model.
C. An automated workflow needs to be run every 30 minutes.
D. A Databricks SQL query needs to be scheduled for upward reporting.
E. A data engineer needs to manually investigate a production error.