
You are designing a data pipeline that transforms large datasets with Spark in Azure Databricks. The pipeline must be optimized for both performance and cost. Describe the strategies you would use to manage Spark jobs within the pipeline, including job scheduling, resource allocation, and cost management.
A. Run all Spark jobs at maximum cluster capacity to ensure the fastest processing times.
B. Optimize Spark job scheduling by analyzing data dependencies and running independent jobs concurrently, use autoscaling on the Databricks cluster to manage costs, and configure job priorities based on business importance.
C. Schedule Spark jobs to run sequentially without considering data dependencies.
D. Manually adjust the cluster size for each Spark job to match the data volume being processed.
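
The strategy described in option B can be expressed directly in a Databricks job definition: independent tasks with no `depends_on` entries are eligible to run concurrently, downstream tasks are gated on their upstream dependencies, and an autoscaling job cluster grows and shrinks with the workload to control cost. The following is a minimal sketch using the Databricks Jobs API 2.1; the workspace URL, token, job name, notebook paths, node type, and worker counts are hypothetical placeholders, not values from the question.

```python
import requests

# Hypothetical workspace URL and personal access token; substitute real values.
DATABRICKS_HOST = "https://<your-workspace>.azuredatabricks.net"
TOKEN = "<personal-access-token>"

# Two independent extract tasks have no dependencies, so the scheduler may
# run them concurrently. The transform task declares depends_on for both,
# so it starts only after its inputs are ready. All tasks share one job
# cluster that autoscales between 2 and 8 workers to balance speed and cost.
job_spec = {
    "name": "sales-etl-pipeline",  # hypothetical job name
    "tasks": [
        {
            "task_key": "extract_orders",
            "notebook_task": {"notebook_path": "/pipelines/extract_orders"},
            "job_cluster_key": "autoscaling_cluster",
        },
        {
            "task_key": "extract_customers",
            "notebook_task": {"notebook_path": "/pipelines/extract_customers"},
            "job_cluster_key": "autoscaling_cluster",
        },
        {
            "task_key": "transform_and_load",
            "depends_on": [
                {"task_key": "extract_orders"},
                {"task_key": "extract_customers"},
            ],
            "notebook_task": {"notebook_path": "/pipelines/transform_and_load"},
            "job_cluster_key": "autoscaling_cluster",
        },
    ],
    "job_clusters": [
        {
            "job_cluster_key": "autoscaling_cluster",
            "new_cluster": {
                "spark_version": "13.3.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "autoscale": {"min_workers": 2, "max_workers": 8},
            },
        }
    ],
}

# Create the job via the Jobs API 2.1 endpoint.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

Note the contrast with the other options: option A fixes the cluster at maximum size regardless of load, option C serializes tasks that could overlap, and option D requires manual resizing per job, which is exactly the toil that the `autoscale` block above delegates to the platform.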