
Answer-first summary for fast verification
Answer: Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
The correct answer is B. Azure Databricks provides a unified platform that supports both Python and Scala, enables building automated data pipelines covering data storage, movement, and processing, supports workload isolation and interactive workloads through its workspace and cluster management, and scales across clusters. Azure Data Factory is the right orchestration tool: it is purpose-built for data pipeline orchestration, integrates natively with Databricks, and can coordinate both data engineering and data science workflows. Option A is less suitable because Apache Hive for HDInsight is not designed for data science workloads and lacks the interactive capabilities of Databricks. Options C and D are incorrect because Azure Container Instances (ACI) is not an orchestration service; it is intended for container deployment and lacks native orchestration features such as scheduling, monitoring, and integration with data services, as noted in the community discussion (e.g., 'you cant do orchestration with ACI, only with data factory').
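To make the Databricks-plus-Data-Factory pairing concrete, here is a minimal sketch of an ADF pipeline definition that orchestrates a Databricks notebook run via the `DatabricksNotebook` activity. The pipeline name, notebook path, and linked-service name are hypothetical placeholders; the linked service would point at your actual Azure Databricks workspace.

```json
{
  "name": "RunDatabricksNotebook",
  "properties": {
    "activities": [
      {
        "name": "TransformData",
        "type": "DatabricksNotebook",
        "typeProperties": {
          "notebookPath": "/Shared/etl/transform"
        },
        "linkedServiceName": {
          "referenceName": "AzureDatabricksLinkedService",
          "type": "LinkedServiceReference"
        }
      }
    ]
  }
}
```

With this setup, Data Factory handles scheduling, monitoring, and dependency management, while the Databricks cluster executes the Python or Scala workload, which is exactly the division of responsibilities the question asks for.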
Author: LeetQuiz Editorial Team
You need to create a data engineering and data science development environment that supports the following requirements:
- Support for both Python and Scala
- Automated data pipelines for data storage, movement, and processing
- A single tool for orchestrating both data engineering and data science workloads
- Workload isolation and interactive workloads
- Scaling across clusters
What should you do?
A. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.
B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.
D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration.