© 2025 LeetQuiz All rights reserved.
Databricks Certified Machine Learning - Associate


A data scientist is developing a Databricks notebook that requires extensive feature engineering, such as creating new columns and applying transformations. They aim to encapsulate this feature engineering logic into a reusable component. What is the best approach to accomplish this?




Explanation:

The recommended way to encapsulate feature engineering logic as a reusable component in Databricks is to create a custom Spark MLlib Transformer class. Transformers are designed for exactly this purpose: they provide a structured, standardized way to wrap transformation logic, declare input and output columns, and expose parameters, and they plug directly into Spark ML Pipelines. This yields reusability, maintainability, scalability, and easier documentation.

The alternatives fall short:

- Custom PySpark UDFs integrate poorly with Spark MLlib pipelines and incur serialization/deserialization overhead between the JVM and the Python workers, which hurts performance and limits flexibility.
- The map function is suited to simple one-to-one element transformations, not complex, multi-column feature engineering.
- Saving and loading intermediate DataFrames is inefficient and creates brittle dependencies between workflow steps.

Creating a Spark MLlib Transformer class is therefore the most effective approach, promoting a modular and maintainable machine learning workflow.
