
Answer-first summary for fast verification
Answer: D. Create a UDF in Databricks that applies the conversion formula and use it in a Spark SQL query to transform the temperature column.
Of the options listed, creating a User-Defined Function (UDF) in Databricks is the most efficient way to convert temperatures from Celsius to Fahrenheit. Spark SQL has no built-in function that performs this conversion (option C), and options A and B either move the data out of Databricks or permanently alter the source data. A UDF keeps the work inside the notebook: it encapsulates the conversion formula (Fahrenheit = Celsius * 9/5 + 32) and can be called directly from Spark SQL queries, so the transformation runs on the cluster without external tools or manual data modification.

Example code for creating and using the UDF in Databricks:

```python
from pyspark.sql.types import DoubleType

# Define the conversion in Python; None (SQL NULL) passes through unchanged
def celsiusToFahrenheit(celsius):
    if celsius is None:
        return None
    return (celsius * 9.0 / 5) + 32

# Register the UDF with an explicit return type (the default is StringType)
spark.udf.register("celsiusToFahrenheitUDF", celsiusToFahrenheit, DoubleType())

# Example usage in Spark SQL
spark.sql("SELECT city, celsiusToFahrenheitUDF(temperature) AS temperature_fahrenheit FROM weather_data")
```

This method is preferred here because it keeps the transformation flexible and contained within Databricks' analytical workflows.
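Before registering a UDF, it can help to sanity-check the conversion logic on its own. The following is a minimal sketch in plain Python, with no Spark session required; the function name `celsius_to_fahrenheit` and the choice of reference points are illustrative assumptions, not part of the original answer.

```python
from typing import Optional


def celsius_to_fahrenheit(celsius: Optional[float]) -> Optional[float]:
    """Apply Fahrenheit = Celsius * 9/5 + 32 (hypothetical helper name).

    None is passed through so that missing readings stay missing.
    """
    if celsius is None:
        return None
    return celsius * 9.0 / 5 + 32


# Well-known reference points: freezing, boiling, and the crossover at -40.
reference_points = {0.0: 32.0, 100.0: 212.0, -40.0: -40.0}

for celsius, expected in reference_points.items():
    assert celsius_to_fahrenheit(celsius) == expected

# A missing reading stays missing instead of raising a TypeError.
assert celsius_to_fahrenheit(None) is None
```

Checking the plain function first means any later discrepancy in query results is more likely to come from the registration step or the data than from the formula.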
Author: LeetQuiz Editorial Team
A data engineer in Databricks needs to convert temperatures from Celsius to Fahrenheit for a global weather dataset. What is the most efficient method to achieve this within a Databricks notebook?
A
Export the dataset to a CSV file, convert the temperatures using an external tool, and then re-import the dataset into Databricks.
B
Directly modify the dataset in the data lake to convert all temperatures to Fahrenheit before importing it into Databricks.
C
Use a built-in Spark SQL function that automatically converts temperatures from Celsius to Fahrenheit.
D
Create a UDF in Databricks that applies the conversion formula and use it in a Spark SQL query to transform the temperature column.