
A junior data engineer is experimenting with language interoperability in Databricks notebooks. The goal is to create a view showing all sales from the African countries listed in the geo_lookup table. The current database contains only two tables: geo_lookup and sales.
The following code is executed:
Cmd 1:
%python
countries_af = [x[0] for x in
    spark.table("geo_lookup").filter("continent='AF'").select("country").collect()]
Cmd 2:
%sql
CREATE VIEW sales_af AS
SELECT *
FROM sales
WHERE country IN (countries_af)
AND continent = 'AF'
What will be the result of executing these command cells sequentially in an interactive notebook?
A. Both commands will succeed. Executing SHOW TABLES will show that countries_af and sales_af have been registered as views.
B. Cmd 1 will succeed. Cmd 2 will search all accessible databases for a table or view named countries_af; if this entity exists, Cmd 2 will succeed.
C. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable representing a PySpark DataFrame.
D. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.