
Databricks Certified Data Engineer - Associate
Get started today
Ultimate access to all questions.
When working with a complex struct data type in SQL that contains employee and sales data, how would you calculate the total sales made by all employees? The table is defined with batchId INT
, performance ARRAY<STRUCT>
, and insertDate TIMESTAMP
. Here's a sample of the performance
column data:
[
{ "employeeId": 1234, "sales": 10000 },
{ "employeeId": 3232, "sales": 30000 }
]
Given the following SQL to create the table sales
:
create or replace table sales as
select 1 as batchId,
from_json('[{ "employeeId":1234,"sales" : 10000 },{ "employeeId":3232,"sales" : 30000 }]',
'ARRAY<STRUCT>') as performance,
current_timestamp() as insertDate
union all
select 2 as batchId,
from_json('[{ "employeeId":1235,"sales" : 10500 },{ "employeeId":3233,"sales" : 32000 }]',
'ARRAY<STRUCT>') as performance,
current_timestamp() as insertDate
When working with a complex struct data type in SQL that contains employee and sales data, how would you calculate the total sales made by all employees? The table is defined with batchId INT
, performance ARRAY<STRUCT>
, and insertDate TIMESTAMP
. Here's a sample of the performance
column data:
[
{ "employeeId": 1234, "sales": 10000 },
{ "employeeId": 3232, "sales": 30000 }
]
Given the following SQL to create the table sales
:
create or replace table sales as
select 1 as batchId,
from_json('[{ "employeeId":1234,"sales" : 10000 },{ "employeeId":3232,"sales" : 30000 }]',
'ARRAY<STRUCT>') as performance,
current_timestamp() as insertDate
union all
select 2 as batchId,
from_json('[{ "employeeId":1235,"sales" : 10500 },{ "employeeId":3233,"sales" : 32000 }]',
'ARRAY<STRUCT>') as performance,
current_timestamp() as insertDate
Explanation:
The correct approach is to use the aggregate
function combined with flatten
and collect_list
to sum up all sales values from the performance
column. This method correctly accesses the sales
values within the struct data type using the .
notation (performance.sales
).
Note: Option C is incorrect because it attempts to use SLICE
on the performance
column, which is not applicable here. Additionally, the use of :
(as in performance:sales
) is only appropriate for JSON data, not struct data types. Understanding the distinction between JSON and struct data types is crucial for the exam.
Alternative solutions include using reduce
instead of aggregate
or combining explode
and sum
functions for the same result. For example:
select reduce(flatten(collect_list(performance.sales)), 0, (x, y) -> x + y) as total_sales from sales
Or:
with cte as (
select explode(flatten(collect_list(performance.sales))) sales from sales
)
select sum(sales) from cte