Databricks Certified Data Engineer - Associate

You are working with a Delta Lake table named 'sales_data' that contains multiple columns, including 'product_id' and 'quantity'. You are tasked with creating a view named 'product_sales' for the analytics team to use in reporting. The view should include only the 'product_id' and 'quantity' columns to simplify the data model for end users. Considering cost efficiency and compliance with data governance policies, which of the following Spark SQL statements would you use to create this view? Choose the single best option.




Explanation:

The correct answer is A, as it creates a permanent view named 'product_sales' that references the 'sales_data' table and includes only the 'product_id' and 'quantity' columns. This approach is cost-efficient because a view stores only the query definition and reads from the underlying table at query time, rather than physically storing a second copy of the data as a table would (option B). It also complies with data governance policies by exposing a simplified data model for reporting. Options C and D create temporary views, which are not suitable for reporting purposes because they are session-scoped and do not persist after the session ends.
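
The answer options themselves are not reproduced above, so as an illustration only, a statement matching the explanation's description of option A might look like the following sketch (view, table, and column names taken from the question):

    -- Permanent view: only the query definition is stored, no data is copied
    CREATE VIEW product_sales AS
    SELECT product_id, quantity
    FROM sales_data;

    -- The analytics team can then query the view like any table, for example:
    SELECT product_id, SUM(quantity) AS total_quantity
    FROM product_sales
    GROUP BY product_id;

By contrast, a view defined with CREATE TEMPORARY VIEW would disappear when the session ends, which is why the temporary-view options are ruled out for ongoing reporting.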