Question 16
A table customerLocations exists with the following schema:
id STRING,
date STRING,
city STRING,
country STRING
A senior data engineer wants to create a new table from this table using the following command:
CREATE TABLE customersPerCountry AS
SELECT country,
COUNT(*) AS customers
FROM customerLocations
GROUP BY country;
A junior data engineer asks why the schema is not being declared for the new table.
Which of the following responses explains why declaring the schema is not necessary?
Explanation:
In Databricks and Spark SQL, the CREATE TABLE AS SELECT (CTAS) statement automatically adopts its schema from the result of the query. Here's why:
Option A is correct: CTAS statements inherit the schema definition from the SELECT query's result set. The new table customersPerCountry will have:
- a country column with STRING type (inherited from the source table)
- a customers column with BIGINT type (the result type of COUNT(*))

Option B is incorrect: Schema inference by scanning data is not how CTAS works. The schema is determined at query compilation time, not by scanning actual data.
Option C is incorrect: Schemas are not optional in Databricks tables; all tables have defined schemas.
Option D is incorrect: CTAS does not default all columns to STRING type. It preserves the actual data types from the query result.
Option E is incorrect: All tables in Databricks support schemas and have defined schemas.
The CTAS statement automatically determines the schema based on:
- the source column types (country is a STRING inherited from customerLocations)
- the result types of the query's expressions (COUNT(*) returns BIGINT)

This eliminates the need for manual schema declaration when creating tables from existing data.
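The inheritance behavior described above can be demonstrated end to end. The following minimal sketch uses Python's built-in sqlite3 module rather than Spark SQL (so it can run anywhere), with made-up sample rows; SQLite's type system is looser than Spark's, but the principle is the same: the CTAS table's columns and types come from the SELECT's result set, with no schema declared.

```python
import sqlite3

# In-memory database with a hypothetical customerLocations table
# mirroring the schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customerLocations (
        id TEXT, date TEXT, city TEXT, country TEXT
    );
    INSERT INTO customerLocations VALUES
        ('1', '2024-01-01', 'Paris',  'France'),
        ('2', '2024-01-02', 'Lyon',   'France'),
        ('3', '2024-01-03', 'Berlin', 'Germany');

    -- CTAS: no column list or types declared; the new table's
    -- schema is taken from the SELECT's result set.
    CREATE TABLE customersPerCountry AS
    SELECT country, COUNT(*) AS customers
    FROM customerLocations
    GROUP BY country;
""")

# Inspect the inherited schema of the new table.
schema = conn.execute("PRAGMA table_info(customersPerCountry)").fetchall()
for _cid, name, col_type, *_rest in schema:
    print(name, col_type)

# The aggregated data is present without any manual schema work.
rows = conn.execute(
    "SELECT country, customers FROM customersPerCountry ORDER BY country"
).fetchall()
print(rows)
```

In Spark SQL the equivalent check would be DESCRIBE TABLE customersPerCountry, which would show country as STRING and customers as BIGINT.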