Databricks Certified Data Engineer - Associate

A table customerLocations exists with the following schema:

id STRING,
date STRING,
city STRING,
country STRING

A senior data engineer wants to create a new table from this table using the following command:

CREATE TABLE customersPerCountry AS
SELECT country,
       COUNT(*) AS customers
FROM customerLocations
GROUP BY country;

A junior data engineer asks why the schema is not being declared for the new table.

Which of the following responses explains why declaring the schema is not necessary?

KKeng

Last updated: January 13, 2026 at 09:02

A. CREATE TABLE AS SELECT statements adopt schema details from the source table and query.

B. CREATE TABLE AS SELECT statements infer the schema by scanning the data.

Explanation:

Correct Answer: A

In Databricks and Spark SQL, when using CREATE TABLE AS SELECT (CTAS) statements:

- Schema Inference: The schema of the new table is derived from the result schema of the SELECT query, which is resolved when the query is analyzed, not by scanning the stored data (that would be Option B).
- Schema Adoption: The new table takes its column names and data types from the SELECT statement's result columns. In this case:
  - The country column keeps the STRING type of the source table's country column.
  - The customers column is BIGINT because COUNT(*) returns a BIGINT.

Why Other Options Are Incorrect:
- Option B: While schema inference happens, it's not by "scanning the data" but by analyzing the query's result schema.
- Option C: Schemas are not optional; every table has a schema in Databricks.
- Option D: Columns do not default to STRING; they inherit appropriate types from the query.
- Option E: All tables in Databricks support schemas; this is a fundamental feature.

Key Concept: CTAS statements in Databricks automatically derive the schema from the SELECT query's result structure, eliminating the need for explicit schema declaration when creating tables from existing data.
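
To make the derived schema concrete, the sketch below inspects the table created by the command in the question and then shows one way to influence a column type from inside the query. It assumes the CTAS statement has already been run. The second table name, customersPerCountryInt, is hypothetical and used only for illustration.

```sql
-- Inspect the schema that CTAS derived for the table created above.
DESCRIBE TABLE customersPerCountry;
-- Per the explanation, the expected columns are:
--   country    string
--   customers  bigint

-- Hypothetical variant (table name is illustrative): cast in the SELECT
-- to choose a different column type, since CTAS takes types from the query.
CREATE TABLE customersPerCountryInt AS
SELECT country,
       CAST(COUNT(*) AS INT) AS customers
FROM customerLocations
GROUP BY country;
```

Because the schema comes from the query, changing the expressions or aliases in the SELECT is the usual way to shape the new table's columns in a CTAS statement.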
