
Answer-first summary for fast verification
Answer: `CREATE TABLE my_table (id STRING, value STRING);`
## Explanation

In Databricks, a **managed table** is one for which Spark manages both the data and the metadata. When you create a table with `CREATE TABLE` and do not specify a location, it becomes a managed table by default, and its data is stored in the Databricks Filesystem (DBFS) under the default warehouse location.

Let's analyze each option:

**A.** `CREATE TABLE my_table (id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "storage-path");`
This creates an **external table** because it specifies a custom storage path. Spark manages only the metadata, not the data.

**B.** `CREATE MANAGED TABLE my_table (id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "storage-path");`
This syntax is invalid: `CREATE MANAGED TABLE` is not valid Spark SQL. The correct way to create a managed table is simply `CREATE TABLE` without specifying a location.

**C.** `CREATE MANAGED TABLE my_table (id STRING, value STRING);`
Invalid for the same reason: `CREATE MANAGED TABLE` is not a valid Spark SQL command.

**D.** `CREATE TABLE my_table (id STRING, value STRING) USING DBFS;`
This syntax is invalid. `USING DBFS` is not a valid format specification in Spark SQL; the `USING` clause expects a data source format such as `DELTA` or `PARQUET`.

**E.** `CREATE TABLE my_table (id STRING, value STRING);`
✅ **CORRECT**: This creates a managed table for which Spark manages both the data and the metadata. The data is automatically stored in DBFS at the default location (`dbfs:/user/hive/warehouse/` or similar).

**Key points:**

- Managed tables: Spark controls both data and metadata.
- External tables: Spark controls only the metadata; the user controls the data location.
- The default format for managed tables in Databricks is Delta Lake (unless configured otherwise).
- The plain `CREATE TABLE` syntax without a location specification creates a managed table stored in DBFS.
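As a quick sanity check, the table type can be inspected after creation with `DESCRIBE TABLE EXTENDED` (a sketch, assuming an active Databricks or Spark SQL session; the warehouse path shown is the typical default, not guaranteed in every workspace):

```sql
-- Create the managed table: no LOCATION or PATH option is given
CREATE TABLE my_table (id STRING, value STRING);

-- Inspect the table metadata: the "Type" row should read MANAGED,
-- and "Location" should point under the default warehouse directory,
-- e.g. dbfs:/user/hive/warehouse/my_table
DESCRIBE TABLE EXTENDED my_table;

-- Because the table is managed, dropping it deletes both the
-- metadata and the underlying data files
DROP TABLE my_table;
```

This also illustrates the practical difference from an external table, where `DROP TABLE` removes only the metadata and leaves the files at the user-specified path untouched.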
Author: Keng Suppaseth
A junior data engineer needs to create a Spark SQL table `my_table` for which Spark manages both the data and the metadata. The metadata and data should also be stored in the Databricks Filesystem (DBFS).
Which of the following commands should a senior data engineer share with the junior data engineer to complete this task?
**A.** `CREATE TABLE my_table (id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "storage-path");`

**B.** `CREATE MANAGED TABLE my_table (id STRING, value STRING) USING org.apache.spark.sql.parquet OPTIONS (PATH "storage-path");`

**C.** `CREATE MANAGED TABLE my_table (id STRING, value STRING);`

**D.** `CREATE TABLE my_table (id STRING, value STRING) USING DBFS;`

**E.** `CREATE TABLE my_table (id STRING, value STRING);`