
A data engineer is attempting to drop a Spark SQL table my_table and runs the following command: DROP TABLE IF EXISTS my_table; After running this command, the engineer notices that the data files and metadata files have been deleted from the file system. Which of the following describes why all of these files were deleted?
A
The table was managed
B
The table's data was smaller than 10 GB
C
The table's data was larger than 10 GB
D
The table was external
E
The table did not have a location
Explanation:
In Databricks/Spark SQL, there are two types of tables:
Managed Tables: These tables are fully managed by the metastore. When you create a managed table, both the metadata (table schema, properties) and the actual data files are managed by the system. When you drop a managed table using DROP TABLE, both the metadata AND the underlying data files are deleted from the file system.
External Tables: These tables store their data at a location you specify (for example, ADLS or S3) that is not managed by the metastore. When you drop an external table, only the metadata is removed from the metastore; the actual data files remain intact at the external location.
In this scenario, both the data files and the metadata files were deleted, which indicates that the table was a managed table (option A). The size of the data (options B and C) has no effect on whether files are deleted when a table is dropped. Option D (external table) is incorrect because dropping an external table leaves its data files in place. Option E is also incorrect because every table has a location, whether managed or external; a table created without an explicit LOCATION clause is simply a managed table stored under the metastore's default warehouse directory.
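The difference between the two table types can be sketched in Spark SQL. This is a minimal illustration, not part of the original question: the table names managed_demo and external_demo and the path /tmp/external_demo are hypothetical placeholders, and it assumes a Databricks-style session where a table created without a USING clause falls back to the default table format.

```sql
-- Managed table: no LOCATION clause, so the metastore picks and owns the storage path.
-- (managed_demo is a hypothetical name used only for this sketch.)
CREATE TABLE IF NOT EXISTS managed_demo (id INT, name STRING);

-- External table: LOCATION points at storage the metastore does not own.
-- (The path is a placeholder; substitute a real ADLS/S3/DBFS path.)
CREATE TABLE IF NOT EXISTS external_demo (id INT, name STRING)
LOCATION '/tmp/external_demo';

-- The "Type" row in the output reads MANAGED or EXTERNAL,
-- and the "Location" row shows where the data files live.
DESCRIBE TABLE EXTENDED managed_demo;
DESCRIBE TABLE EXTENDED external_demo;

-- Managed: removes the metastore entry AND the underlying data files.
DROP TABLE IF EXISTS managed_demo;

-- External: removes only the metastore entry; files under the LOCATION path remain.
DROP TABLE IF EXISTS external_demo;
```

Running DESCRIBE TABLE EXTENDED before a DROP TABLE is a quick way to confirm which behavior to expect.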
Key takeaway: Always be aware of whether you're working with managed or external tables, as dropping a managed table permanently deletes the data, while dropping an external table only removes the metadata reference.