
A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:
DROP TABLE IF EXISTS my_table;
After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.
What is the reason behind the deletion of all these files?
A. The table was managed
B. The table's data was smaller than 10 GB
C. The table did not have a location
D. The table was external
Explanation:
In Databricks/Spark SQL, there are two types of tables:
Managed Tables: Spark manages both the metadata (in the metastore) and the data files. When you drop a managed table with DROP TABLE, Spark deletes both the metadata from the metastore and the underlying data files from storage.
External Tables: Spark manages only the metadata in the metastore, while the data files live at an external location you specify (for example, ADLS or S3). When you drop an external table, Spark removes only the metadata from the metastore and leaves the underlying data files untouched.
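The distinction above can be sketched in Spark SQL. This is an illustrative example only; the table names and the storage path are hypothetical:

```sql
-- Managed table: Spark controls both the metastore entry and the data files.
CREATE TABLE managed_sales (id INT, amount DOUBLE);

-- External table: only the metadata lives in the metastore; the data files
-- stay at the LOCATION you specify (a hypothetical ADLS-style path here).
CREATE TABLE external_sales (id INT, amount DOUBLE)
LOCATION 'abfss://container@account.dfs.core.windows.net/data/sales';

DROP TABLE IF EXISTS managed_sales;   -- metadata AND data files are deleted
DROP TABLE IF EXISTS external_sales;  -- only metadata is deleted; files remain
```

Specifying a LOCATION clause at creation time is what makes the second table external, which is why its files survive the drop.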
In this scenario:
Since running DROP TABLE IF EXISTS my_table; deleted both the data files and the metadata files, my_table must have been a managed table.
Key points:
The IF EXISTS clause merely prevents an error if the table doesn't exist; it does not affect the deletion behavior.
Best Practice: When working with important data that should persist beyond table operations, use external tables with explicit location specifications.