
Answer-first summary for fast verification
Answer: Use the REFRESH TABLE command before querying the table
## Explanation

When working with external tables in Databricks that reference Parquet data stored in external systems, caching can cause stale data to be returned even after the underlying data has been updated.

**Correct Approach: REFRESH TABLE (Option A)**

- The `REFRESH TABLE` command explicitly invalidates the cached entries for the table, both data and metadata, so Databricks re-reads them from the external storage on the next query
- This ensures that subsequent queries see the latest data, including any newly appended rows
- This is the most direct solution for this specific problem

**Alternative Approach: UNCACHE TABLE (Option C)**

- The `UNCACHE TABLE` command removes the table from the cache entirely
- This can also clear the stale cached rows, but it is less efficient because the table must be re-cached before later queries benefit from caching again
- While it can work, `REFRESH TABLE` is more appropriate for this scenario

**Why other options don't work:**

- **Option B (CACHE TABLE)**: This would make the problem worse by caching the stale data
- **Option D (DESCRIBE TABLE)**: Only shows schema information; it doesn't refresh cached data
- **Option E (ANALYZE TABLE)**: Updates statistics for query optimization but doesn't refresh cached data
- **Option F (MSCK REPAIR TABLE)**: Used for Hive metastore partition discovery, not for refreshing cached data

**Best Practice**: For external tables that are frequently updated, consider running `REFRESH TABLE` before queries, or configuring a caching strategy that balances performance against data freshness requirements.
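The refresh-before-query pattern can be sketched as a small helper. This is a minimal illustration, assuming a PySpark session such as the `spark` object that Databricks notebooks provide. The helper name `query_fresh` and the table name `sales.orders_ext` are hypothetical and do not come from the question.

```python
import re

# Accept only plain identifiers such as table, schema.table, or catalog.schema.table,
# because the table name is interpolated directly into a SQL statement.
_TABLE_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){0,2}$")


def query_fresh(spark, table_name, query):
    """Run REFRESH TABLE on `table_name`, then run `query` and return its result.

    `spark` is assumed to expose a `sql(str)` method, as a PySpark
    SparkSession does.
    """
    if not _TABLE_NAME.match(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    # Invalidate cached entries so the next read goes back to external storage.
    spark.sql(f"REFRESH TABLE {table_name}")
    return spark.sql(query)


# Hypothetical usage inside a notebook where `spark` already exists:
# df = query_fresh(spark, "sales.orders_ext", "SELECT COUNT(*) FROM sales.orders_ext")
# df.show()
```

Refreshing on every query trades some latency for freshness, so a job that reads the table many times in a row might refresh once at the start instead.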
Author: LeetQuiz
Question 15

A data engineering team has created a series of tables using Parquet data stored in an external system. The team is noticing that after appending new rows to the data in the external system, their queries within Databricks are not returning the new rows. They identify the caching of the previous data as the cause of this issue.
Which of the following approaches will ensure that the data returned by queries is always up-to-date?
A. Use the REFRESH TABLE command before querying the table
B. Use the CACHE TABLE command to cache the table
C. Use the UNCACHE TABLE command to remove the table from cache
D. Use the DESCRIBE TABLE command to view the table schema
E. Use the ANALYZE TABLE command to update table statistics
F. Use the MSCK REPAIR TABLE command to repair the table