
Question 15
A data engineering team has created a series of tables using Parquet data stored in an external system. The team is noticing that after appending new rows to the data in the external system, their queries within Databricks are not returning the new rows. They identify the caching of the previous data as the cause of this issue.
Which of the following approaches will ensure that the data returned by queries is always up-to-date?
A. Use the REFRESH TABLE command before querying the table
B. Use the CACHE TABLE command to cache the table
C. Use the UNCACHE TABLE command to remove the table from cache
D. Use the DESCRIBE TABLE command to view the table schema
E. Use the ANALYZE TABLE command to update table statistics
F. Use the MSCK REPAIR TABLE command to repair the table
Explanation:
When working with external tables in Databricks that reference Parquet data stored in external systems, caching can cause stale data to be returned even after the underlying data has been updated.
Correct Approach: REFRESH TABLE (Option A)
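The REFRESH TABLE command explicitly invalidates the cached entries for the table (data and metadata), so that Databricks reloads them from the external storage the next time the table is queried.
Alternative Approach: UNCACHE TABLE (Option C)
The UNCACHE TABLE command removes the table from the cache entirely. REFRESH TABLE is more appropriate for this scenario.
Why other options don't work:
Option B: CACHE TABLE caches the table, and caching is the cause of the stale results, not a fix for them.
Option D: DESCRIBE TABLE only displays the table schema.
Option E: ANALYZE TABLE only updates table statistics.
Option F: MSCK REPAIR TABLE is intended for repairing a table's registered partitions, which is not the problem described here.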
Best Practice: For external tables that are frequently updated, consider using REFRESH TABLE before queries or configuring appropriate caching strategies to balance performance with data freshness requirements.
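As a minimal sketch of the recommended pattern, the statements below refresh a table and then query it. The table name sales_external is hypothetical and stands in for any table defined over Parquet files in an external location; the snippet assumes it is run in a Databricks (Spark SQL) session where that table already exists.

```sql
-- sales_external is a hypothetical table name used only for illustration.
-- Invalidate the cached entries for the table so the next read
-- picks up files appended in the external system.
REFRESH TABLE sales_external;

-- The cache is repopulated lazily, so this query reads the current
-- contents of the external location.
SELECT COUNT(*) AS row_count
FROM sales_external;
```

Running the refresh immediately before the query keeps results current at the cost of giving up the cached read for that query, which is the performance and freshness trade-off mentioned above.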