
Question 15
A data engineering team has created a series of tables using Parquet data stored in an external system. The team is noticing that after appending new rows to the data in the external system, their queries within Databricks are not returning the new rows. They identify the caching of the previous data as the cause of this issue.
Which of the following approaches will ensure that the data returned by queries is always up-to-date?
Explanation:
When working with external tables in Databricks that reference Parquet data stored in external systems, caching can cause stale data to be returned even after the underlying data has been updated.
Correct Approach: REFRESH TABLE (Option A)
The REFRESH TABLE command explicitly invalidates the cached entries for the table (both data and metadata), so Databricks reloads them from the external storage the next time the table is queried.
Alternative Approach: UNCACHE TABLE (Option C)
The UNCACHE TABLE command removes the table from the cache entirely. REFRESH TABLE is more appropriate for this scenario.
Why other options don't work:
Best Practice: For external tables that are frequently updated, consider using REFRESH TABLE before queries or configuring appropriate caching strategies to balance performance with data freshness requirements.
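The "refresh before querying" practice can be sketched as a small helper. This is a minimal illustration rather than a prescribed pattern: the function name refresh_then_query and the table name sales_external are hypothetical, and spark is assumed to be an active SparkSession (Databricks notebooks predefine one under that name).

```python
def refresh_then_query(spark, table_name, query):
    """Invalidate cached entries for `table_name`, then run `query`.

    `spark` is assumed to be an active SparkSession; `table_name` and
    `query` are supplied by the caller (hypothetical values below).
    """
    # REFRESH TABLE invalidates the cached data and metadata for the table,
    # so the next read should pick up files appended in the external system.
    spark.sql(f"REFRESH TABLE {table_name}")
    # Run the actual query only after the cache has been invalidated.
    return spark.sql(query)


# Example call (requires a live Spark session, so it is left commented out):
# df = refresh_then_query(
#     spark, "sales_external", "SELECT COUNT(*) FROM sales_external"
# )
```

Refreshing on every query trades some performance for freshness, so it fits best where the external data changes often and stale reads are unacceptable.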