
A data engineer runs a statement every day to copy the previous day's sales into the table transactions. Each day's sales are in their own file in the location "/transactions/raw".
Today, the data engineer runs the following command to complete this task:
COPY INTO transactions
FROM "/transactions/raw"
FILEFORMAT = PARQUET;
After running the command today, the data engineer notices that the number of records in table transactions has not changed.
What explains why the statement might not have copied any new records into the table?
A
The format of the files to be copied was not included with the FORMAT_OPTIONS keyword.
B
The COPY INTO statement requires the table to be refreshed to view the copied rows.
C
The previous day's file has already been copied into the table.
D
The PARQUET file format does not support COPY INTO.
Explanation:
The correct answer is C because the COPY INTO statement in Databricks is idempotent by default: it tracks which files in the source location have already been loaded into the target table and skips them on subsequent runs, so executing the same statement again never loads the same file twice.
In this scenario, the previous day's file was already ingested by an earlier COPY INTO execution, so when the command runs again it recognizes the file as loaded and skips it, leaving the record count in transactions unchanged.
This behavior is a deliberate feature of COPY INTO: it ensures data consistency and prevents duplicate loading, which is crucial for incremental data processing workflows.
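The behavior described above can be sketched with the table and path from the question. This is a minimal Databricks SQL illustration, not part of the original question; it assumes the standard COPY INTO syntax, including the documented 'force' copy option for deliberately reloading already-ingested files.

```sql
-- Re-running the same COPY INTO is a no-op for files already loaded:
-- Databricks tracks ingested files per target table and skips them.
COPY INTO transactions
FROM '/transactions/raw'
FILEFORMAT = PARQUET;

-- To deliberately reload files that were already copied (e.g. after a
-- fix to the source data), the 'force' copy option bypasses the
-- idempotency check. Note this can introduce duplicate rows:
COPY INTO transactions
FROM '/transactions/raw'
FILEFORMAT = PARQUET
COPY_OPTIONS ('force' = 'true');
```

With the default settings, the first statement explains the scenario in the question: running it a second time on the same day finds no new files and copies zero rows.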