
Answer-first summary for fast verification
Answer: The previous day's file has already been copied into the table.
The COPY INTO statement in Databricks is idempotent: it tracks which files have already been loaded and skips them on later runs. When the data engineer runs the same COPY INTO command each day, any file that was already copied into the table is recognized and not copied again. If the previous day's file was loaded by an earlier run, today's run finds nothing new to load and the record count stays the same. This is the intended behavior, and it prevents duplicate data ingestion.

**Key Points:**
- COPY INTO automatically tracks which files have been successfully loaded
- It uses file metadata (path, size, modification time) to determine whether a file has already been processed
- This prevents duplicate data loading when the same COPY INTO command runs multiple times
- The other options are incorrect:
  - **A**: FILEFORMAT = PARQUET is sufficient; FORMAT_OPTIONS is optional
  - **B**: The FILES keyword is not required when using a directory path
  - **D**: PARQUET is fully supported by COPY INTO
  - **E**: No refresh is needed; copied rows are immediately visible
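The skip behavior described above can be sketched with the statement from the question. This is a minimal illustration, not a prescribed workflow: the table and path are taken from the question, and the third statement assumes the `force` copy option is used to disable the skip, which would normally be undesirable here because it loads the same rows again.

```sql
-- First run: loads every file under the directory that has not been loaded yet.
COPY INTO transactions
FROM "/transactions/raw"
FILEFORMAT = PARQUET;

-- Second run with no new files in the directory: files that were already
-- loaded are skipped, so the row count of transactions does not change.
COPY INTO transactions
FROM "/transactions/raw"
FILEFORMAT = PARQUET;

-- Assumption for illustration only: forcing a reload with the 'force' copy option.
-- This bypasses the skip and would insert the already-loaded rows a second time.
COPY INTO transactions
FROM "/transactions/raw"
FILEFORMAT = PARQUET
COPY_OPTIONS ('force' = 'true');
```

Comparing `SELECT COUNT(*) FROM transactions` before and after the second statement is one way to confirm that an unchanged count means "no new files" and not a failed load.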
Author: Keng Suppaseth
A data engineer runs a statement every day to copy the previous day's sales into the table transactions. Each day's sales are in their own file in the location "/transactions/raw". Today, the data engineer runs the following command to complete this task:
COPY INTO transactions
FROM "/transactions/raw"
FILEFORMAT = PARQUET;
After running the command today, the data engineer notices that the number of records in table transactions has not changed. Which of the following describes why the statement might not have copied any new records into the table?
A. The format of the files to be copied was not included with the FORMAT_OPTIONS keyword.
B. The names of the files to be copied were not included with the FILES keyword.
C. The previous day's file has already been copied into the table.
D. The PARQUET file format does not support COPY INTO.
E. The COPY INTO statement requires the table to be refreshed to view the copied rows.