
A data engineer runs a statement every day to copy the previous day's sales into the table transactions. Each day's sales are in their own file in the location "/transactions/raw". Today, the data engineer runs the following command to complete this task:
COPY INTO transactions FROM "/transactions/raw" FILEFORMAT = PARQUET;
After running the command today, the data engineer notices that the number of records in table transactions has not changed. Which of the following describes why the statement might not have copied any new records into the table?
A
The format of the files to be copied was not included with the FORMAT_OPTIONS keyword.
B
The names of the files to be copied were not included with the FILES keyword.
Explanation:
The COPY INTO command in Databricks is idempotent: it tracks which files have already been loaded into the target table and skips them on later runs. When the data engineer runs the same COPY INTO statement every day, each run loads only the files that have not been processed before. If the previous day's file was already copied by an earlier run, the statement finds nothing new to load, so the record count does not change.
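The behavior described above can be sketched with the statement from the question. This is an illustrative sketch, not part of the original question: it assumes no new file arrives between the first two runs, and the third statement uses the `force` copy option, which disables the idempotency check and reloads files whether or not they were loaded before.

```sql
-- First run: loads every file under the source path that has not been loaded yet.
COPY INTO transactions
FROM '/transactions/raw'
FILEFORMAT = PARQUET;

-- Second run, assuming no new file has arrived: every file is already
-- recorded as loaded, so all are skipped and the row count stays the same.
COPY INTO transactions
FROM '/transactions/raw'
FILEFORMAT = PARQUET;

-- Reload regardless of load history. Rows from previously loaded files
-- are appended again, so this can create duplicates.
COPY INTO transactions
FROM '/transactions/raw'
FILEFORMAT = PARQUET
COPY_OPTIONS ('force' = 'true');
```

Forcing a reload is rarely what a daily ingestion job wants. It is shown here only to make clear that the skipped files are a deliberate feature of COPY INTO, not a failure of the statement.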