AWS Certified Solutions Architect - Associate

Explanation:

Correct Answer: A - Edit the job to use job bookmarks.

Why this is correct:

AWS Glue Job Bookmarks are specifically designed to track data that has already been processed in previous job runs.
When job bookmarks are enabled, AWS Glue maintains state information about what data has been processed, preventing reprocessing of the same data in subsequent runs.
This is the built-in AWS Glue mechanism for incremental processing, which makes it the best fit when new data is added to an S3 bucket daily.
For Amazon S3 sources, job bookmarks track which objects have already been processed, using each object's last-modified time, so subsequent runs pick up only new or modified files.
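
As a rough illustration of what "edit the job to use job bookmarks" means in practice, the sketch below builds the arguments for a job run with bookmarks turned on. The `--job-bookmark-option` parameter and its three values are Glue's documented job parameters; the helper name `bookmark_run_request` and the job name `daily-s3-etl` are hypothetical and chosen only for this example. The actual API call is left commented out because it needs AWS credentials and an existing job.

```python
# Glue job parameter that controls bookmarks, and its documented values.
BOOKMARK_KEY = "--job-bookmark-option"
BOOKMARK_VALUES = {
    "job-bookmark-enable",
    "job-bookmark-disable",
    "job-bookmark-pause",
}


def bookmark_run_request(job_name, option="job-bookmark-enable"):
    """Build keyword arguments for boto3's glue.start_job_run (hypothetical helper)."""
    if option not in BOOKMARK_VALUES:
        raise ValueError(f"unknown bookmark option: {option}")
    return {"JobName": job_name, "Arguments": {BOOKMARK_KEY: option}}


# "daily-s3-etl" is an assumed job name used only for illustration.
request = bookmark_run_request("daily-s3-etl")
print(request)

# With credentials configured, the request could be submitted like this:
# import boto3
# boto3.client("glue").start_job_run(**request)
```

Enabling the option is only part of the setup: the job script itself also has to call `job.init()` and `job.commit()` and pass a `transformation_ctx` to its sources, since Glue keys the bookmark state on those.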

Why other options are incorrect:

B - Edit the job to delete data after the data is processed:

This is not a good practice as it would permanently delete source data after processing.
The requirement is to prevent reprocessing, not to delete source data.
Deleting source data would also make any later reprocessing or auditing impossible.

C - Edit the job by setting the NumberOfWorkers field to 1:

This sets how many workers of the job's worker type are allocated, which determines compute capacity, not which data the job reads.
Setting workers to 1 would not prevent reprocessing of old data; it would only affect processing speed and parallelism.
It would likely slow the job down without solving the incremental processing problem.

D - Use a FindMatches machine learning (ML) transform:

FindMatches is an AWS Glue ML transform used for deduplication and finding matching records.
This is not relevant to preventing reprocessing of entire datasets; it's for data quality and matching purposes.
It doesn't track what data has been processed in previous runs.

Key AWS Glue Concepts:

Job Bookmarks: Track processed data to enable incremental processing
Incremental Processing: Only process new or modified data since the last job run
State Management: AWS Glue persists bookmark state for each job between runs
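
The three concepts above can be pictured with a small, simplified model. This is not how AWS Glue implements bookmarks internally; it is a sketch that assumes the only state is the newest last-modified time seen so far. The function name `select_new_objects` and the sample object keys are made up for the example.

```python
from datetime import datetime, timezone


def select_new_objects(objects, bookmark):
    """Return keys modified after the bookmark, plus the advanced bookmark.

    objects:  iterable of (key, last_modified) pairs
    bookmark: newest last_modified handled so far, or None on the first run
    """
    new = [(key, ts) for key, ts in objects if bookmark is None or ts > bookmark]
    if new:
        # Advance the saved state to the newest object just processed.
        bookmark = max(ts for _, ts in new)
    return [key for key, _ in new], bookmark


# Hypothetical bucket contents on two consecutive days.
day1 = [("sales/2024-01-01.csv", datetime(2024, 1, 1, tzinfo=timezone.utc))]
day2 = day1 + [("sales/2024-01-02.csv", datetime(2024, 1, 2, tzinfo=timezone.utc))]

first_keys, state = select_new_objects(day1, None)     # first run: everything is new
second_keys, state = select_new_objects(day2, state)   # second run: only the new file
third_keys, state = select_new_objects(day2, state)    # nothing new: nothing to process

print(first_keys)
print(second_keys)
print(third_keys)
```

The second run skips the file from day one even though it is still in the bucket, which is the behavior the question asks for: old data stays in place but is not reprocessed.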

Best Practice: Enable job bookmarks for recurring ETL jobs that process new data incrementally, to reduce cost and processing time.

Last updated: February 23, 2026 at 11:39

Edit the job to use job bookmarks.

Edit the job to delete data after the data is processed.