
Explanation:
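Invoking the AWS Glue create_partition API call from the code that writes data to S3 registers each new partition in the Data Catalog as part of the write itself. Of the four options, this has the lowest latency: the catalog reflects the new partition as soon as the writer finishes, with no wait for a schedule or an operator. A crawler that runs every morning can leave a partition unregistered for up to a day. Manually calling CreatePartition twice each day can lag by hours and depends on someone doing it. MSCK REPAIR TABLE is an Athena (Hive) statement that scans the whole table location in batch, and only when someone runs it.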
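As an illustration of option C, here is a minimal sketch of how a writer might register a partition right after it uploads the data. The helper names (build_partition_input, register_partition) and the database and table names are hypothetical. The sketch assumes the table's partition keys are declared in the order year, month, day, and that copying the table's StorageDescriptor is appropriate for the new partition.

```python
import copy


def build_partition_input(table_sd, year, month, day):
    """Build a Glue PartitionInput for a year=/month=/day= partition.

    Assumes the table's partition keys are ordered (year, month, day).
    """
    sd = copy.deepcopy(table_sd)  # reuse the table's format and SerDe settings
    base = table_sd["Location"].rstrip("/")
    sd["Location"] = f"{base}/year={year}/month={month}/day={day}/"
    return {"Values": [year, month, day], "StorageDescriptor": sd}


def register_partition(glue, database, table, year, month, day):
    """Create the partition in the Data Catalog.

    `glue` is a boto3 Glue client. Returns True if the partition was
    created and False if it was already registered.
    """
    table_sd = glue.get_table(DatabaseName=database, Name=table)["Table"][
        "StorageDescriptor"
    ]
    partition_input = build_partition_input(table_sd, year, month, day)
    try:
        glue.create_partition(
            DatabaseName=database,
            TableName=table,
            PartitionInput=partition_input,
        )
        return True
    except glue.exceptions.AlreadyExistsException:
        # Another writer already registered this partition; nothing to do.
        return False


# Hypothetical usage, right after the writer uploads objects under
# s3://bucket/prefix/year=2023/month=01/day=01/:
#
#   import boto3
#   glue = boto3.client("glue")
#   register_partition(glue, "my_database", "my_table", "2023", "01", "01")
```

Treating AlreadyExistsException as a non-error keeps the call safe to repeat when several files land in the same day's partition.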
A company needs to partition the Amazon S3 storage that the company uses for a data lake. The partitioning will use a path of the S3 object keys in the following format: s3://bucket/prefix/year=2023/month=01/day=01. A data engineer must ensure that the AWS Glue Data Catalog synchronizes with the S3 storage when the company adds new partitions to the bucket. Which solution will meet these requirements with the LEAST latency?
A
Schedule an AWS Glue crawler to run every morning.
B
Manually run the AWS Glue CreatePartition API twice each day.
C
Use code that writes data to Amazon S3 to invoke the Boto3 AWS Glue create_partition API call.
D
Run the MSCK REPAIR TABLE command from the AWS Glue console.