The correct sequence is B because:
- Combine chunks - First, combine the chunked text data into a single unified dataset
- Convert to DataFrame - Then convert the combined data into a Spark DataFrame so it can be processed by Spark's distributed engine
- Write to Delta Lake in Overwrite mode - Finally, write the DataFrame to Delta Lake using Overwrite mode, which suits an initial load because it replaces any existing data (a sketch follows this list)
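A minimal PySpark sketch of sequence B, assuming a session with the Delta Lake (delta-spark) package configured; the sample chunks and the /tmp/delta/docs path are illustrative, not from the question:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("chunk-load").getOrCreate()

# Illustrative chunked text (in practice, the output of an upstream chunking step)
chunks_file_a = [("doc-a", 0, "first chunk of doc a"), ("doc-a", 1, "second chunk")]
chunks_file_b = [("doc-b", 0, "first chunk of doc b")]

# Step 1: combine the chunks into one unified dataset
combined = chunks_file_a + chunks_file_b

# Step 2: convert the combined data into a Spark DataFrame
df = spark.createDataFrame(combined, ["doc_id", "chunk_id", "text"])

# Step 3: write to Delta Lake in Overwrite mode, replacing any existing table data
df.write.format("delta").mode("overwrite").save("/tmp/delta/docs")  # hypothetical path
```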
This approach is efficient because:
- Combining chunks first avoids writing many small files, which degrade Delta Lake read performance
- Converting to a DataFrame enables Spark's distributed processing
- Overwrite mode is optimal for an initial load, where any existing data should be replaced
Other options are incorrect:
- A includes an unnecessary schema-definition step and uses Merge mode, which is meant for upsert operations, not an initial load
- C converts each chunk to a DataFrame before combining, which is inefficient because it produces many small DataFrames
- D defines the schema separately and uses Append mode, which is for adding rows to existing data rather than performing an initial load (both modes are contrasted in the sketch after this list)
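For contrast, a sketch of the modes the distractors name, reusing the df and path from the sketch above; note that in Delta Lake, upserts go through the MERGE API (DeltaTable.merge) rather than a writer mode string:

```python
from delta.tables import DeltaTable

# Append mode (option D) adds rows to an existing table -- suited to
# incremental loads, not an initial load
df.write.format("delta").mode("append").save("/tmp/delta/docs")

# "Merge" (option A) is an upsert: update matching rows, insert new ones;
# the join condition on doc_id/chunk_id is illustrative
target = DeltaTable.forPath(spark, "/tmp/delta/docs")
(target.alias("t")
    .merge(df.alias("s"), "t.doc_id = s.doc_id AND t.chunk_id = s.chunk_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```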