
A Generative AI Engineer has developed scalable PySpark code to process unstructured PDF documents and split them into chunks for storage in a Databricks Vector Search index. The resulting DataFrame contains two columns: the original filename as a string and an array of text chunks from that document.
What steps must the Generative AI Engineer take to prepare and store these chunks for ingestion into Databricks Vector Search?
A
Use PySpark’s Auto Loader to apply a UDF across all chunks, formatting them into a JSON structure for Vector Search ingestion.
B
Flatten the DataFrame to one chunk per row, create a unique identifier for each row, and enable Change Data Feed on the output Delta table.
C
Use the original filename as the unique identifier and save the DataFrame as-is.
D
Create a unique identifier for each document, flatten the DataFrame to one chunk per row, and save to an output Delta table.
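The flatten-and-identify pattern referenced in options B and D can be sketched as follows. This is a minimal plain-Python illustration of the transformation (column names `filename`, `chunks`, `text`, and `id`, and the table name `docs_chunks`, are assumptions for illustration); the equivalent PySpark and Delta calls are shown in comments.

```python
from uuid import uuid4

# Simulated input: one row per document, with an array of text chunks.
# In PySpark this would be a DataFrame with columns `filename` (string)
# and `chunks` (array<string>).
docs = [
    ("report.pdf", ["chunk a", "chunk b"]),
    ("invoice.pdf", ["chunk c"]),
]

# Flatten to one chunk per row and assign a unique identifier per row.
# The rough PySpark equivalent would be:
#   df.select("filename", F.explode("chunks").alias("text")) \
#     .withColumn("id", F.expr("uuid()"))
rows = [
    {"id": str(uuid4()), "filename": fname, "text": chunk}
    for fname, chunks in docs
    for chunk in chunks
]

# The flattened result would then be written as a Delta table, with
# Change Data Feed enabled so a Vector Search Delta Sync index can
# track incremental changes, e.g.:
#   df.write.format("delta").saveAsTable("docs_chunks")
#   spark.sql("ALTER TABLE docs_chunks "
#             "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
```

Each output row carries its own primary key, which Vector Search requires to sync and look up individual chunks.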