
What is the most performant way to store a DataFrame in a Vector Search index when the DataFrame has two columns: (1) the original document file name and (2) an array of text chunks for each document?
A
Split the data into train and test set, create a unique identifier for each document, then save to a Delta table
B
Flatten the DataFrame to one chunk per row, create a unique identifier for each row, and save to a Delta table
C
First create a unique identifier for each document, then save to a Delta table
D
Store each chunk as an independent JSON file in Unity Catalog Volume. For each JSON file, the key is the document section name and the value is the array of text chunks for that section
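The flattening approach in option B can be sketched as follows. This is a minimal illustration using pandas `explode` as a stand-in for Spark's `explode()`; the column names (`file_name`, `chunks`, `id`) and the sample documents are hypothetical. A Vector Search index needs one row per text chunk with a unique primary key, which is what this transformation produces before the result is saved to a Delta table.

```python
import pandas as pd

# Hypothetical input: one row per document, with an array of text chunks.
df = pd.DataFrame({
    "file_name": ["doc_a.pdf", "doc_b.pdf"],
    "chunks": [["chunk 1", "chunk 2"], ["chunk 3"]],
})

# Flatten to one chunk per row (in Spark, F.explode("chunks") is the
# equivalent), then assign a unique identifier to each row so it can
# serve as the index's primary key.
flat = df.explode("chunks", ignore_index=True).rename(columns={"chunks": "chunk"})
flat["id"] = flat.index

print(flat)
```

On Databricks the flattened result would then be written with `saveAsTable` so the Delta table can back a Vector Search index.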