
Answer-first summary for fast verification
Answer: Store data in its native nested format, leveraging Spark's capability to query nested structures directly.
Storing the data in its native nested format is the most efficient approach for querying and analyzing complex JSON data in a Spark-based lakehouse. Spark represents nested JSON as struct and array columns and can query them directly, for example with dot notation for struct fields and the explode function for arrays, so the data does not have to be flattened or normalized first. This preserves the original structure of each transaction, minimizes redundancy, and avoids the overhead of maintaining multiple relational tables. Flattening the JSON into a wide table (option D) duplicates parent data for every nested element, and normalizing it into relational tables (option A) requires joins to reassemble each transaction; both are possible, but they introduce inefficiencies and potential data integrity issues. Delta Lake's schema evolution (option B) helps manage changes to the JSON structure over time, but it does not by itself address efficient querying of nested data.
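To make the contrast between options C and D concrete, here is a minimal sketch in Python. The order record and every field name in it (order_id, customer, items, sku, qty, price) are hypothetical, chosen only for illustration. The plain-Python part shows how flattening repeats parent fields on every item row. The revenue_per_customer function shows one way the same nested data might be queried in place with PySpark. It assumes pyspark is installed and that the input is a JSON Lines file, and it is not meant as a definitive implementation.

```python
import json

# Hypothetical e-commerce order; all field names are illustrative assumptions.
ORDER_JSON = """
{"order_id": "o-1",
 "customer": {"id": "c-9", "country": "DE"},
 "items": [{"sku": "A", "qty": 2, "price": 5.0},
           {"sku": "B", "qty": 1, "price": 12.5}]}
"""


def flatten(order):
    """Option D: one wide row per item, repeating the parent fields each time."""
    return [
        {
            "order_id": order["order_id"],
            "customer_id": order["customer"]["id"],
            "customer_country": order["customer"]["country"],
            "sku": item["sku"],
            "qty": item["qty"],
            "price": item["price"],
        }
        for item in order["items"]
    ]


def revenue_per_customer(spark, json_path):
    """Option C: keep the data nested and query it directly with PySpark.

    pyspark is imported lazily so the rest of this sketch runs without it.
    """
    from pyspark.sql import functions as F

    # By default spark.read.json expects one JSON object per line and
    # infers struct and array columns from the nested fields.
    orders = spark.read.json(json_path)
    return (
        orders
        .select(
            F.col("customer.id").alias("customer_id"),  # dot notation into a struct
            F.explode("items").alias("item"),           # one row per array element, at query time
        )
        .groupBy("customer_id")
        .agg(F.sum(F.col("item.qty") * F.col("item.price")).alias("revenue"))
    )


order = json.loads(ORDER_JSON)
rows = flatten(order)

# The same total is available from either layout; only the storage cost differs.
nested_total = sum(i["qty"] * i["price"] for i in order["items"])
flat_total = sum(r["qty"] * r["price"] for r in rows)
```

In the flattened rows, order_id, customer_id, and customer_country are stored once per item. In the nested layout they are stored once per order, and the explode step produces per-item rows only when a query needs them.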
Author: LeetQuiz Editorial Team
When dealing with complex nested JSON data representing e-commerce transactions in a Spark-based lakehouse, which method is most efficient for querying and analysis?
A
Normalize the JSON structure into multiple relational tables, creating foreign keys for nested relationships.
B
Use Delta Lake's schema evolution to dynamically adjust to changes in the JSON structure over time.
C
Store data in its native nested format, leveraging Spark's capability to query nested structures directly.
D
Flatten the JSON structure into a wide table format, duplicating parent data for each nested element.