
Answer-first summary for fast verification
Answer: Temporary view
**Explanation:** The correct answer is **Temporary view (D)** because:

1. **Temporary views are session-scoped.** They exist only for the duration of the current Spark session and are dropped automatically when the session ends.
2. **No physical data storage.** A temporary view does not create a physical copy of the data; it is a stored query that reads from the underlying tables each time it is used.
3. **Cost-effective.** Because no data is duplicated or stored separately, storage costs are minimized.
4. **Session isolation.** Data engineers working in other sessions cannot access the temporary view, which matches the requirement that it does not need to be used by others.

**Why other options are incorrect:**

- **A. Spark SQL Table**: Creates a physical table that stores data, increasing storage costs.
- **B. View**: A view does not store physical data, but it is persistent and can be accessed from other sessions and by other users, which is more than the scenario requires.
- **C. Database**: A container for tables and views, not a relational object that pulls data from tables.
- **E. Delta Table**: Creates a physical table in Delta Lake format, storing data and increasing storage costs.

**Key points about temporary views:**

- Created using `CREATE TEMP VIEW view_name AS query` or `CREATE TEMPORARY VIEW view_name AS query`.
- The view is no longer available once the session ends or a new one starts, for example after opening a new notebook, detaching and reattaching a cluster, installing a Python package, or restarting a cluster.
- Well suited to intermediate results that do not need persistence or sharing across sessions.
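The session scoping and the absence of copied data described above can be checked with a small sketch. It makes several assumptions: SQLite (through Python's standard `sqlite3` module) stands in for Spark SQL so that it runs without a cluster, each connection plays the role of a session, and the `customers` and `orders` tables and the `customer_totals` view are hypothetical names invented for illustration. The `CREATE TEMP VIEW ... AS` statement has the same shape in both engines for a simple join like this, but session lifetime rules in Spark are not reproduced here.

```python
import os
import sqlite3
import tempfile

# Hypothetical tables and columns, used only for illustration.
# SQLite stands in for Spark SQL; one connection plays the role of one session.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# Session 1: create two base tables and a temporary view that joins them.
session_1 = sqlite3.connect(db_path)
session_1.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 40.0);

    -- The view stores only the query text; no rows are copied.
    CREATE TEMP VIEW customer_totals AS
    SELECT c.name AS name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name;
""")
session_1.commit()

query = "SELECT name, total FROM customer_totals ORDER BY name"
totals = session_1.execute(query).fetchall()

# Session 2: a separate connection sees the base tables but not the temp view.
session_2 = sqlite3.connect(db_path)
try:
    session_2.execute(query)
    visible_in_other_session = True
except sqlite3.OperationalError:
    visible_in_other_session = False

# The view reads through to the base tables, so a new order shows up at once.
session_1.execute("INSERT INTO orders VALUES (13, 2, 10.0)")
session_1.commit()
updated = session_1.execute(query).fetchall()

session_1.close()
session_2.close()
```

Here `totals` holds the joined result for the first session, `visible_in_other_session` ends up `False`, and `updated` reflects the extra order without the view being recreated, which is the behaviour the question relies on: a join over two tables, private to one session, with nothing stored beyond the base tables.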
Author: Keng Suppaseth
A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data. Which of the following relational objects should the data engineer create?
A. Spark SQL Table
B. View
C. Database
D. Temporary view
E. Delta Table