Explanation:
The correct answer is Temporary view (D) because:
- Temporary views are session-scoped - They exist only for the duration of the current Spark session and are automatically dropped when the session ends.
- No physical data storage - Temporary views don't create physical copies of data; they store only a query definition, which is evaluated against the underlying tables each time the view is queried.
- Cost-effective - Since no data is duplicated or stored separately, storage costs are minimized.
- Session isolation - Other data engineers in other sessions cannot access this temporary view, which aligns with the requirement that it doesn't need to be used by others.
Why other options are incorrect:
- A. Spark SQL Table: Creates a physical table that stores data, increasing storage costs.
- B. View: While views don't store physical data, they are persistent and can be accessed by other sessions/users.
- C. Database: A container for tables/views, not a relational object that pulls data from tables.
- E. Delta Table: Creates a physical table with Delta Lake format, storing data and increasing storage costs.
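To make the contrast between the options concrete, here is a minimal Spark SQL sketch of the three relational objects. The table and view names (sales.orders, orders_copy, orders_v, orders_tmp) are hypothetical and only serve as illustration.

```sql
-- All object names below are hypothetical.

-- Table (options A and E): the query result is written to storage as data files.
CREATE TABLE orders_copy AS
SELECT * FROM sales.orders;

-- View (option B): only the query text is saved, but it is registered in the
-- metastore, so it outlives the session and can be queried from other sessions.
CREATE VIEW orders_v AS
SELECT * FROM sales.orders;

-- Temporary view (option D): only the query text is kept, and only for this session.
CREATE TEMPORARY VIEW orders_tmp AS
SELECT * FROM sales.orders;
```

Only the first statement copies data; the other two differ in how long the definition lives and who can see it.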
Key points about temporary views:
- Created using CREATE TEMP VIEW view_name AS query or CREATE TEMPORARY VIEW view_name AS query
- In Databricks, the session ends (and the temporary view is lost) when you open a new notebook, detach and reattach to a cluster, install a Python package, or restart the cluster
- Perfect for intermediate results that don't need persistence or sharing across sessions
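As a rough sketch of how this looks in practice, the Spark SQL below defines a temporary view for an intermediate result, queries it, and drops it. The source table sales.orders, its columns, and the view name recent_orders are assumptions made for the example, not part of the original question.

```sql
-- Hypothetical source table, columns, and view name, for illustration only.
CREATE OR REPLACE TEMPORARY VIEW recent_orders AS
SELECT order_id, customer_id, order_total
FROM sales.orders
WHERE order_date >= date_sub(current_date(), 7);

-- The view stores no data; the SELECT above is evaluated when the view is queried.
SELECT customer_id, SUM(order_total) AS weekly_total
FROM recent_orders
GROUP BY customer_id;

-- Optional: the view disappears when the session ends, but it can be dropped earlier.
DROP VIEW IF EXISTS recent_orders;
```

Running SELECT * FROM recent_orders from a different session would fail with a "table or view not found" style error, which is the session isolation described above.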