Explanation:
A temporary view is the correct choice because:
- Session-based scope: Temporary views exist only for the duration of the current Spark session and are automatically dropped when the session ends.
- No physical storage: Temporary views don't create physical copies of data on storage; they are logical views defined by a query.
- Cost-effective: Since no data is duplicated or stored separately, storage costs are minimized.
- Private to session: Other data engineers in other sessions cannot access the temporary view, which matches the requirement that it doesn't need to be used by others.
Why other options are incorrect:
- Spark SQL Table (A): Creates a physical table that stores data persistently, increasing storage costs.
- View (B): While views don't store physical data, they are persistent and can be accessed by other users/sessions.
- Database (C): A database is a container for tables/views, not a relational object created by pulling data from tables.
- Delta Table (E): Creates a physical Delta table that stores data persistently, increasing storage costs.
Key characteristics of temporary views:
- Created with: CREATE TEMP VIEW view_name AS query
- The view is no longer available after opening a new notebook, detaching and reattaching the notebook to a cluster, installing a Python package, or restarting the cluster, because each of these starts a new session.
- Ideal for intermediate transformations that don't need persistence or sharing across sessions.
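
The session scoping described above can be sketched in a few lines. This is an illustration only, not Spark code: it uses Python's built-in sqlite3 module as a stand-in so that it runs without a cluster, with each SQLite connection playing the role of a Spark session. SQLite's CREATE TEMP VIEW is scoped to a connection in a similar way, but its semantics are not guaranteed to match Spark's in every detail. The table name sales and the view name sales_by_region are hypothetical.

```python
import os
import sqlite3
import tempfile

# Stand-in only: each SQLite connection plays the role of one Spark session.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# "Session A" creates a persistent base table (hypothetical name: sales).
session_a = sqlite3.connect(db_path)
session_a.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
session_a.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("west", 20), ("east", 5)],
)
session_a.commit()

# The temporary view stores only a query definition; no rows are copied.
session_a.execute(
    "CREATE TEMP VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
rows_a = session_a.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()

# "Session B" sees the persistent table but not session A's temporary view.
session_b = sqlite3.connect(db_path)
base_count_b = session_b.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
try:
    session_b.execute("SELECT * FROM sales_by_region")
    visible_in_b = True
except sqlite3.OperationalError:
    visible_in_b = False

# Closing the connection ends the "session" and discards the temporary view.
session_a.close()
session_b.close()
```

In this sketch, session B can query the base table but fails when it asks for the temporary view, which mirrors why a temporary view suits work that other engineers do not need to access.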