Databricks Certified Data Engineer - Professional

Get started today

Ultimate access to all questions.

Explanation:

Simply converting a Parquet table to Delta Lake will not significantly improve keyword search performance for free-form text. Delta Lake's primary mechanism for performance optimization is data skipping, which relies on file-level statistics like minimum and maximum values.

For high-cardinality string columns where users perform substring searches (e.g., LIKE '%keyword%'), these statistics are largely useless because the filter does not compare a value against a range. Furthermore, Delta Lake often truncates long strings when collecting statistics, making the min/max values even less meaningful for searching content within the text.

Why the other options are incorrect:

B: Delta Lake fully supports the STRING data type.
C: Z-Ordering is designed to improve performance for equality or range predicates by clustering similar data. It still relies on min/max statistics, which do not benefit substring searches in varied text fields.
D: The Delta transaction log contains metadata and file paths, not a full-text search index.
E: By default, Delta Lake collects statistics for the first 32 columns of a table, not four, and this limit is configurable via table properties.

Explanation:

Why the other options are incorrect:

B: Delta Lake fully supports the STRING data type.
C: Z-Ordering is designed to improve performance for equality or range predicates by clustering similar data. It still relies on min/max statistics, which do not benefit substring searches in varied text fields.
D: The Delta transaction log contains metadata and file paths, not a full-text search index.
E: By default, Delta Lake collects statistics for the first 32 columns of a table, not four, and this limit is configurable via table properties.

Comments (0)

No comments yet.

A data science team is experiencing slow query performance when searching for specific keywords within a high-cardinality `review` column (STRING) stored in a Parquet table. A junior data engineer suggests migrating the table to Delta Lake to resolve the performance bottleneck.

How should you respond to this suggestion?

Real Exam

Last updated: January 6, 2026 at 15:42

Delta Lake statistics are ineffective for high-cardinality free-text fields like reviews, providing limited performance benefits for keyword-based substring searches.

55.3%

Delta Lake tables do not support the STRING data type, which would require the team to restructure the review data into a different format.

5.3%

While Delta Lake offers performance benefits, significant gains for keyword searches would only be achieved by applying Z-ORDER optimization to the review column.

18.4%

The Delta Lake transaction log automatically generates a structured index for text fields, enabling efficient keyword filtering without full table scans.

13.2%

Powered ByGPT 5.4 powered

Delta Lake only collects metadata statistics for the first four columns in a table schema, making it ineffective for the review column in this specific table.

7.9%