
Explanation:
The query:
SELECT COUNT(DISTINCT *) FROM user
asks how many distinct rows in the table have no NULL in any column.
DISTINCT * means Spark compares entire rows, not individual columns; the * expands to every column of the table.
COUNT(DISTINCT ...) counts how many unique combinations exist, but only among rows where every listed expression is non-NULL.
Like COUNT(column), it therefore excludes a row that contains a NULL.

| Row | userId | username | email | Counted? |
|---|---|---|---|---|
| 1 | 1 | john.smith | john.smith@com | ✅ |
| 2 | 2 | NULL | david.clear@com | ❌ (username is NULL) |
| 3 | 3 | kevin.smith | kevin.smith@com | ✅ |

All three rows are different, but row 2 contains a NULL, so it is skipped and only rows 1 and 3 are counted.
✅ Correct Answer: B (2)
COUNT never returns NULL; when no rows qualify it returns 0, which rules out option D.
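
As a rough illustration, the two ways of counting can be mimicked in plain Python, with None standing in for NULL. This is a sketch, not Spark itself: the helper names `distinct_rows` and `count_distinct_non_null` are hypothetical, and the second helper assumes the rule given in the Spark SQL function reference, where count(DISTINCT expr[, expr...]) counts rows whose expressions are unique and all non-null.

```python
# Rows of the example `user` table; None stands in for SQL NULL.
rows = [
    (1, "john.smith", "john.smith@com"),
    (2, None, "david.clear@com"),
    (3, "kevin.smith", "kevin.smith@com"),
]


def distinct_rows(rows):
    """Count unique rows, treating None like any other value."""
    return len(set(rows))


def count_distinct_non_null(rows):
    """Count unique rows in which no column is None (assumed COUNT(DISTINCT *) rule)."""
    return len({row for row in rows if all(value is not None for value in row)})


print(distinct_rows(rows))            # 3
print(count_distinct_non_null(rows))  # 2
```

The first number is what counting the output of SELECT DISTINCT * FROM user would give; the second follows the assumed COUNT(DISTINCT *) rule, under which the row with the NULL username is dropped.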
What would be the expected result of executing the query SELECT COUNT(DISTINCT *) FROM user on the table below?
| userId | username | email |
|---|---|---|
| 1 | john.smith | john.smith@com |
| 2 | NULL | david.clear@com |
| 3 | kevin.smith | kevin.smith@com |
A. 3
B. 2
C. 1
D. NULL