
Explanation:
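The question asks for a 15% sample without replacement. In PySpark, DataFrame.sample(withReplacement=None, fraction=None, seed=None) samples without replacement unless withReplacement is set to True. Option B sets fraction=0.15 and omits withReplacement, so it relies on that default. Note that fraction is an expected proportion, so the result contains approximately, not exactly, 15% of the rows. Option A passes True as the first positional argument (withReplacement), so it samples with replacement. Option C calls sampleBy, which performs stratified sampling and requires a column and a fractions dict; it has no fraction parameter, so the call fails. Option D uses fraction=0.10, which is a 10% sample. Option E omits the required fraction argument and raises an error. Therefore, only B is correct.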
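To make the distinction in the explanation concrete, here is a minimal pure-Python sketch of the two sampling modes. It does not use Spark: the helper names bernoulli_sample and sample_with_replacement and the store_ids list are hypothetical stand-ins, and the with-replacement helper is a simplification, not a description of how Spark implements it. The point is only that sampling without replacement keeps each row at most once and returns roughly, not exactly, the requested fraction.

```python
import random


def bernoulli_sample(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.

    A row is either kept once or dropped, so duplicates cannot occur.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


def sample_with_replacement(rows, fraction, seed=None):
    """Draw round(fraction * len(rows)) rows, allowing the same row twice.

    Simplified illustration only; assumed here for contrast.
    """
    rng = random.Random(seed)
    k = round(fraction * len(rows))
    return [rng.choice(rows) for _ in range(k)]


# Hypothetical stand-in for the rows of storesDF: 1000 unique store ids.
store_ids = list(range(1000))

no_repl = bernoulli_sample(store_ids, 0.15, seed=42)
with_repl = sample_with_replacement(store_ids, 0.15, seed=42)

# Size is close to 150 but usually not exactly 150.
print(len(no_repl))
# Without replacement, every sampled id is unique.
print(len(no_repl) == len(set(no_repl)))
# With replacement, this counts how many draws repeated an earlier id.
print(len(with_repl) - len(set(with_repl)))
```

In the quiz, option B corresponds to the first helper and option A to the second; passing a seed, as sample() also allows, makes the result reproducible.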
Which of the following code blocks returns a 15% sample of rows from the DataFrame storesDF without replacement?
A
storesDF.sample(True, fraction = 0.15)
B
storesDF.sample(fraction = 0.15)
C
storesDF.sampleBy(fraction = 0.15)
D
storesDF.sample(fraction = 0.10)
E
storesDF.sample()