Databricks Certified Data Engineer - Associate

Consider a scenario where you are tasked with extracting data from a directory of JSON files located at /data/logs. Each file contains log entries with fields timestamp, user_id, and action. Your goal is to create a DataFrame that includes only the entries where the action field is 'login'. Which of the following code snippets is NOT correct for this task?

df = spark.read.csv('/data/logs').filter(col('action') == 'login')

20.6%

df = spark.read.format('json').load('/data/logs').where(col('action') == 'login')

df = spark.read.json('/data/logs').select('timestamp', 'user_id', 'action').filter(col('action') == 'login')

19.7%

df = spark.read.format('json').load('/data/logs').select('timestamp', 'user_id', 'action').where(col('action') == 'login')

20.4%