Databricks Certified Data Engineer - Professional


A museum manually records the name and age of each person entering the building in a CSV file. The following code block is intended to read that CSV file into a DataFrame.

The schema is defined as:

schema = StructType([StructField('name', StringType()), StructField('age', IntegerType())])

df = spark.read.format("csv").schema(schema) ___________________ .load("/tmp/logs.csv")

What should fill the blank so that records with 'NA' in the 'age' column are excluded from the DataFrame?





Explanation:

Spark supports three modes when reading CSV files, set through the reader option "mode":

1. PERMISSIVE (default): keeps the row and sets fields that cannot be parsed to null.
2. DROPMALFORMED: drops rows that contain malformed data.
3. FAILFAST: fails the read when a malformed record is encountered.

In this scenario, the 'age' column is declared as IntegerType, but some records contain the string 'NA', which cannot be parsed as an integer, so those records are malformed. Filling the blank with .option("mode", "DROPMALFORMED") ensures these records are dropped.
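The effect of each mode on the museum file can be sketched in plain Python, without a Spark session. This is a toy model of the semantics described above, not Spark's parser: the function name read_people and the sample rows are hypothetical, and real Spark handles further cases (for example, a corrupt-record column) that are omitted here.

```python
import csv
import io

# Hypothetical file contents: Bob's age was recorded as 'NA'.
SAMPLE_CSV = "Alice,34\nBob,NA\nCarol,29\n"


def read_people(text, mode="PERMISSIVE"):
    """Toy model of the three CSV read modes for a (name: str, age: int) schema."""
    rows = []
    for name, raw_age in csv.reader(io.StringIO(text)):
        try:
            age = int(raw_age)
        except ValueError:
            if mode == "FAILFAST":
                # FAILFAST: abort the whole read on the first malformed record.
                raise ValueError(f"malformed record: {name},{raw_age}")
            if mode == "DROPMALFORMED":
                # DROPMALFORMED: skip the malformed record entirely.
                continue
            # PERMISSIVE: keep the row, but null out the unparsable field.
            age = None
        rows.append((name, age))
    return rows


permissive = read_people(SAMPLE_CSV, "PERMISSIVE")
dropped = read_people(SAMPLE_CSV, "DROPMALFORMED")
```

In this model only DROPMALFORMED removes Bob's record; PERMISSIVE keeps it with a null age, and FAILFAST raises an error instead of returning rows. In PySpark the mode is chosen on the reader by chaining .option("mode", ...) before .load(...).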