
Databricks Certified Data Engineer - Professional
The name and age of each person entering a museum are recorded manually in a CSV file. The following code block is intended to read that CSV file and convert it into a DataFrame.
The schema is defined as:
schema = StructType([StructField('name', StringType()), StructField('age', IntegerType())])
df = spark.read.format("csv").schema(schema) ___________________ .load("/tmp/logs.csv")
The code reads the CSV file with the schema and loads it into a DataFrame. What should fill the blank to ensure records with 'NA' in the 'age' column are excluded from the DataFrame?
Explanation:
Spark supports three modes when reading CSV files:
1. PERMISSIVE (default): keeps every record and sets fields that cannot be parsed to null.
2. DROPMALFORMED: drops records that contain malformed data.
3. FAILFAST: fails the read with an exception as soon as a malformed record is encountered.
In this scenario the 'age' column is declared as IntegerType, but some records contain the string 'NA', which cannot be parsed as an integer, so those records are malformed. Filling the blank with .option("mode", "DROPMALFORMED") ensures these records are dropped.
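To make the difference between the modes concrete, here is a minimal pure-Python sketch that imitates how each mode treats a record whose 'age' cannot be parsed as an integer. It is a stand-in for illustration only, not Spark's implementation: the function read_rows and the SAMPLE data are hypothetical, since the actual contents of /tmp/logs.csv are not shown in the question.

```python
import csv
import io

# Hypothetical sample of the museum log (assumption: no header row).
SAMPLE = "Alice,34\nBob,NA\nCarol,27\n"


def read_rows(text, mode="PERMISSIVE"):
    """Imitate the three CSV parse modes on (name, age) records.

    This is a simplified stand-in, not Spark's actual parser.
    """
    mode = mode.upper()
    rows = []
    for name, raw_age in csv.reader(io.StringIO(text)):
        try:
            age = int(raw_age)
        except ValueError:
            if mode == "FAILFAST":
                raise  # abort the whole read on the first malformed record
            if mode == "DROPMALFORMED":
                continue  # skip the malformed record entirely
            age = None  # PERMISSIVE: keep the record, null the bad field
        rows.append((name, age))
    return rows


permissive = read_rows(SAMPLE, "PERMISSIVE")
dropped = read_rows(SAMPLE, "DROPMALFORMED")
print(permissive)
print(dropped)
```

With this sample, the PERMISSIVE result keeps Bob with a null age, the DROPMALFORMED result omits Bob, and a FAILFAST call raises an error. In PySpark itself, the equivalent read for the question would be spark.read.format("csv").schema(schema).option("mode", "DROPMALFORMED").load("/tmp/logs.csv").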