
Explanation:
Spark supports three modes when reading data from CSV files:
1. PERMISSIVE – Replaces unparsable data with nulls (default mode).
2. DROPMALFORMED – Drops rows with improper data.
3. FAILFAST – Fails the command if data cannot be parsed properly.
In this scenario, the 'age' column is declared as IntegerType but contains the string 'NA' in some records, which makes those records malformed. Using DROPMALFORMED ensures these records are dropped, so the correct answer is A.
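To make the difference between the three modes concrete, here is a minimal pure-Python sketch that emulates them for a (name, age) schema. It is not Spark's implementation and does not call Spark. The function name `read_csv_with_mode` and the sample rows are hypothetical, and the PERMISSIVE branch assumes only the unparsable field is set to null while the rest of the row is kept.

```python
import csv
import io

VALID_MODES = {"PERMISSIVE", "DROPMALFORMED", "FAILFAST"}

def read_csv_with_mode(text, mode="PERMISSIVE"):
    """Emulate the three parse modes for rows of (name: str, age: int).

    Hypothetical helper for illustration only; Spark's reader is not involved.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    rows = []
    for name, raw_age in csv.reader(io.StringIO(text)):
        try:
            age = int(raw_age)
        except ValueError:
            if mode == "FAILFAST":
                # Abort the whole read on the first malformed record.
                raise ValueError(f"malformed record: {name},{raw_age}")
            if mode == "DROPMALFORMED":
                # Skip the entire row, not just the bad field.
                continue
            # PERMISSIVE: keep the row and null out the unparsable field.
            age = None
        rows.append((name, age))
    return rows

# Hypothetical sample data: the second record has 'NA' in the age column.
sample = "Ana,34\nBo,NA\nCy,27\n"

print(read_csv_with_mode(sample, "PERMISSIVE"))     # [('Ana', 34), ('Bo', None), ('Cy', 27)]
print(read_csv_with_mode(sample, "DROPMALFORMED"))  # [('Ana', 34), ('Cy', 27)]
```

With FAILFAST the same call raises on the 'Bo,NA' record instead of returning rows. In PySpark itself the mode is selected on the reader with .option('mode', 'DROPMALFORMED'), as in option A.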
A museum manually records the name and age of each person entering in a CSV file. The following code block is intended to read that CSV file and convert it into a DataFrame.
schema = StructType([StructField('name', StringType()), StructField('age', IntegerType())])
df = spark.read.format('csv') .schema(schema) ___________________ .load('/tmp/logs.csv')
The code reads the CSV file with the schema and loads it into a DataFrame. What should fill the blank to ensure records with 'NA' in the 'age' column are excluded from the DataFrame?
A
.option('mode', 'DROPMALFORMED')
B
.option('mode', 'DROPNA')
C
.option('drop', 'NA')
D
.option('mode', 'PERMISSIVE')
E
.option('mode', 'FAILFAST')