Microsoft Fabric Analytics Engineer Associate DP-600


In a Fabric workspace that uses the default Spark starter pool and runtime version 1.2, you want to read a CSV file named Sales_raw.csv from a lakehouse, select specific columns, and save the filtered data as a Delta table in the managed area of the lakehouse. Sales_raw.csv contains 12 columns. You have the following code snippet:

from pyspark.sql import SparkSession

# Create (or reuse) a Spark session
spark = SparkSession.builder.appName('example').getOrCreate()

# Read the CSV, treating the first row as column headers
df = spark.read.csv('/path/to/Sales_raw.csv', header=True)

# Keep only 4 of the 12 columns
selected_df = df.select('SalesOrderNumber', 'OrderDate', 'CustomerName', 'UnitPrice')

# Write a Delta table, overwriting existing data and partitioning the output by OrderDate
selected_df.write.format('delta').mode('overwrite').partitionBy('OrderDate').save('/path/to/output')
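The partitionBy('OrderDate') call changes the physical layout of the Delta table: rows are written into one folder per distinct OrderDate value, which affects how efficiently later queries that filter on that column can prune data. A minimal pure-Python sketch of that layout idea (standard library only, no Spark; the sample rows and the partition_by helper are illustrative, not part of the Delta API):

```python
from collections import defaultdict

# Illustrative sample rows standing in for Sales_raw.csv data
rows = [
    {"SalesOrderNumber": "SO1001", "OrderDate": "2024-01-05", "UnitPrice": 19.99},
    {"SalesOrderNumber": "SO1002", "OrderDate": "2024-01-05", "UnitPrice": 5.00},
    {"SalesOrderNumber": "SO1003", "OrderDate": "2024-01-06", "UnitPrice": 7.50},
]

def partition_by(rows, column):
    """Group rows into 'folders' keyed by the partition column,
    mimicking the OrderDate=<value>/ directory layout that a
    partitioned Delta write produces."""
    folders = defaultdict(list)
    for r in rows:
        folders[f"{column}={r[column]}"].append(r)
    return dict(folders)

layout = partition_by(rows, "OrderDate")
print(sorted(layout))  # ['OrderDate=2024-01-05', 'OrderDate=2024-01-06']
```

A query filtering on OrderDate = '2024-01-05' would only need to touch the matching folder; without partitioning, all rows land together and every file must be scanned.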

For each of the following statements, select Yes if the statement is true. Otherwise, select No.

  1. The code selects only specific columns from the DataFrame.
  2. Removing the partitionBy('OrderDate') call will have no effect on performance.
  3. Adding inferSchema=True to the read will increase execution time.
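On the inferSchema point: with header=True alone, Spark loads every CSV column as a string in a single pass; with inferSchema=True it must additionally scan the data to deduce a type per column, which adds execution time on large files. A pure-Python sketch of the same idea (standard library only; the SAMPLE data and both helper functions are illustrative, not Spark internals):

```python
import csv
import io

# Illustrative CSV content standing in for Sales_raw.csv
SAMPLE = """SalesOrderNumber,OrderDate,UnitPrice
SO1001,2024-01-05,19.99
SO1002,2024-01-06,5.00
"""

def read_no_inference(text):
    """One pass over the file: every value stays a string,
    analogous to header=True without inferSchema."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return rows, 1  # (data, number of passes)

def read_with_inference(text):
    """Two passes: first scan all values to pick a type per column,
    then re-read and convert. The extra scan is why schema
    inference costs additional execution time."""
    first = list(csv.DictReader(io.StringIO(text)))
    types = {}
    for col in first[0]:
        try:
            for r in first:
                float(r[col])
            types[col] = float
        except ValueError:
            types[col] = str
    second = [{c: types[c](v) for c, v in r.items()}
              for r in csv.DictReader(io.StringIO(text))]
    return second, 2

rows_str, passes_str = read_no_inference(SAMPLE)
rows_typed, passes_typed = read_with_inference(SAMPLE)
print(type(rows_str[0]["UnitPrice"]).__name__)    # str
print(type(rows_typed[0]["UnitPrice"]).__name__)  # float
```

The trade-off is typed columns (UnitPrice becomes numeric) in exchange for reading the data more than once.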