Microsoft Fabric Analytics Engineer Associate DP-600 Quiz

You are analyzing customer purchases in a Fabric notebook using PySpark. The analysis involves two DataFrames:

1. `transactions`: Transaction data with 10 million rows and five columns: `transaction_id`, `customer_id`, `product_id`, `amount`, and `date`. Each row corresponds to a single transaction.
2. `customers`: Customer details with 1,000 rows and three columns: `customer_id`, `name`, and `country`.

Your task is to join these DataFrames on the `customer_id` column while minimizing data shuffling. You start by writing the following code:

```python
from pyspark.sql import functions as F
results =
```

Which code should you use to complete the statement so that `results` is populated with minimal data shuffling?
