
Your company has a large dataset of customer transaction records stored in a relational database. You need to transform this data to analyze customer behavior patterns. Which of the following best describes how you would use Apache Spark to process the data as part of a resilient, scalable ETL pipeline?
A. Load the data directly into a data warehouse and perform SQL queries to analyze customer behavior patterns.
B. Use Apache Spark to read the data from the relational database, perform the necessary transformations, and store the transformed data in a distributed file system for further analysis (see the sketch after the options).
C. Use a MapReduce approach to process the data in Apache Spark, as it is more suitable for batch processing.
D. Use Apache Spark's machine learning libraries to directly predict customer behavior patterns without any data transformation.