This lecture compares Apache Spark with Hive and MapReduce for data processing and analysis. It covers how Spark handles various data sources, its key components (the driver and the executors), memory management, and performance optimization techniques such as minimizing shuffle and reducing data skew.

Spark Batch Processing - Comparing with Hive and MapReduce, Key Components, and Performance Optimization (Day 1 Lecture)

Spark Batch Processing - Data Partitioning, Performance Optimization, and Iceberg Tables (Day 1 Lab)

Week 4: Batch Pipelines with Apache Spark V2

Spark Batch Processing - Caching, DataFrame, Dataset, SparkSQL, and Bucketing in Iceberg (Day 2 Lab)