Spark Batch Processing - Comparing with Hive and MapReduce, Key Components, and Performance Optimization (Day 1 Lecture)

In this lecture, the instructor explores Apache Spark's advantages for data processing and analysis, comparing it with Hive and MapReduce. The lecture covers how Spark handles various data sources, its key components (the driver and the executors), memory management, and performance optimization techniques such as minimizing shuffles and mitigating data skew.

73 mins

Spark Batch Processing - Data Partitioning, Performance Optimization, and Iceberg Tables (Day 1 Lab)

Week 4: Batch Pipelines with Apache Spark V2

Spark Batch Processing - Caching, DataFrame, Dataset, SparkSQL, and Bucketing in Iceberg (Day 2 Lab)