In this lecture, Zach discusses the importance of learning Spark and its role as a resilient framework for distributed compute. He explains how Spark can process large amounts of data and what advantages it offers over other technologies like Java MapReduce and Hive. He also touches on the considerations for choosing Spark in complex pipelines and provides tips for optimizing Spark jobs. [Recorded on May 28th, 2024]. Note: This is not the edited version. We will be replacing it with the edited and refined version soon.
55 mins