Spark Data Quality (Day 3 Lab)

This is a pre-recorded class. This lab starts with an overview of Spark's main components and how data is transformed within a Spark job. It then turns to testing PySpark code: I walk you through creating input and output DataFrames and using named tuples for comparison. We also cover why testing matters for development velocity, the challenges that commonly come up, and practical tips for writing effective tests. [Recorded on Dec 12, 2023]

35 mins

Week 4: Batch Pipelines with Apache Spark