Apache Spark 3 – Spark Programming in Python for Beginners

Data Engineering using Spark Structured API
File size: 2.85 GB
Total length: 6h 36m
Instructor: Prashant Kumar Pandey
Language: English
Last updated: 7/2021
Rating: 4.5/5

What you’ll learn

Apache Spark Foundation and Spark Architecture
Data Engineering and Data Processing in Spark
Working with Data Sources and Sinks
Working with Data Frames and Spark SQL
Using PyCharm IDE for Spark Development and Debugging
Unit Testing, Managing Application Logs and Cluster Deployment

Requirements

Programming knowledge of the Python programming language
A recent 64-bit Windows/Mac/Linux machine with 8 GB RAM

Description

This course does not require any prior knowledge of Apache Spark or Hadoop. We have taken care to explain Spark architecture and the fundamental concepts so that you can come up to speed and grasp the content of this course.

About the Course

I am creating the Apache Spark 3 – Spark Programming in Python for Beginners course to help you understand Spark programming and apply that knowledge to build data engineering solutions. This course is example-driven and follows a working-session-like approach. We will take a live coding approach and explain all the needed concepts along the way.

Who should take this Course?

I designed this course for software engineers willing to develop data engineering pipelines and applications using Apache Spark. I am also creating this course for data architects and data engineers who are responsible for designing and building the organization’s data-centric infrastructure. Another group is the managers and architects who do not work directly on Spark implementation but do work with the people who implement Apache Spark at the ground level.

Spark Version used in the Course

This course uses Apache Spark 3.x. I have tested all the source code and examples used in this course on the Apache Spark 3.0.0 open-source distribution.

Overview

Section 1: Apache Spark Introduction

Lecture 1 Big Data History and Primer

Lecture 2 Understanding the Data Lake Landscape

Lecture 3 What is Apache Spark – An Introduction and Overview

Lecture 4 Source Code and Other Resources

Section 2: Installing and Using Apache Spark

Lecture 5 Spark Development Environments

Lecture 6 Mac Users – Apache Spark in Local Mode Command Line REPL

Lecture 7 Windows Users – Apache Spark in Local Mode Command Line REPL

Lecture 8 Mac Users – Apache Spark in the IDE – PyCharm

Lecture 9 Windows Users – Apache Spark in the IDE – PyCharm

Lecture 10 Apache Spark in Cloud – Databricks Community and Notebooks

Lecture 11 Apache Spark in Anaconda – Jupyter Notebook

Section 3: Spark Execution Model and Architecture

Lecture 12 Execution Methods – How to Run Spark Programs?

Lecture 13 Spark Distributed Processing Model – How your program runs?

Lecture 14 Spark Execution Modes and Cluster Managers

Lecture 15 Summarizing Spark Execution Models – When to use What?

Lecture 16 Working with PySpark Shell – Demo

Lecture 17 Installing Multi-Node Spark Cluster – Demo

Lecture 18 Working with Notebooks in Cluster – Demo

Lecture 19 Working with Spark Submit – Demo

Lecture 20 Section Summary

Section 4: Spark Programming Model and Developer Experience

Lecture 21 Creating Spark Project Build Configuration

Lecture 22 Configuring Spark Project Application Logs

Lecture 23 Creating Spark Session

Lecture 24 Configuring Spark Session

Lecture 25 Data Frame Introduction

Lecture 26 Data Frame Partitions and Executors

Lecture 27 Spark Transformations and Actions

Lecture 28 Spark Jobs Stages and Task

Lecture 29 Understanding your Execution Plan

Lecture 30 Unit Testing Spark Application

Lecture 31 Rounding off Summary

Section 5: Spark Structured API Foundation

Lecture 32 Introduction to Spark APIs

Lecture 33 Introduction to Spark RDD API

Lecture 34 Working with Spark SQL

Lecture 35 Spark SQL Engine and Catalyst Optimizer

Lecture 36 Section Summary

Section 6: Spark Data Sources and Sinks

Lecture 37 Spark Data Sources and Sinks

Lecture 38 Spark DataFrameReader API

Lecture 39 Reading CSV, JSON and Parquet files

Lecture 40 Creating Spark DataFrame Schema

Lecture 41 Spark DataFrameWriter API

Lecture 42 Writing Your Data and Managing Layout

Lecture 43 Spark Databases and Tables

Lecture 44 Working with Spark SQL Tables

Section 7: Spark Dataframe and Dataset Transformations

Lecture 45 Introduction to Data Transformation

Lecture 46 Working with Dataframe Rows

Lecture 47 DataFrame Rows and Unit Testing

Lecture 48 Dataframe Rows and Unstructured data

Lecture 49 Working with Dataframe Columns

Lecture 50 Creating and Using UDF

Lecture 51 Misc Transformations

Section 8: Aggregations in Apache Spark

Lecture 52 Aggregating Dataframes

Lecture 53 Grouping Aggregations

Lecture 54 Windowing Aggregations

Section 9: Spark Dataframe Joins

Lecture 55 Dataframe Joins and column name ambiguity

Lecture 56 Outer Joins in Dataframe

Lecture 57 Internals of Spark Join and shuffle

Lecture 58 Optimizing your joins

Lecture 59 Implementing Bucket Joins

Section 10: Keep Learning

Lecture 60 Final Word

Lecture 61 Bonus Lecture : Get Extra

Who this course is for

Software engineers and architects who want to design and develop big data engineering projects using Apache Spark
Programmers and developers who aspire to grow and learn data engineering using Apache Spark

Course Information:

Udemy | English | 6h 36m | 2.85 GB
Created by: Prashant Kumar Pandey
