Apache Spark 3 – Spark Programming in Python for Beginners
What you’ll learn
Apache Spark Foundation and Spark Architecture
Data Engineering and Data Processing in Spark
Working with Data Sources and Sinks
Working with Data Frames and Spark SQL
Using PyCharm IDE for Spark Development and Debugging
Unit Testing, Managing Application Logs and Cluster Deployment
Requirements
Programming Knowledge of the Python Language
A Recent 64-bit Windows/Mac/Linux Machine with 8 GB RAM
Description
This course does not require any prior knowledge of Apache Spark or Hadoop. We have taken care to explain the Spark architecture and fundamental concepts so that you can get up to speed and follow the content of this course.
About the Course
I am creating the Apache Spark 3 – Spark Programming in Python for Beginners course to help you understand Spark programming and apply that knowledge to build data engineering solutions. This course is example-driven and follows a working-session approach. We will take a live-coding approach and explain all the needed concepts along the way.
Who should take this Course?
I designed this course for software engineers who want to develop data engineering pipelines and applications using Apache Spark. I am also creating this course for data architects and data engineers who are responsible for designing and building the organization’s data-centric infrastructure. Another group is managers and architects who do not work directly on Spark implementation but work with the people who implement Apache Spark at the ground level.
Spark Version used in the Course
This course uses Apache Spark 3.x. I have tested all the source code and examples used in this course on the Apache Spark 3.0.0 open-source distribution.
Overview
Section 1: Apache Spark Introduction
Lecture 1 Big Data History and Primer
Lecture 2 Understanding the Data Lake Landscape
Lecture 3 What is Apache Spark – An Introduction and Overview
Lecture 4 Source Code and Other Resources
Section 2: Installing and Using Apache Spark
Lecture 5 Spark Development Environments
Lecture 6 Mac Users – Apache Spark in Local Mode Command Line REPL
Lecture 7 Windows Users – Apache Spark in Local Mode Command Line REPL
Lecture 8 Mac Users – Apache Spark in the IDE – PyCharm
Lecture 9 Windows Users – Apache Spark in the IDE – PyCharm
Lecture 10 Apache Spark in Cloud – Databricks Community and Notebooks
Lecture 11 Apache Spark in Anaconda – Jupyter Notebook
Section 3: Spark Execution Model and Architecture
Lecture 12 Execution Methods – How to Run Spark Programs?
Lecture 13 Spark Distributed Processing Model – How your program runs?
Lecture 14 Spark Execution Modes and Cluster Managers
Lecture 15 Summarizing Spark Execution Models – When to use What?
Lecture 16 Working with PySpark Shell – Demo
Lecture 17 Installing Multi-Node Spark Cluster – Demo
Lecture 18 Working with Notebooks in Cluster – Demo
Lecture 19 Working with Spark Submit – Demo
Lecture 20 Section Summary
Section 4: Spark Programming Model and Developer Experience
Lecture 21 Creating Spark Project Build Configuration
Lecture 22 Configuring Spark Project Application Logs
Lecture 23 Creating Spark Session
Lecture 24 Configuring Spark Session
Lecture 25 Data Frame Introduction
Lecture 26 Data Frame Partitions and Executors
Lecture 27 Spark Transformations and Actions
Lecture 28 Spark Jobs Stages and Task
Lecture 29 Understanding your Execution Plan
Lecture 30 Unit Testing Spark Application
Lecture 31 Rounding off Summary
Section 5: Spark Structured API Foundation
Lecture 32 Introduction to Spark APIs
Lecture 33 Introduction to Spark RDD API
Lecture 34 Working with Spark SQL
Lecture 35 Spark SQL Engine and Catalyst Optimizer
Lecture 36 Section Summary
Section 6: Spark Data Sources and Sinks
Lecture 37 Spark Data Sources and Sinks
Lecture 38 Spark DataFrameReader API
Lecture 39 Reading CSV, JSON and Parquet files
Lecture 40 Creating Spark DataFrame Schema
Lecture 41 Spark DataFrameWriter API
Lecture 42 Writing Your Data and Managing Layout
Lecture 43 Spark Databases and Tables
Lecture 44 Working with Spark SQL Tables
Section 7: Spark Dataframe and Dataset Transformations
Lecture 45 Introduction to Data Transformation
Lecture 46 Working with Dataframe Rows
Lecture 47 DataFrame Rows and Unit Testing
Lecture 48 Dataframe Rows and Unstructured data
Lecture 49 Working with Dataframe Columns
Lecture 50 Creating and Using UDF
Lecture 51 Misc Transformations
Section 8: Aggregations in Apache Spark
Lecture 52 Aggregating Dataframes
Lecture 53 Grouping Aggregations
Lecture 54 Windowing Aggregations
Section 9: Spark Dataframe Joins
Lecture 55 Dataframe Joins and column name ambiguity
Lecture 56 Outer Joins in Dataframe
Lecture 57 Internals of Spark Join and shuffle
Lecture 58 Optimizing your joins
Lecture 59 Implementing Bucket Joins
Section 10: Keep Learning
Lecture 60 Final Word
Lecture 61 Bonus Lecture: Get Extra
Software engineers and architects who want to design and develop big data engineering projects using Apache Spark
Programmers and developers who want to grow and learn data engineering using Apache Spark
Course Information:
Udemy | English | 6h 36m | 2.85 GB
Created by: Prashant Kumar Pandey