Big Data with Apache Spark 3 and Python From Zero to Expert

Complete bootcamp to learn PySpark, Databricks, Spark Machine Learning, Advanced Analytics, Koalas and Spark Streaming
File size: 2.06 GB
Total length: 5h 11m



Data Bootcamp



What you’ll learn

Introduction to Big Data and Apache Spark Fundamentals
Spark RDDs, Dataframes and Spark Koalas
Machine Learning with Spark
Advanced features with Apache Spark
Advanced analytics and data visualization tools
Spark in cloud with Azure and Databricks
Spark Streaming and GraphX
Machine learning in Databricks

If you are looking for a hands-on, complete and advanced course to learn Big Data with Apache Spark and Python, you have come to the right place.

This course is designed to cover the complete skillset of Apache Spark, from RDDs, Spark SQL, DataFrames, and Spark Streaming, to Machine Learning with Spark ML, Advanced Analytics, data visualization, Spark Koalas, and Databricks.

With lessons, downloadable study guides, hands-on exercises, and real-world use cases, this is the only course you’ll need to learn Apache Spark.

Apache Spark has become the reference tool for Big Data, surpassing Hadoop MapReduce. Spark can run up to 100 times faster than Hadoop MapReduce and has a complete ecosystem of functionality for machine learning and data analytics. This makes Apache Spark one of the most in-demand skills for data engineers, data scientists, and related roles. Big Data is one of the most valuable skills today, so this course will teach you everything you need to position yourself in the Big Data job market.

In this course we will teach you the complete skillset of Apache Spark and PySpark, from the basics to the most advanced features. We will use visual presentations in PowerPoint, sharing clear explanations and useful professional advice.

This course has the following sections:

Introduction to big data and fundamentals of Apache Spark
Installation of Apache Spark and tools such as Anaconda and Java
Spark RDDs
Spark DataFrames
Advanced features with Apache Spark
Advanced analytics and data visualization
Spark Koalas
Machine Learning with Spark
Spark Streaming
Spark GraphX
Databricks
Spark in the cloud (Azure)

If you’re ready to sharpen your skills, increase your career opportunities, and become a Big Data expert, join today and get immediate and lifetime access to:

• Complete guide to Apache Spark (PDF e-book)
• Downloadable Spark project files and code
• Hands-on exercises and quizzes
• Spark resources such as cheat sheets and summaries
• One-to-one expert support
• Course question and answer forum
• 30-day money-back guarantee

See you there!


Section 1: Introduction to this course

Lecture 1 Introduction to this course

Lecture 2 How to get the most out of the course

Lecture 3 Course material

Section 2: Spark Fundamentals

Lecture 4 Spark Fundamentals

Lecture 5 Apache Spark execution

Lecture 6 Apache Spark ecosystem and documentation

Lecture 7 PySpark: operation, cluster administration and architecture

Section 3: Installing Apache Spark locally

Lecture 8 Download Spark, Java and Anaconda

Lecture 9 Setting environment variables

Lecture 10 Running Spark in Prompt and Jupyter Notebook

Lecture 11 Fixing common problems

Section 4: Spark Fundamentals and RDDs

Lecture 12 PySpark Cheat Sheet

Lecture 13 RDD Fundamentals

Lecture 14 Initialize PySpark with SparkSession and the SparkContext

Lecture 15 Transformations in RDDs like map, filter, flatMap and distinct

Lecture 16 Transformations in RDDs like reduceByKey, groupByKey or sortByKey

Lecture 17 RDD actions such as count, first, collect or take

Lecture 18 Practical exercise: Basic Features and RDDs

Section 5: Spark DataFrames and Apache Spark SQL

Lecture 19 PySpark Cheatsheet: SQL

Lecture 20 Fundamentals and advantages of DataFrames

Lecture 21 Characteristics of DataFrames and data sources

Lecture 22 Creating DataFrames in PySpark

Lecture 23 Operations with PySpark DataFrames

Lecture 24 Different types of joins in DataFrames

Lecture 25 SQL queries in PySpark

Lecture 26 Advanced features for loading and exporting data in PySpark

Lecture 27 Practical Exercise: Spark DataFrames and Apache Spark SQL

Section 6: Advanced features in Apache Spark

Lecture 28 Advanced functions and performance optimization

Lecture 29 Broadcast Join and caching

Lecture 30 User Defined Functions (UDF) and advanced SQL functions

Lecture 31 Handling and imputation of missing values

Lecture 32 Partitioning and catalog of APIs

Section 7: Advanced Analytics with Apache Spark

Lecture 33 Introduction to advanced analytics with Spark

Lecture 34 Data loading and data schema modification

Lecture 35 Inspect data in PySpark

Lecture 36 Column transformation in PySpark

Lecture 37 Advanced missing data imputation in PySpark

Lecture 38 Data selection with PySpark and PySpark SQL

Lecture 39 Data visualization and graph generation in PySpark

Lecture 40 Persist data with PySpark

Lecture 41 Practical Exercise: Advanced Analytics with Apache Spark

Section 8: Koalas: The Apache Spark Pandas API

Lecture 42 Spark Koalas Fundamentals

Lecture 43 Feature Engineering with Koalas

Lecture 44 Creating DataFrames with Koalas

Lecture 45 Data manipulation and DataFrames with Koalas

Lecture 46 Working with missing data in Koalas

Lecture 47 Data visualization and graph generation with Koalas

Lecture 48 Importing and exporting data with Koalas

Lecture 49 Hands-on exercise with Koalas

Section 9: Machine Learning with Apache Spark

Lecture 50 Fundamentals of Machine Learning with Spark

Lecture 51 Spark Machine Learning Components

Lecture 52 Stages of developing a Machine Learning model

Lecture 53 Import data and exploratory data analysis (EDA)

Lecture 54 Machine Learning with Spark hands-on exercise

Lecture 55 Data preprocessing with PySpark

Lecture 56 Lab Exercise 2 Machine Learning with Spark

Lecture 57 Training the machine learning model in PySpark

Lecture 58 Evaluation of the Machine Learning model

Lecture 59 Lab Exercise 3 Machine Learning with Spark

Section 10: Spark Streaming

Lecture 60 Practical example of counting words with Spark Streaming

Lecture 61 Spark Streaming Configurations: Output Modes and Operation Types

Lecture 62 Time Window Operations in Spark Streaming

Lecture 63 Spark Streaming Capabilities

Lecture 64 Lab: Real-time bank fraud detection (Part I)

Lecture 65 Lab: Real-time bank fraud detection (Part II)

Lecture 66 Spark Streaming Exercise

Section 11: Introduction to Databricks

Lecture 67 Introduction to Databricks

Lecture 68 Databricks Terminology and Databricks Community

Lecture 69 Delta Lake

Lecture 70 Create a free Databricks account

Section 12: Apache Spark on Databricks

Lecture 71 Introduction to the Databricks environment

Lecture 72 Getting started with Databricks

Section 13: Databricks Platform

Lecture 73 Importing notebooks, language configuration and markdown

Lecture 74 Databricks File System (DBFS)

Lecture 75 Create, manipulate and visualize tables

Lecture 76 Databricks widgets

Section 14: Spark DataFrame API

Lecture 77 Creating and saving DataFrames in Databricks

Lecture 78 Data transformation and visualization in Databricks

Lecture 79 Use case: Population data analytics

Section 15: Machine Learning in Databricks

Lecture 80 Import and exploratory analysis of the data

Lecture 81 Variable preprocessing with PySpark and Databricks

Lecture 82 Definition of the Machine Learning model and development of the Pipeline

Lecture 83 Model evaluation with PySpark and Databricks

Lecture 84 Hyperparameter tuning and registration in MLflow

Lecture 85 Predictions with new data and visualization of the results

Section 16: Databricks Certification Preparation

Lecture 86 Why Apache Spark Certification?

Lecture 87 Certification topics

Lecture 88 General certification information

Lecture 89 Preparation process

Lecture 90 Tips for passing the exam on the first attempt

Lecture 91 Registration and Certification process

Lecture 92 Certification question types

Lecture 93 How to obtain Databricks Certification for free

Section 17: Additional material

Lecture 94 Additional Resources: Complete Guide to Spark

Who this course is for:

Anyone who wants to learn advanced Big Data skills
Anyone who knows Python and wants to acquire Big Data processing skills
Anyone who wants to build a career as a data engineer, data analyst or data scientist
Anyone interested in learning Apache Spark and PySpark for Big Data analysis
Anyone who wants to learn cutting-edge Big Data technology

Course Information:

Udemy | English | 5h 11m | 2.06 GB
Created by: Data Bootcamp
