Best Hands-on Big Data Practices with PySpark & Spark Tuning

Semi-Structured (JSON), Structured and Unstructured Data Analysis with Spark and Python & Spark Performance Tuning
File Size: 8.44 GB
Total length: 13h 1m

Category: IT & Software
Instructor: Amin Karami
Language: English
Last update: 2/2023
Ratings: 4.5/5


What you’ll learn

Understand Apache Spark’s framework, execution and programming model for the development of Big Data Systems
Learn how to set up and configure Spark on both a free cloud-based environment and a desktop machine (a minimal local-setup sketch follows this list)
Build simple to advanced Big Data applications for different types of data (volume, variety, veracity) through real case studies
Learn step-by-step hands-on PySpark practices on structured, unstructured and semi-structured data using RDD, DataFrame and SQL
Investigate and apply optimization and performance tuning methods to manage data skewness and prevent spill
Investigate and apply Adaptive Query Execution (AQE) to optimize Spark SQL query execution at runtime
Investigate and be able to explain lazy evaluation (narrow vs. wide transformations) and the internal workings of Spark
Build and learn Spark SQL applications using JDBC
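
To go with the setup-and-configuration point above, here is a minimal sketch of starting Spark locally in Python. The application name and the local[*] master are illustrative assumptions, not the course's exact configuration; a cloud notebook environment would usually hand you a ready-made SparkSession instead.

# Minimal local setup sketch (assumes `pip install pyspark` has already been run).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("setup-check")     # illustrative app name
         .master("local[*]")         # run locally on all available CPU cores
         .getOrCreate())

print(spark.version)                 # confirm the session is up
spark.stop()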


Requirements

Very basic Python and SQL
If you are new to Python programming, don't worry: you can learn it for free through my YouTube channel.

Description

In this course, students will get hands-on PySpark practice using real case studies from academia and industry, so that they can work interactively with massive data. In addition, students will examine distributed-processing challenges such as data skewness and spill in big data processing. We designed this course for anyone seeking to master Spark and PySpark and to spread the knowledge of Big Data analytics through real and challenging use cases.

We will work with Spark RDDs, DataFrames, and SQL to process huge volumes of semi-structured, structured, and unstructured data. The learning outcomes and teaching approach in this course accelerate learning by identifying the most critical skills required in industry and the demands of Big Data analytics.

We will not only cover the details of the Spark engine for large-scale data processing, but also drill down into big data problems, letting you shift instantly from an overview of large-scale data to a more detailed and granular view using RDDs, DataFrames, and SQL in real-life examples. We will walk through the Big Data case studies step by step to achieve the aims of this course.

By the end of the course, you will be able to build Big Data applications for different types of data (volume, variety, veracity), and you will be acquainted with best-in-class examples of Big Data problems using PySpark.
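
As a taste of the three APIs the description mentions, the following is a minimal, self-contained sketch; the sample records and column names are made up purely for illustration and are not the course's datasets.

# Hedged sketch: the same aggregation expressed with the RDD, DataFrame and SQL APIs.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pyspark-overview").getOrCreate()
sc = spark.sparkContext

# RDD API: low-level, functional transformations
rdd = sc.parallelize([("alice", 3), ("bob", 5), ("alice", 2)])
totals = rdd.reduceByKey(lambda a, b: a + b)   # wide transformation (shuffle)
print(totals.collect())                        # the action triggers execution

# DataFrame API: named columns, optimized by Catalyst
df = spark.createDataFrame([("alice", 3), ("bob", 5), ("alice", 2)], ["user", "clicks"])
df.groupBy("user").sum("clicks").show()

# SQL API: register a temporary view and query it
df.createOrReplaceTempView("clicks")
spark.sql("SELECT user, SUM(clicks) AS total FROM clicks GROUP BY user").show()

spark.stop()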

Overview

Section 1: Introduction to Course

Lecture 1 Learn Hands-on Python on my YouTube Channel (free)

Lecture 2 We Would Like to Know What You Think!

Lecture 3 PySpark for Parallel Processing

Lecture 4 Spark Coding Environment

Lecture 5 PySpark Coding review using RDD (part_1)

Lecture 6 PySpark Coding review using RDD (part_2)

Lecture 7 PySpark Coding review using DF (part_1)

Lecture 8 PySpark Coding review using DF (part_2)

Section 2: PySpark for a large Semi-Structured (JSON) File

Lecture 9 JSON analysis using RDD

Lecture 10 JSON analysis using DF
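
For orientation, a hedged sketch of the kind of JSON analysis this section covers, once with the DataFrame reader and once with an RDD. The file name reviews.json and the product_id field are hypothetical, and spark.read.json assumes line-delimited JSON.

# Hypothetical JSON analysis sketch; file and field names are assumptions.
import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-analysis").getOrCreate()

# DataFrame route: the schema is inferred from the JSON lines
df = spark.read.json("reviews.json")
df.printSchema()
df.groupBy("product_id").count().show()

# RDD route: parse each JSON line manually
counts = (spark.sparkContext.textFile("reviews.json")
          .map(json.loads)
          .map(lambda r: (r["product_id"], 1))
          .reduceByKey(lambda a, b: a + b))
print(counts.take(5))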

Section 3: PySpark for a large Structured File

Lecture 11 Structured Data Analysis using RDD

Lecture 12 Structured Data Analysis using DF
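
A minimal sketch contrasting the DataFrame CSV reader with a hand-rolled RDD pipeline for structured data; sales.csv, its header, and the column names and positions are assumptions for illustration.

# Hypothetical structured-data sketch; file, columns and positions are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("structured-analysis").getOrCreate()

# DataFrame route: let Spark parse the header and infer column types
df = spark.read.csv("sales.csv", header=True, inferSchema=True)
df.groupBy("country").agg(F.sum("amount").alias("total_amount")).show()

# RDD route: skip the header and split the lines manually
lines = spark.sparkContext.textFile("sales.csv")
header = lines.first()
rows = lines.filter(lambda l: l != header).map(lambda l: l.split(","))
totals = rows.map(lambda r: (r[0], float(r[2]))).reduceByKey(lambda a, b: a + b)  # assumes country at index 0, amount at index 2
print(totals.take(5))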

Section 4: PySpark for a large Unstructured (LOG) File

Lecture 13 RDD for Log File Analysis
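
A hedged sketch of unstructured log analysis with RDDs: counting HTTP status codes in Apache-style access-log lines. The file name access.log and the log format are assumptions, not necessarily the dataset used in the lecture.

# Hypothetical log-file sketch; the log format and file name are assumptions.
import re
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("log-analysis").getOrCreate()

# e.g. 127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
STATUS_RE = re.compile(r'"\s*[A-Z]+ \S+ \S+"\s+(\d{3})')

def status_code(line):
    m = STATUS_RE.search(line)
    return m.group(1) if m else "unknown"

logs = spark.sparkContext.textFile("access.log")
by_status = logs.map(lambda l: (status_code(l), 1)).reduceByKey(lambda a, b: a + b)
print(by_status.collect())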

Section 5: Distributed Processing Challenges and Spark Performance Tuning

Lecture 14 Optimizing the Skewed Data in Spark (part_1)

Lecture 15 Optimizing the Skewed Data in Spark (part_2)

Lecture 16 Optimizing the Skewed Data in Spark (part_3)

Lecture 17 Spark Optimization for Better Performance (Prevent Spill)

Lecture 18 Spark Optimization using Adaptive Query Execution_1

Lecture 19 Spark Optimization using Adaptive Query Execution_2
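
The lectures above deal with skew, spill, and AQE. The sketch below shows two commonly used levers, enabling AQE (with its runtime skew-join handling) and manually salting a skewed join key; the configuration values, table paths, the join column "key", and the salt-bucket count are illustrative assumptions, not necessarily the exact techniques demonstrated in the lectures.

# Hedged tuning sketch: AQE settings plus manual key salting for a skewed join.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("tuning-sketch")
         .config("spark.sql.adaptive.enabled", "true")                    # AQE: re-optimize plans at runtime
         .config("spark.sql.adaptive.coalescePartitions.enabled", "true") # merge small shuffle partitions
         .config("spark.sql.adaptive.skewJoin.enabled", "true")           # split oversized join partitions
         .config("spark.sql.shuffle.partitions", "400")                   # smaller partitions reduce spill risk
         .getOrCreate())

SALT_BUCKETS = 8  # illustrative; tune to the degree of skew

# Large, skewed fact table: append a random salt to the hot join key
facts = (spark.read.parquet("facts.parquet")
              .withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int")))

# Small dimension table: replicate each row once per salt value
salts = spark.range(SALT_BUCKETS).withColumnRenamed("id", "salt")
dims = spark.read.parquet("dims.parquet").crossJoin(salts)

# Join on (key, salt) so the rows of a hot key spread across many tasks
joined = facts.join(dims, on=["key", "salt"], how="inner").drop("salt")

With AQE's skew-join handling enabled, Spark can split oversized shuffle partitions at runtime on its own, so manual salting mainly matters when that optimization does not apply to the join in question.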

Section 6: Additional Considerations

Lecture 20 Lazy Evaluations (Narrow vs Wide Transformation)

Lecture 21 How does Spark internally execute a program (job, stage, executor and task)?

Lecture 22 Spark SQL using JDBC (part_1)

Lecture 23 Spark SQL using JDBC (part_2)
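
Two short sketches to close the section. First, lazy evaluation: the narrow and wide transformations below only build up a plan, and nothing executes until the action at the end, at which point Spark splits the job into stages at the shuffle boundary.

# Sketch of lazy evaluation with narrow vs. wide transformations.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()
sc = spark.sparkContext

pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
filtered = pairs.filter(lambda kv: kv[1] > 1)        # narrow: no shuffle, stays within partitions
summed = filtered.reduceByKey(lambda x, y: x + y)    # wide: shuffles data across partitions

# Nothing has run yet; collect() is the action that triggers the job
print(summed.collect())

Second, for the JDBC lectures, a hedged sketch of reading a table over JDBC into a DataFrame; the URL, driver, table, and credentials are placeholders, and the matching JDBC driver JAR must be available on Spark's classpath.

# Hypothetical JDBC read; all connection details are placeholders.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://localhost:5432/sales_db")
          .option("dbtable", "public.orders")
          .option("user", "spark_user")
          .option("password", "secret")
          .option("driver", "org.postgresql.Driver")
          .load())

orders.createOrReplaceTempView("orders")
spark.sql("SELECT COUNT(*) AS n FROM orders").show()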

Who this course is for

Beginner, junior, and senior data developers who want to master Spark/PySpark and spread the knowledge of Big Data analytics. If you are new to Python programming, don't worry: you can learn it for free through my YouTube channel.

Course Information:

Udemy | English | 13h 1m | 8.44 GB
Created by: Amin Karami
