Azure Databricks and Spark SQL Python

Master Azure Databricks with PySpark: Your Hands-On Guide to Advanced Data Engineering and Analysis (DP203)
File size: 4.68 GB
Total length: 9h 6m

Instructor: Malvik Vaghadia
Language: English
Last update: 3/2023
Rating: 4.6/5

What you’ll learn

Azure Databricks
Data Lakehouse
Delta Lakes
Spark SQL
PySpark
Big Data
Real World Scenarios

Requirements

Basic SQL
Basic Python

Description

Databricks is one of the most in-demand big data tools around. It is a fast, easy and collaborative Spark-based big data analytics service designed for data science, machine learning (ML) and data engineering workflows.

The course is packed with lectures, code-along videos and dedicated challenge sections, which should be more than enough to keep you engaged and learning! As an added bonus, you will have lifetime access to all the lectures, and I have provided detailed notebooks as a downloadable asset. The notebooks contain step-by-step documentation with additional resources and links.

I have ensured that the delivery of the course is engaging and concise: the curriculum is extensive yet delivered efficiently. The course provides hands-on training utilising a variety of datasets.

The course is aimed at teaching you PySpark, Spark SQL in Python and the Databricks Lakehouse Architecture. You will primarily be using Databricks on Microsoft Azure, along with other services such as Azure Data Lake Storage Gen2.

The course covers a variety of areas, including:

Set Up and Overview
Azure Databricks Notebooks
Spark SQL
Reading and Writing Data
Data Analysis and Transformation with Spark SQL in Python
Charts and Dashboards in Databricks Notebooks
Databricks Medallion Architecture
Accessing Data in Cloud Object Storage
Hive Metastore
Databases, Tables and Views in Databricks
Delta Lake / Databricks Lakehouse Architecture

Overview

Section 1: Course Overview / Introduction to Spark and Databricks

Lecture 1 Course Introduction

Lecture 2 Big Data

Lecture 3 Hadoop, Spark and Databricks

Lecture 4 Apache Spark Architecture

Lecture 5 Spark vs Databricks Comparison

Lecture 6 Resource: Comparing Apache Spark vs Databricks

Section 2: Azure and Databricks Set Up

Lecture 7 Azure Account Set Up

Lecture 8 Azure UI Overview

Lecture 9 Resource: Azure Resources

Lecture 10 Creating your Databricks Service

Lecture 11 Databricks UI Overview

Lecture 12 Clusters

Lecture 13 Resource: Pricing, Cluster Pools and Runtime Versions

Lecture 14 How to use Databricks Notebooks

Lecture 15 User Interface Changes

Lecture 16 Mix Languages and add Markdown text in your Notebook

Lecture 17 Databricks Utilities Module and FileStore Utilities

Lecture 18 Resource: How to use Notebooks

Lecture 19 IMPORTANT – Download Course Resource Notebooks

Lecture 20 Cost Management and Cancelling your Subscription

Lecture 21 Resource: Cancelling your Azure Subscription

Section 3: Reading and Writing Data

Lecture 22 Dataset Download

Lecture 23 Databricks FileStore

Lecture 24 Resource: File Types

Lecture 25 Reading Data

Lecture 26 Writing Data

Lecture 27 Parquet Files

Lecture 28 Deleting Files and Folders

Section 4: Data Analysis and Transformation with Spark SQL

Lecture 29 Selecting and Renaming Columns

Lecture 30 Adding New Columns

Lecture 31 Changing Data Types

Lecture 32 Math Functions and Simple Arithmetic

Lecture 33 Sort Functions

Lecture 34 String Functions

Lecture 35 Datetime Functions

Lecture 36 Filtering DataFrames

Lecture 37 Conditional Statements

Lecture 38 Using SQL Expressions with expr()

Lecture 39 Removing Columns

Lecture 40 Grouping your DataFrame

Lecture 41 Pivot your DataFrame

Lecture 42 Joining DataFrames

Lecture 43 Union

Lecture 44 Unpivot your DataFrame

Lecture 45 Pandas

Section 5: Utilising the Medallion Architecture in Databricks

Lecture 46 Medallion Architecture

Lecture 47 Resource: Medallion Architecture

Section 6: Challenge Section: Customer Orders

Lecture 48 Dataset Download and DBFS Upload

Lecture 49 Assignment 1: Bronze to Silver

Lecture 50 Assignment 1 Solutions Walkthrough

Lecture 51 Assignment 2: Silver to Gold

Lecture 52 Assignment 2 Solutions Walkthrough

Section 7: Visualizations and Dashboards

Lecture 53 Visualizations and Dashboards

Section 8: Accessing Data from Azure Data Lake Storage (ADLS) with Databricks

Lecture 54 Creating an ADLS Gen2 Account

Lecture 55 (Optional) Storage Explorer

Lecture 56 Accessing via Access Keys

Lecture 57 Accessing via SAS Token

Lecture 58 Mounting ADLS to DBFS Overview

Lecture 59 Mounting ADLS to DBFS Demo

Lecture 60 Secret Scopes

Lecture 61 End to End Walkthrough Example

Section 9: Hive Metastore, Databases, Tables and Views

Lecture 62 Running SQL on DataFrames

Lecture 63 Hive Metastore and Creating Databases

Lecture 64 Managed Tables

Lecture 65 Specifying a Location for your Underlying Managed Table Data

Lecture 66 Unmanaged (External) Tables

Lecture 67 Permanent Views

Section 10: Challenge Section: Employees

Lecture 68 Dataset Download and ADLS Upload

Lecture 69 Assignment: Employees

Lecture 70 Assignment Solutions Walkthrough

Section 11: Databricks Data Lakehouse / Delta Lake

Lecture 71 Databricks Data Lakehouse / Delta Lake Overview

Lecture 72 Delta Lake Data Files

Lecture 73 Deleting and Updating Records

Lecture 74 Merge Into

Lecture 75 Table Utility Commands

Section 12: Modularize Code and Link Notebooks

Lecture 76 Running a Notebook from another Notebook

Lecture 77 Text Widgets

Section 13: Challenge Section: Health Updates

Lecture 78 Dataset Download and Overview

Lecture 79 Assignment 1 Overview

Lecture 80 Assignment 1 Solutions Walkthrough

Lecture 81 Assignment 2 Overview (Difficult!)

Lecture 82 Assignment 2 Solutions Walkthrough

Section 14: Spark Structured Streaming and Auto Loader

Lecture 83 Spark Structured Streaming Overview

Lecture 84 ADLS Preparation for this Section

Lecture 85 Streaming Dataset “Simulator” Notebook

Lecture 86 Reading a Data Stream

Lecture 87 Reminder to Manually Cancel your Data Streams

Lecture 88 Writing to a Data Stream

Lecture 89 Additional Options

Lecture 90 Auto Loader

Section 15: Delta Live Tables

Lecture 91 Delta Live Overview

Lecture 92 Databricks Premium Resource Creation

Lecture 93 ADLS Preparation for this Section

Lecture 94 Demo 1: Live Tables

Lecture 95 Table Data and Pipeline Metadata

Lecture 96 Demo 2: Data Quality Checks

Lecture 97 Streaming Dataset “Simulator”

Lecture 98 Demo 3: Streaming Live Tables

Lecture 99 Demo 4: Additional Properties and Views

Section 16: Continue learning with me!

Lecture 100 BONUS: Check out my other courses

Who this course is for

Anyone interested in working with Big Data and Spark
Anyone interested in working with Databricks
Anyone interested in working with cloud platforms

Course Information:

Udemy | English | 9h 6m | 4.68 GB
Created by: Malvik Vaghadia
