Azure Databricks and Spark SQL Python
What you’ll learn
Azure Databricks
Data Lakehouse
Delta Lakes
Spark SQL
PySpark
Big Data
Real World Scenarios
Requirements
Basic SQL
Basic Python
Description
Databricks is one of the most in-demand big data tools around. It is a fast, easy, and collaborative Spark-based big data analytics service designed for data science, ML and data engineering workflows.
The course is packed with lectures, code-along videos and dedicated challenge sections. This should be more than enough to keep you engaged and learning! As an added bonus you will also have lifetime access to all the lectures… and I have provided detailed notebooks as a downloadable asset. The notebooks contain step-by-step documentation with additional resources and links.
I have ensured that the delivery of the course is engaging and concise; the curriculum is extensive yet delivered in an efficient way. The course will provide you with hands-on training utilising a variety of different data sets.
The course is aimed at teaching you PySpark, Spark SQL in Python and the Databricks Lakehouse Architecture.
You will primarily be using Databricks on Microsoft Azure, in addition to other services such as Azure Data Lake Storage Gen 2.
The course will cover a variety of areas including:
Set Up and Overview
Azure Databricks Notebooks
Spark SQL
Reading and Writing Data
Data Analysis and Transformation with Spark SQL in Python
Charts and Dashboards in Databricks Notebooks
Databricks Medallion Architecture
Accessing Data in Cloud Object Storage
Hive Metastore
Databases, Tables and Views in Databricks
Delta Lake / Databricks Lakehouse Architecture
Overview
Section 1: Course Overview / Introduction to Spark and Databricks
Lecture 1 Course Introduction
Lecture 2 Big Data
Lecture 3 Hadoop, Spark and Databricks
Lecture 4 Apache Spark Architecture
Lecture 5 Spark vs Databricks Comparison
Lecture 6 Resource: Comparing Apache Spark vs Databricks
Section 2: Azure and Databricks Set Up
Lecture 7 Azure Account Set Up
Lecture 8 Azure UI Overview
Lecture 9 Resource: Azure Resources
Lecture 10 Creating your Databricks Service
Lecture 11 Databricks UI Overview
Lecture 12 Clusters
Lecture 13 Resource: Pricing, Cluster Pools and Runtime Versions
Lecture 14 How to use Databricks Notebooks
Lecture 15 User Interface Changes
Lecture 16 Mix Languages and add Markdown text in your Notebook
Lecture 17 Databricks Utilities Module and FileStore Utilities
Lecture 18 Resource: How to use Notebooks
Lecture 19 IMPORTANT – Download Course Resource Notebooks
Lecture 20 Cost Management and Cancelling your Subscription
Lecture 21 Resource: Cancelling your Azure Subscription
Section 3: Reading and Writing Data
Lecture 22 Dataset Download
Lecture 23 Databricks FileStore
Lecture 24 Resource: File Types
Lecture 25 Reading Data
Lecture 26 Writing Data
Lecture 27 Parquet Files
Lecture 28 Deleting Files and Folders
Section 4: Data Analysis and Transformation with Spark SQL
Lecture 29 Selecting and Renaming Columns
Lecture 30 Adding New Columns
Lecture 31 Changing Data Types
Lecture 32 Math Functions and Simple Arithmetic
Lecture 33 Sort Functions
Lecture 34 String Functions
Lecture 35 Datetime Functions
Lecture 36 Filtering DataFrames
Lecture 37 Conditional Statements
Lecture 38 Using SQL Expressions with expr()
Lecture 39 Removing Columns
Lecture 40 Grouping your DataFrame
Lecture 41 Pivot your DataFrame
Lecture 42 Joining DataFrames
Lecture 43 Union
Lecture 44 Unpivot your DataFrame
Lecture 45 Pandas
Section 5: Utilising the Medallion Architecture in Databricks
Lecture 46 Medallion Architecture
Lecture 47 Resource: Medallion Architecture
Section 6: Challenge Section: Customer Orders
Lecture 48 Dataset Download and DBFS Upload
Lecture 49 Assignment 1: Bronze to Silver
Lecture 50 Assignment 1 Solutions Walkthrough
Lecture 51 Assignment 2: Silver to Gold
Lecture 52 Assignment 2 Solutions Walkthrough
Section 7: Visualizations and Dashboards
Lecture 53 Visualizations and Dashboards
Section 8: Accessing Data from Azure Data Lake Storage (ADLS) with Databricks
Lecture 54 Creating an ADLS Gen2 Account
Lecture 55 (Optional) Storage Explorer
Lecture 56 Accessing via Access Keys
Lecture 57 Accessing via SAS Token
Lecture 58 Mounting ADLS to DBFS Overview
Lecture 59 Mounting ADLS to DBFS Demo
Lecture 60 Secret Scopes
Lecture 61 End to End Walkthrough Example
Section 9: Hive Metastore, Databases, Tables and Views
Lecture 62 Running SQL on DataFrames
Lecture 63 Hive Metastore and Creating Databases
Lecture 64 Managed Tables
Lecture 65 Specifying a Location for your Underlying Managed Table Data
Lecture 66 Unmanaged (External) Tables
Lecture 67 Permanent Views
Section 10: Challenge Section: Employees
Lecture 68 Dataset Download and ADLS Upload
Lecture 69 Assignment: Employees
Lecture 70 Assignment Solutions Walkthrough
Section 11: Databricks Data Lakehouse / Delta Lake
Lecture 71 Databricks Data Lakehouse / Delta Lake Overview
Lecture 72 Delta Lake Data Files
Lecture 73 Deleting and Updating Records
Lecture 74 Merge Into
Lecture 75 Table Utility Commands
Section 12: Modularize Code and Link Notebooks
Lecture 76 Running a Notebook from another Notebook
Lecture 77 Text Widgets
Section 13: Challenge Section: Health Updates
Lecture 78 Dataset Download and Overview
Lecture 79 Assignment 1 Overview
Lecture 80 Assignment 1 Solutions Walkthrough
Lecture 81 Assignment 2 Overview (Difficult!)
Lecture 82 Assignment 2 Solutions Walkthrough
Section 14: Spark Structured Streaming and Auto Loader
Lecture 83 Spark Structured Streaming Overview
Lecture 84 ADLS Preparation for this Section
Lecture 85 Streaming Dataset “Simulator” Notebook
Lecture 86 Reading a Data Stream
Lecture 87 Reminder to Manually Cancel your Data Streams
Lecture 88 Writing to a Data Stream
Lecture 89 Additional Options
Lecture 90 Auto Loader
Section 15: Delta Live Tables
Lecture 91 Delta Live Overview
Lecture 92 Databricks Premium Resource Creation
Lecture 93 ADLS Preparation for this Section
Lecture 94 Demo 1: Live Tables
Lecture 95 Table Data and Pipeline Metadata
Lecture 96 Demo 2: Data Quality Checks
Lecture 97 Streaming Dataset “Simulator”
Lecture 98 Demo 3: Streaming Live Tables
Lecture 99 Demo 4: Additional Properties and Views
Section 16: Continue learning with me!
Lecture 100 BONUS: Check out my other courses
Who this course is for
Anyone interested in working with Big Data and Spark
Anyone interested in working with Databricks
Anyone interested in working with cloud platforms
Course Information:
Udemy | English | 9h 6m | 4.68 GB
Created by: Malvik Vaghadia