Data Engineering Essentials using SQL, Python, and PySpark
What you’ll learn
Set up a Development Environment to learn how to build Data Engineering Applications on GCP
Database Essentials for Data Engineering using Postgres such as creating tables, indexes, running SQL Queries, using important pre-defined functions, etc.
Data Engineering Programming Essentials using Python such as basic programming constructs, collections, Pandas, Database Programming, etc.
Data Engineering using Spark Dataframe APIs (PySpark). Learn all important Spark Data Frame APIs such as select, filter, groupBy, orderBy, etc.
Data Engineering using Spark SQL (PySpark and Spark SQL). Learn how to write high quality Spark SQL queries using SELECT, WHERE, GROUP BY, ORDER BY, etc.
Relevance of Spark Metastore and integration of Dataframes and Spark SQL
Ability to build Data Engineering Pipelines using Spark leveraging Python as Programming Language
Use of different file formats such as Parquet, JSON, CSV, etc. in building Data Engineering Pipelines
Set up a self-support single-node Hadoop and Spark Cluster to get enough practice on HDFS and YARN
Understanding the complete Spark Application Development Life Cycle to build Spark Applications using PySpark. Review the applications using Spark UI.
Requirements
Laptop with decent configuration (Minimum 4 GB RAM and Dual Core)
Sign up for GCP with the available credit or AWS Access
Set up a self-support lab on cloud platforms (you might have to pay the applicable cloud fee unless you have credit)
CS or IT degree or prior IT experience is highly desired
Description
As part of this course, you will learn all the Data Engineering Essentials related to building Data Pipelines using SQL and Python, with Hadoop, Hive, or Spark SQL as well as PySpark Data Frame APIs. You will also understand the development and deployment lifecycle of Python applications using Docker as well as PySpark on multinode clusters. You will also gain basic knowledge about reviewing Spark Jobs using Spark UI.
About Data Engineering
Data Engineering is nothing but processing the data depending on our downstream needs. We need to build different pipelines such as Batch Pipelines, Streaming Pipelines, etc. as part of Data Engineering. All roles related to Data Processing are consolidated under Data Engineering. Conventionally, they are known as ETL Development, Data Warehouse Development, etc.
Here are some of the challenges learners face when acquiring key Data Engineering Skills such as Python, SQL, PySpark, etc.
Having an appropriate environment with Apache Hadoop, Apache Spark, Apache Hive, etc. working together.
Good quality content with proper support.
Enough tasks and exercises for practice.
This course is designed to address these key challenges for professionals at all levels to acquire the required Data Engineering Skills (Python, SQL, and Apache Spark).
To make sure you spend time learning rather than struggling with technical challenges, here is what we have done.
Training using an interactive environment. You will get 2 weeks of lab access to begin with. If you like the environment and acknowledge it by providing ratings and feedback, the lab access will be extended by an additional 6 weeks (2 months in total). Feel free to send an email to [email protected] to get complimentary lab access. Also, if your employer provides a multi-node environment, we will help you set up the material for the practice as part of the live session.
On top of Q&A Support, we also provide required support via live sessions.
Make sure you have a system with the right configuration and quickly set up a lab using Docker with all the required Python, SQL, PySpark, and Spark SQL material. It will address a lot of pain points related to networking, database integration, etc. Feel free to reach out to us via Udemy Q&A in case you get stuck while setting up the environment.
You will start with foundational skills such as Python and SQL using a Jupyter-based environment. Most of the lectures have quite a few tasks, and at the end of each module there are enough exercises or practice tests to evaluate the skills taught.
Once you are comfortable with programming using Python and SQL, you will learn how to quickly set up and access a Single Node Hadoop and Spark Cluster.
The content is streamlined in such a way that you use learner-friendly interfaces such as Jupyter Lab to practice.
If you end up signing up for the course, do not forget to rate us 5* if you like the content. If not, feel free to reach out to us and we will address your concerns.
Highlights of this course
Here are some of the highlights of this Data Engineering course using technologies such as Python, SQL, Hadoop, Spark, etc.
The course is designed by a veteran with 20+ years of experience (Durga Gadiraju), most of it around data. He has more than a decade of Data Engineering as well as Big Data experience with several certifications. He has a history of training hundreds of thousands of IT professionals in Data Engineering as well as Big Data.
Simplified setup of all the key tools to learn Data Engineering or Big Data such as Hadoop, Spark, Hive, etc.
Dedicated support, with 100% of questions answered in the past few months.
Tons of material with real-world experiences and Data Sets.
The material is made available both in the Git repository and in the lab which you are going to set up.
Complimentary Lab Access for 2 Weeks, which can be extended to 8 Weeks.
30 Day Money Back Guarantee.
Content Details
As part of this course, you will be learning Data Engineering Essentials such as SQL, and Programming using Python and Apache Spark. Here is the detailed agenda for the course.
Data Engineering Labs – Python and SQL
You will start with setting up self-support Data Engineering Labs on Cloud9 or on your Mac or PC so that you can learn the key skills related to Data Engineering with a lot of practice leveraging tasks and exercises provided by us. As you pass the sections related to SQL and Python, you will also be guided to set up the Hadoop and Spark Lab.
Provision AWS Cloud9 Instance (in case your Mac or PC does not have enough capacity)
Set up Docker Compose to start the containers to learn Python and SQL (using Postgresql)
Access the material via the Jupyter Lab environment set up using Docker and learn via hands-on practice.
Once the environment is set up, the material will be directly accessible.
Database Essentials – SQL using Postgres
It is important to be proficient with SQL to build data engineering pipelines. SQL is used for understanding the data, performing ad-hoc analysis, and also in building data engineering pipelines.
Getting Started with Postgres
Basic Database Operations (CRUD or Insert, Update, Delete)
Writing Basic SQL Queries (Filtering, Joins, and Aggregations)
Creating Tables and Indexes using Postgres DDL Commands
Partitioning Tables and Indexes using Postgres DDL Commands
Predefined Functions using SQL (String Manipulation, Date Manipulation, and other functions)
Writing Advanced SQL Queries using Postgresql
Programming Essentials using Python
Python is the most preferred programming language to develop data engineering applications.
As part of several sections related to Python, you will be learning most of the important aspects of Python to build data engineering applications effectively.
Perform Database Operations
Getting Started with Python
Basic Programming Constructs in Python (for loops, if conditions)
Predefined Functions in Python (string manipulation, date manipulation, and other standard functions)
Overview of Collections such as list and set in Python
Overview of Collections such as dict and tuple in Python
Manipulating Collections using loops in Python. This is primarily designed to get enough practice with Python Programming around Python Collections.
Understanding Map Reduce Libraries in Python. You will learn functions such as map, filter, etc. You will also understand details about itertools.
Overview of Python Pandas Libraries. You will learn how to read from files and process the data in a Pandas Data Frame by applying Standard Transformations such as filtering, joins, sorting, etc. You will also learn how to write data to files.
Database Programming using Python – CRUD Operations
Database Programming using Python – Batch Operations. There will be enough emphasis on best practices to load data into Databases in bulk or batches.
Setting up Single Node Data Engineering Cluster for Practice
The most common approach to building data engineering applications at scale is by using Apache Spark integrated with HDFS and YARN. Before getting into data engineering using Apache Spark and Hadoop, we need to set up an environment to practice data engineering using Apache Spark. As part of this section, we will primarily focus on setting up a single node cluster to learn key skills related to data engineering using distributed frameworks such as Apache Spark and Apache Hadoop.
We have simplified the complex tasks of setting up Apache Hadoop, Apache Hive, and Apache Spark leveraging Docker.
Within an hour, without running into too many technical issues, you will be able to set up the cluster. However, if you run into any issues, feel free to reach out to us and we will help you overcome the challenges.
Master required Hadoop Skills to build Data Engineering Applications
As part of this section, you will primarily focus on HDFS commands so that you can copy files into HDFS. The data copied into HDFS will be used as part of building data engineering pipelines using Spark and Hadoop with Python as a Programming Language.
Overview of HDFS Commands
Copy Files into HDFS using the put or copyFromLocal commands
Review whether the files are copied properly or not to HDFS using HDFS Commands.
Get the size of the files using HDFS commands such as du, df, etc.
Some fundamental concepts related to HDFS such as block size, replication factor, etc.
Data Engineering using Spark SQL
Let us deep-dive into Spark SQL to understand how it can be used to build Data Engineering Pipelines. Spark with SQL gives us the ability to leverage the distributed computing capabilities of Spark coupled with easy-to-use, developer-friendly SQL-style syntax.
Getting Started with Spark SQL
Basic Transformations using Spark SQL
Managing Tables – Basic DDL and DML in Spark SQL
Managing Tables – DML and Create Partitioned Tables using Spark SQL
Overview of Spark SQL Functions to manipulate strings, dates, null values, etc.
Windowing Functions using Spark SQL for ranking, advanced aggregations, etc.
Data Engineering using Spark Data Frame APIs
Spark Data Frame APIs are an alternative way of building Data Engineering applications at scale leveraging distributed computing capabilities of Apache Spark.
Data Engineers from application development backgrounds might prefer Data Frame APIs over Spark SQL to build Data Engineering applications.
Data Processing Overview using Spark or PySpark Data Frame APIs.
Projecting or Selecting data from Spark Data Frames, renaming columns, providing aliases, dropping columns from Data Frames, etc. using PySpark Data Frame APIs.
Processing Column Data using Spark or PySpark Data Frame APIs – You will be learning functions to manipulate strings, dates, null values, etc.
Basic Transformations on Spark Data Frames using PySpark Data Frame APIs such as Filtering, Aggregations, and Sorting using functions such as filter/where, groupBy with agg, sort or orderBy, etc.
Joining Data Sets on Spark Data Frames using PySpark Data Frame APIs such as join. You will learn inner joins, outer joins, etc. using the right examples.
Windowing Functions on Spark Data Frames using PySpark Data Frame APIs to perform advanced Aggregations, Ranking, and Analytic Functions
Spark Metastore Databases and Tables and integration between Spark SQL and Data Frame APIs
Development, Deployment as well as Execution Life Cycle of Spark Applications
Once you go through the content related to Apache Spark using a Jupyter-based environment, we will also walk you through the details of how Spark applications are typically developed using Python, deployed, and reviewed.
Set up Python Virtual Environment and Project for Spark Application Development using Pycharm
Understand complete Spark Application Development Lifecycle using Pycharm and Python
Build a zip file for the Spark Application, copy it to the environment where it is supposed to run, and run.
Understand how to review the Spark Application Execution Life Cycle.
Desired Audience for this Data Engineering Essentials course
People from different backgrounds can aim to become Data Engineers.
We cover most of the Data Engineering essentials for the aspirants who want to get into the IT field as Data Engineers as well as professionals who want to propel their career toward Data Engineering from legacy technologies.
College students and entry-level professionals to get hands-on expertise with respect to Data Engineering. This course will provide enough skills to face interviews for entry-level data engineers.
Experienced application developers to gain expertise related to Data Engineering.
Conventional Data Warehouse Developers, ETL Developers, Database Developers, and PL/SQL Developers to gain enough skills to transition to being successful Data Engineers.
Testers to improve their testing capabilities related to Data Engineering applications.
Other hands-on IT Professionals who want to get knowledge about Data Engineering with Hands-On Practice.
Prerequisites to practice Data Engineering Skills
Here are the prerequisites for someone who wants to be a Data Engineer.
Logistics
Computer with decent configuration (At least 4 GB RAM, however 8 GB is highly desired). However, this will not suffice if you do not have a multi-node cluster.
We will walk you through the cheaper options to set up the environment and practice.
Dual Core is required and Quad-Core is highly desired
Chrome Browser
High-Speed Internet
Desired Background
Engineering or Science Degree
Ability to use a computer
Knowledge or working experience with databases and any programming language is highly desired
Training Approach for learning required Data Engineering Skills
Here are the details related to the training approach for you to master all the key Data Engineering Skills to propel your career toward Data Engineering.
It is self-paced with reference material, code snippets, and videos provided as part of Udemy.
One can either use the environment provided by us or set up their own environment using Docker on AWS or GCP or the platform of their choice.
We would recommend completing 2 modules every week by spending 4 to 5 hours per week.
It is highly recommended to take care of the exercises at the end to ensure that you are able to meet all the key objectives for each module.
Support will be provided through Udemy Q&A.
The course is designed in such a way that one can self-evaluate through the course and confirm whether the skills are acquired.
Here is the approach we recommend you to take this course.
The course is hands-on with thousands of tasks; you should practice as you go through the course.
You should also spend time understanding the concepts. If you do not understand a concept, I would recommend moving on and coming back later to the topic.
Go through the consolidated exercises and see if you are able to solve the problems or not.
Make sure to follow the order we have defined as part of the course.
After each and every section or module, make sure to solve the exercises. We have provided enough information to validate the output.
By the end of the course, you can conclude that you have mastered the essential skills related to SQL, Python, and Apache Spark.
Overview
Section 1: Introduction about the course
Lecture 1 Introduction about course
Lecture 2 Desired Audience
Lecture 3 Pre-requisites
Lecture 4 [Must Watch] 30 Day Money Back Guarantee – Feedback and Rating
Lecture 5 Training Approach
Lecture 6 Overview of Environment for Hands on Practice
Lecture 7 How to access data sets used in this course?
Section 2: Getting Started with ITVersity Labs for Data Engineering Essentials on Udemy
Lecture 8 Introduction to Getting Started with ITVersity Labs and Udemy
Lecture 9 Logging into the ITVersity Python and Data Engineering Lab
Lecture 10 Setup Data Engineering Material from GitHub
Lecture 11 Overview of ITVersity Labs and Udemy
Lecture 12 Overview of Jupyter Lab Environment
Lecture 13 Using Jupyter Lab Sidebar to Navigate through the content
Lecture 14 Understanding Jupyter Launcher
Lecture 15 Creating Jupyter Notebooks and Overview of Kernels
Lecture 16 Managing Tabs and Kernels using Jupyter Lab Environment
Lecture 17 Overview of Jupyter Notebooks and Cells
Lecture 18 Running Shell Commands using Jupyter Notebook
Lecture 19 Getting Information to Connect to Databases to run queries
Lecture 20 Running SQL Queries using Jupyter Notebooks
Section 3: Setup Environment to learn Python, SQL, Hadoop, Spark using Docker on Windows 11
Lecture 21 Setup Environment using Docker on Windows 11 – Introduction
Lecture 22 Understanding System Configuration of Windows 11 PC
Lecture 23 Steps to setup Docker Desktop on Windows 11
Lecture 24 Enable WSL2 on Windows 11 by installing Ubuntu VM using WSL
Lecture 25 Install Linux Kernel Update Package on Windows 11 for Docker Desktop
Lecture 26 Download and Install Docker Desktop on Windows 11
Lecture 27 Validating git using WSL Ubuntu on Windows 11
Lecture 28 Clone Data Engineering Essentials Material on Windows 11
Lecture 29 Start Python and SQL Containers using docker-compose command on Windows 11
Lecture 30 Download and Install Pycharm on Windows 11
Lecture 31 Setup Pycharm Project for Data Engineering
Lecture 32 Review Docker Compose File for Data Engineering Essentials Material
Lecture 33 Review important Docker Compose Commands to manage services
Lecture 34 Access Jupyter Based Environment to learn Python and SQL
Lecture 35 Getting Jupyter Lab Token to log in to Jupyter Lab
Section 4: Setup Environment to learn Python, SQL, Hadoop, Spark using Docker on Windows 10
Lecture 36 Understanding System Configuration
Lecture 37 Setup Docker Desktop on Windows
Lecture 38 Validate Docker on Windows using Command Line leveraging Power Shell
Lecture 39 Review Docker Desktop Resource Configurations
Lecture 40 Clone GitHub Repository on Windows
Lecture 41 Setup Pycharm Project for Data Engineering Essentials
Lecture 42 Update Git Global Settings related to Line Endings
Lecture 43 Review Services Docker Compose
Lecture 44 Start Python and SQL Environment using Docker Compose
Lecture 45 Review resource utilization after setting up Python and SQL Environment
Lecture 46 Access Jupyter Based Environment to learn Python
Lecture 47 Getting Jupyter Lab Token to log in to Jupyter Lab
Section 5: Setup Environment to learn Python, SQL, Hadoop and Spark using Docker on Mac
Lecture 48 Setup Environment using Mac
Lecture 49 Setup Docker Desktop on Mac
Lecture 50 Validate Docker Setup on Mac
Lecture 51 Review Memory and CPU Settings of Docker Desktop for Mac
Lecture 52 Configure Docker Desktop for Data Engineering Essentials Environment
Lecture 53 Clone GitHub Repository for Data Engineering Essentials
Lecture 54 Setup as Pycharm Project to review the files using IDE
Lecture 55 Review Docker Compose file for Python and SQL Lab
Lecture 56 Start Python and SQL Environment using Docker Compose
Lecture 57 Review resource utilization after setting up Python and SQL Environment
Lecture 58 Access Jupyter Based Environment to learn Python
Lecture 59 Getting Jupyter Lab Token to log in to Jupyter Lab
Section 6: Setting up Environment to learn Python, SQL as well as Spark using AWS Cloud9
Lecture 60 Getting Started with Cloud9
Lecture 61 Creating Cloud9 Environment
Lecture 62 Warming up with Cloud9 IDE
Lecture 63 Details about material to setup postgres database using docker
Lecture 64 Overview of EC2 related to Cloud9
Lecture 65 Opening ports for Cloud9 Instance
Lecture 66 Associating Elastic IPs to Cloud9 Instance
Lecture 67 Increase EBS Volume Size of Cloud9 Instance
Lecture 68 Setup Docker Compose on AWS Cloud9 Instance
Lecture 69 Clone GitHub Repository
Lecture 70 Setup Python and SQL Environment using Docker Compose
Lecture 71 Update Inbound Rules of AWS EC2 Security Group
Lecture 72 Log in to the Jupyter-based environment
Section 7: Networking Concepts for Beginners – ip addresses and port numbers
Lecture 73 Enable telnet on Windows
Lecture 74 Different IP Address Types
Lecture 75 Port Numbers associated with Applications or Services
Lecture 76 Reverting port for SSH to default port number
Lecture 77 Setup Apache2 on Ubuntu
Lecture 78 Overview of localhost
Lecture 79 Overview of Private IP Address associated with a server
Lecture 80 Overview of Public IP Address associated with a server
Lecture 81 Setup Web Application and access using local ip
Lecture 82 Setup Web Application and access using private ip
Lecture 83 Disable Access to Web Application using Public ip
Lecture 84 Install sshuttle on Mac using brew
Lecture 85 Access Web Application using Private IP using SSH as proxy
Section 8: Database Essentials – Getting Started
Lecture 86 Setup SMS Database using Postgres
Lecture 87 Connecting to Postgresql Database
Lecture 88 Using psql to interact with Postgresql Database using CLI
Lecture 89 Data Loading Utilities in Postgresql
Section 9: Database Essentials – Database Operations
Lecture 90 Database Operations – Overview
Lecture 91 Database CRUD Operations
Lecture 92 Creating Table in Postgres Database
Lecture 93 Inserting Data into Postgres Database Table
Lecture 94 Updating Data in Postgres Database Table
Lecture 95 Deleting Data in Postgres Database Table
Lecture 96 Overview of Database Transactions
Lecture 97 Exercise – DML or CRUD Operations using Postgresql
Section 10: Database Essentials – Writing Basic SQL Queries
Lecture 98 Standard Transformations
Lecture 99 Overview of Data Model
Lecture 100 Define Problem Statement
Lecture 101 Preparing Database Tables using Postgres
Lecture 102 Selecting or Projecting Data from Postgres Database Tables using SQL
Lecture 103 Filtering Data from Postgres Database Tables using SQL
Lecture 104 Joining Postgres Database Tables using SQL – Inner
Lecture 105 Joining Postgres Database Tables using SQL – Outer
Lecture 106 Performing Aggregations using SQL on Postgres Database Tables
Lecture 107 Sorting Data in Postgres Tables using SQL
Lecture 108 Solution – Daily Product Revenue using SQL on Postgres Database Tables
Lecture 109 Exercises – Writing Basic SQL Queries on Postgres Database Tables
Section 11: Database Essentials – Creating Tables and Indexes
Lecture 110 DDL – Data Definition Language
Lecture 111 Overview of Data Types used while creating Postgres Database Tables
Lecture 112 Adding or Modifying Columns using Alter in Postgres Database Tables
Lecture 113 Different Type of Constraints used on Database Tables
Lecture 114 Managing Constraints on Postgres Database Tables
Lecture 115 Indexes on Postgres Database Tables
Lecture 116 Indexes for Constraints on Postgres Database Tables
Lecture 117 Overview of Sequences used on Postgres Database Tables
Lecture 118 Truncating Postgres Database Tables
Lecture 119 Dropping Postgres Database Tables
Lecture 120 Exercises and Solutions – Managing Database Objects using Postgresql
Section 12: Database Essentials – Partitioning Tables and Indexes
Lecture 121 Overview of Partitioning of Postgres Database Tables
Lecture 122 List Partitioning of Database Tables
Lecture 123 Managing Partitions of Postgres Database Tables – List
Lecture 124 Manipulating Data in Postgres Database Partitioned Tables
Lecture 125 Range Partitioning of Postgres Database Tables
Lecture 126 Managing Partitions of Postgres Database Tables – Range
Lecture 127 Repartitioning of Postgres Database Tables – Range
Lecture 128 Hash Partitioning of Postgres Database Tables
Lecture 129 Managing Partitions of Postgres Database Tables – Hash
Lecture 130 Usage Scenarios of Database Partitioned Tables
Lecture 131 Sub Partitioning of Postgres Database Tables
Lecture 132 Exercise – Partitioned Tables of Postgres Database Tables
Section 13: Database Essentials – Predefined Functions
Lecture 133 Overview of SQL Functions in Postgres
Lecture 134 String Manipulation Functions in SQL using Postgres
Lecture 135 Case Conversion and Length using Functions in SQL using Postgres
Lecture 136 Extracting Data – Using substr and split_part Functions in SQL using Postgres
Lecture 137 Using position or strpos Functions in SQL using Postgres
Lecture 138 Trimming and Padding Functions in SQL using Postgres
Lecture 139 Reverse and Concatenate Multiple Strings using Functions in SQL using Postgres
Lecture 140 String Replacement using Functions in SQL using Postgres
Lecture 141 Date Manipulation Functions using SQL in Postgres
Lecture 142 Getting Current Date or Timestamp using Functions in SQL using Postgres
Lecture 143 Date Arithmetic using Functions in SQL using Postgres
Lecture 144 Beginning Date or Time using date_trunc Function in SQL using Postgres
Lecture 145 Using to_char and to_date Functions in SQL using Postgres
Lecture 146 Extracting Information using extract Function in SQL using Postgres
Lecture 147 Dealing with Unix Timestamp or epoch using Functions in SQL using Postgres
Lecture 148 Overview of Numeric Functions using SQL in Postgres
Lecture 149 Data Type Conversion using Functions in SQL using Postgres
Lecture 150 Handling NULL Values using SQL in Postgres
Lecture 151 Using CASE and WHEN as part of SQL in Postgres
Section 14: Database Essentials – Writing Advanced SQL Queries
Lecture 152 Overview of Database Views using Postgres Database
Lecture 153 Overview of Named Queries using SQL in Postgres
Lecture 154 Overview of Sub Queries using SQL in Postgres
Lecture 155 CTAS – Create Table As Select using Postgres
Lecture 156 Advanced DML Operations on Postgres Database Tables
Lecture 157 Merging or Upserting Data into Postgres Database Tables
Lecture 158 Pivoting Rows into Columns using SQL in Postgres
Lecture 159 Overview of Analytic Functions using SQL in Postgres
Lecture 160 Analytic Functions – Aggregations using SQL in Postgres
Lecture 161 Cumulative or Moving Aggregations using SQL in Postgres
Lecture 162 Analytic Functions using SQL in Postgres – Windowing
Lecture 163 Analytic Functions using SQL in Postgres – Ranking
Lecture 164 Analytic Functions using SQL in Postgres – Filtering
Lecture 165 Ranking and Filtering using SQL in Postgres – Recap
Lecture 166 Exercises – Writing Advanced Queries
Section 15: Programming Essentials using Python – Perform Database Operations
Lecture 167 Introduction – Perform Database Operations
Lecture 168 Overview of SQL
Lecture 169 Create Database and Users Table
Lecture 170 DDL – Data Definition Language
Lecture 171 DML – Data Manipulation Language
Lecture 172 DQL – Data Query Language
Lecture 173 CRUD Operations – DML and DQL
Lecture 174 TCL – Transaction Control Language
Lecture 175 Example – Data Engineering
Lecture 176 Example – Web Application
Lecture 177 Exercise – Database Operations
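As a rough illustration of the DDL, DML, DQL, and TCL operations listed in this section, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for Postgres so that it runs without a database server; the users table columns and the sample e-mail addresses are hypothetical, not taken from the course material.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: create a hypothetical users table.
cur.execute(
    "CREATE TABLE users ("
    "user_id INTEGER PRIMARY KEY, "
    "email TEXT NOT NULL, "
    "is_active INTEGER DEFAULT 0)"
)

# DML: insert two rows, then commit the transaction (TCL).
cur.executemany(
    "INSERT INTO users (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)
conn.commit()

# DML: update one row and commit.
cur.execute("UPDATE users SET is_active = 1 WHERE email = ?", ("a@example.com",))
conn.commit()

# DML followed by ROLLBACK (TCL): the delete is undone.
cur.execute("DELETE FROM users WHERE email = ?", ("b@example.com",))
conn.rollback()

# DQL: read the rows back.
rows = cur.execute(
    "SELECT user_id, email, is_active FROM users ORDER BY user_id"
).fetchall()
conn.close()
print(rows)
```

Because the delete is rolled back, both users are still present when the rows are read back. With Postgres the same statements would typically go through a driver such as psycopg2, with %s placeholders instead of ?.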
Section 16: Programming Essentials using Python – Getting Started with Python
Lecture 178 Installing Python on Windows
Lecture 179 Overview of Anaconda
Lecture 180 Python CLI and Jupyter Notebook
Lecture 181 Overview of Jupyter Lab
Lecture 182 Using IDEs – Pycharm
Lecture 183 Using Visual Studio Code
Lecture 184 Using ITVersity Labs
Lecture 185 Leveraging Google Colab
Section 17: Programming Essentials using Python – Basic Programming Constructs
Lecture 186 Basic Programming Constructs using Python – Introduction
Lecture 187 Getting Help using help function in Python
Lecture 188 Python Variables and Objects
Lecture 189 Python Data Types – Commonly Used
Lecture 190 Operators in Python
Lecture 191 Tasks – Data Types and Operators using Python
Lecture 192 Developing Conditionals using Python
Lecture 193 All about for loops in Python
Lecture 194 Running os commands in Python
Lecture 195 Exercises – Basic Programming Constructs using Python
Lecture 196 Dynamic Arithmetic Operations using eval and exec in Python
Section 18: Programming Essentials using Python – Predefined Functions
Lecture 197 Predefined Functions in Python – Introduction
Lecture 198 Overview of Predefined Functions in Python
Lecture 199 Numeric Functions in Python
Lecture 200 Overview of Strings in Python
Lecture 201 String Manipulation Functions in Python
Lecture 202 Formatting Strings in Python
Lecture 203 Print and Input Functions in Python
Lecture 204 Date Manipulation Functions in Python
Lecture 205 Exercises – Predefined Functions in Python
Section 19: Programming Essentials using Python – User Defined Functions
Lecture 206 Developing User Defined Functions in Python – Introduction
Lecture 207 Defining Functions in Python
Lecture 208 Doc Strings in Python
Lecture 209 Returning Variables from Python Functions
Lecture 210 Passing Function Parameters and Arguments to Python Functions
Lecture 211 Varying Arguments in Python
Lecture 212 Keyword Arguments in Python
Lecture 213 Recap of User Defined Functions in Python
Lecture 214 Passing Functions as Arguments to Python Functions
Lecture 215 Lambda or Anonymous Functions in Python
Lecture 216 Usage of Lambda Functions in Python Functions
Lecture 217 Exercise – User Defined Functions in Python
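The ideas in this section (varying arguments, keyword arguments, passing functions as arguments, and lambda functions) can be combined in one small sketch. The function name apply_all and the digits option are hypothetical and used only for illustration.

```python
def apply_all(func, *values, **options):
    """Apply func to every positional value; optionally round the results."""
    # Keyword arguments arrive as a dict; "digits" is a hypothetical option.
    digits = options.get("digits")
    results = [func(value) for value in values]
    if digits is not None:
        results = [round(result, digits) for result in results]
    return results


# A lambda passed as an argument, followed by a varying number of values.
squares = apply_all(lambda x: x * x, 1, 2, 3)

# The same function with a keyword argument controlling rounding.
thirds = apply_all(lambda x: x / 3, 1, 2, digits=2)

print(squares)
print(thirds)
```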
Section 20: Programming Essentials using Python – Overview of Collections – list and set
Lecture 218 Overview of Collections in Python – list and set – Introduction
Lecture 219 Overview of list and set in Python
Lecture 220 Common Operations on Python Collections
Lecture 221 Accessing elements from Python list
Lecture 222 Adding elements to Python list
Lecture 223 Updating and Deleting elements from Python list
Lecture 224 Other or Miscellaneous Python list operations
Lecture 225 Adding and Deleting elements using Python set
Lecture 226 Typical Python set operations
Lecture 227 Validating Python sets
Lecture 228 Usage of Python list and set
Lecture 229 Exercises – Basic Operations on Python list and set
Lecture 230 Python List of Delimited Strings
Lecture 231 Sorting data in Python lists and tuples
Lecture 232 Sorting list of Delimited Strings using Python
Lecture 233 Exercises – Sorting lists and sets in Python
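Sorting a list of delimited strings, as covered in this section, usually comes down to splitting each string inside a key function. The sketch below assumes a hypothetical comma-separated layout of order id, order date, and status; the sample records are made up.

```python
# Hypothetical records in the form "order_id,order_date,order_status".
orders = [
    "3,2013-07-25,PENDING",
    "1,2013-07-26,CLOSED",
    "2,2013-07-25,COMPLETE",
]

# Sort numerically by order id (convert to int so "10" sorts after "9").
by_id = sorted(orders, key=lambda line: int(line.split(",")[0]))

# Sort by date first, then by order id, using a tuple as a composite key.
by_date_then_id = sorted(
    orders,
    key=lambda line: (line.split(",")[1], int(line.split(",")[0])),
)

# A set keeps only the unique statuses.
statuses = {line.split(",")[2] for line in orders}

print(by_id)
print(by_date_then_id)
print(sorted(statuses))
```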
Section 21: Programming Essentials using Python – Overview of Collections – dict and tuple
Lecture 234 Manipulating Collections using loops in Python – Introduction
Lecture 235 Overview of Python dict and tuple
Lecture 236 Common Operations on dict and tuple using Python
Lecture 237 Accessing Elements from Python tuples
Lecture 238 Accessing Elements from Python dict
Lecture 239 Manipulating Python dict
Lecture 240 Common Examples of Python dict
Lecture 241 Representing Tables or Excel Sheets as Python List of Tuples
Lecture 242 Representing Tables or Excel Sheets as Python List of dicts
Lecture 243 Process Python dict values
Lecture 244 Processing Python dict items
Lecture 245 Sorting Python dict items
Lecture 246 Exercises – Overview of Python Collections – dict and set
Section 22: Programming Essentials using Python – Manipulating Collections using loops
Lecture 247 Manipulating Collections using loops in Python – Introduction
Lecture 248 Reading Files into Python Collections
Lecture 249 Overview of Standard Transformations
Lecture 250 Row Level Transformations using Python loops
Lecture 251 Getting Unique Elements using Python loops
Lecture 252 Filtering Data using Python loops and conditionals
Lecture 253 Preparing Data Sets
Lecture 254 Quick recap of Python dict operations
Lecture 255 Performing Total Aggregations using Python loops
Lecture 256 Overview of Grouped Aggregations using Python loops
Lecture 257 Get Order Count by Status using Python loops
Lecture 258 Get Revenue Details per Order using Python loops
Lecture 259 Get Order Count by Month using Python loops
Lecture 260 Joining Data Sets using Python loops
Lecture 261 Manipulate Collections using Comprehensions in Python
Lecture 262 List Comprehensions using Python
Lecture 263 Set Comprehensions using Python
Lecture 264 Dict Comprehensions in Python
Lecture 265 Limitations of using loops to process data sets
Lecture 266 Exercises – Manipulating Collections using Python loops
Section 23: Programming Essentials using Python – Development of Map Reduce APIs
Lecture 267 Develop myFilter Function using Python loops and conditionals
Lecture 268 Validate myFilter using Python loops and conditionals
Lecture 269 Develop myMap Function using Python loops
Lecture 270 Validate myMap Function using Python loops
Lecture 271 Develop myReduce Function using Python loops
Lecture 272 Validate myReduce Function using Python loops
Lecture 273 Develop myReduceByKey Function using Python loops
Lecture 274 Validate myReduceByKey Function using Python loops
Lecture 275 Develop myJoin Function using Python loops
Lecture 276 Validate myJoin Function using Python loops
Lecture 277 Exercises – Development of Map Reduce APIs using Python loops and Conditionals
Section 24: Programming Essentials using Python – Understanding Map Reduce Libraries
Lecture 278 Preparing Data Sets
Lecture 279 Filtering Data using Python filter
Lecture 280 Projecting data using Python map
Lecture 281 Row Level Transformations using Python map
Lecture 282 Aggregations using Python reduce
Lecture 283 Get Revenue for a given product id using Python Map Reduce
Lecture 284 Get total items sold and revenue for a product using Python Map Reduce
Lecture 285 Get total commission amount using Python Map Reduce
Lecture 286 Overview of itertools
Lecture 287 Cumulative Operations using Python itertools
Lecture 288 Using Python itertools starmap
Lecture 289 Overview of Python itertools groupby
Lecture 290 Get order count by status using Python itertools groupby
Lecture 291 Get revenue per order using Python itertools groupby
Lecture 292 Limitations of Python Map Reduce Libraries
Lecture 293 Exercises – Understanding Python Map Reduce Libraries
Section 25: Programming Essentials using Python – Basics of File IO using Python
Lecture 294 Basics of File IO using Python – Introduction
Lecture 295 Overview of File IO using Python
Lecture 296 Understand concepts behind Folders and Files
Lecture 297 Getting File Paths and File Names
Lecture 298 Overview of Retail Data
Lecture 299 Read text file into string using Python File I/O
Lecture 300 Write string to text file using Python File I/O
Lecture 301 Overview of modes to write into files using Python File I/O
Lecture 302 Overview of Delimited Strings
Lecture 303 Read csv into list of strings using Python File I/O
Lecture 304 Writing Strings to file in Append Mode using Python File I/O
Lecture 305 Managing Files and Folders using Python File I/O
Section 26: Programming Essentials using Python – Delimited Files and Collections
Lecture 306 Understanding Delimited Files and Collections
Lecture 307 Overview of Delimited Text Files
Lecture 308 Recap of basic file IO using Python
Lecture 309 Read Delimited files into list of tuples using Python File I/O
Lecture 310 Write Delimited Strings into files using Python File I/O
Lecture 311 Overview of Python CSV Module to process files
Lecture 312 Read Delimited data into list using Python CSV APIs
Lecture 313 Writing iterables to files using Python CSV APIs
Lecture 314 Advantages of using APIs in Python CSV module
Lecture 315 Apply Schema on lists from files using Python
Section 27: Programming Essentials using Python – Overview of Pandas Libraries
Lecture 316 Overview of Python Pandas Libraries
Lecture 317 Understanding Python Pandas Data Structures
Lecture 318 Overview of Python Series
Lecture 319 Creating Python Data Frames from lists
Lecture 320 Basic Operations on Python Data Frames
Lecture 321 Reading Data from CSV Files to Python Pandas Data Frames
Lecture 322 Projecting and Filtering using Python Pandas Data Frame APIs
Lecture 323 Performing Total Aggregations using Python Pandas Data Frame APIs
Lecture 324 Performing Grouped Aggregations using Python Pandas Data Frame APIs
Lecture 325 Writing Python Pandas Data Frames to Files
Lecture 326 Joining Data in Python Pandas Data Frames using join
Section 28: Programming Essentials using Python – Database Programming – CRUD Operations
Lecture 327 Database Operations using Python – CRUD Operations – Introduction
Lecture 328 Overview of Database Programming using Python
Lecture 329 Recap of RDBMS Concepts
Lecture 330 Setup Database Client Libraries for Python Applications
Lecture 331 Develop Function to get Database Connection using Python
Lecture 332 Create Database Table in Postgres using Python
Lecture 333 Inserting Data into Table in Postgres using Python
Lecture 334 Updating Existing Table Data in Postgres using Python
Lecture 335 Deleting Data From Table in Postgres using Python
Lecture 336 Querying Data From Table in Postgres using Python
Lecture 337 Recap – CRUD Operations using Python
Section 29: Programming Essentials using Python – Database Programming – Batch Operations
Lecture 338 Database Programming using Python – Batch Operations – Introduction
Lecture 339 Recap of Insert using Python
Lecture 340 Preparing Database to perform batch operations using Python
Lecture 341 Reading Data From File using Python File I/O
Lecture 342 Batch Loading of Data into Database Table using Python
Lecture 343 Best Practices for Batch Loading into Database Table using Python
Section 30: Programming Essentials using Python – Processing JSON Data
Lecture 344 Processing JSON Data – Introduction
Lecture 345 Process JSON using Python Pandas
Lecture 346 JSON Data Types
Lecture 347 Create JSON String
Lecture 348 Process JSON String
Lecture 349 Single JSON Document in Files
Lecture 350 Multiple JSON Documents in files
Lecture 351 Process JSON using Pandas
Lecture 352 Different JSON Formats supported by Python Pandas
Lecture 353 Common Use Cases for JSON
Lecture 354 Write to JSON files using Python json module
Lecture 355 Write to JSON files using Python Pandas
Section 31: Programming Essentials using Python – Processing REST Payloads
Lecture 356 Overview of REST APIs
Lecture 357 Using curl command
Lecture 358 Overview of Postman
Lecture 359 Getting Started with Python requests module
Lecture 360 Convert REST Payload to Python Objects
Lecture 361 Process REST Payload using Python Collection Operations
Lecture 362 Process REST Payload using Python Pandas
Section 32: Understanding Python Virtual Environments
Lecture 363 Introduction to Python Virtual Environments
Lecture 364 Validating Python Versions
Lecture 365 Create Python Virtual Environment for Web Application
Lecture 366 Reviewing dependencies installed in Python Virtual Environment
Lecture 367 Installing Dependencies for Web Application using Python pip
Lecture 368 Getting Details about installed packages using Python pip
Lecture 369 Uninstall Packages using Python pip
Lecture 370 Cleanup Python Virtual Environment
Lecture 371 Recreate and Activate Python Virtual Environment for Web Application
Lecture 372 Define requirements file for Python Web Application
Lecture 373 Install Dependencies using requirements file for Python Web Application
Lecture 374 Create Virtual Environment for Data Engineering Application using Python
Lecture 375 Install Dependencies for Data Engineering Application using Python
Lecture 376 Install Dependencies for Data Engineering Application using Python 3.6
Lecture 377 Validate Python and Package Compatibility and Install Python 3.6
Lecture 378 Conclusion about understanding Python Virtual Environments
Section 33: Overview of Pycharm for Python Application Development
Lecture 379 Introduction to Pycharm for Python Application Development
Lecture 380 Installation of Pycharm on Windows for Python Application Development
Lecture 381 Installation of Pycharm on Mac for Python Application Development
Lecture 382 Setup Python Getting Started Project using Pycharm
Lecture 383 Setup Python Getting Started Project using Pycharm on Mac
Lecture 384 Setup de-demo Python project using Pycharm
Lecture 385 Accessing Settings in Pycharm and Changing Font Size
Lecture 386 Accessing Settings in Pycharm and Changing Font Size on Mac
Lecture 387 Install Python Packages using Pycharm
Lecture 388 Overview of Pycharm Integrated Terminal
Lecture 389 Overview of Pycharm Integrated Terminal on Mac
Lecture 390 Overview of Run Time Arguments for Python Applications
Lecture 391 Passing Run Time Arguments to Python Applications using Pycharm
Section 34: Data Copier – Getting Started
Lecture 392 Introduction to Getting Started for Data copier using Python
Lecture 393 Problem Statement – Data Copier using Python
Lecture 394 Create Working Directory for the Python Project
Lecture 395 Setup Docker on Windows 10 Pro
Lecture 396 Quick Overview of Docker
Lecture 397 Prepare Dataset
Lecture 398 Create Postgres Container
Lecture 399 Setup Postgres Database for development
Lecture 400 Overview of Postgres Database Commands
Lecture 401 Setup Python Project using Pycharm
Lecture 402 Managing Python Dependencies for the project
Lecture 403 Create GitHub Project
Section 35: Data Copier – Reading Data using Pandas
Lecture 404 Reading Data using Python Pandas – Introduction
Lecture 405 Overview of Retail Data
Lecture 406 Adding Python Pandas to the project
Lecture 407 Reading JSON Data using Python Pandas
Lecture 408 Previewing Data using Python Pandas
Lecture 409 Reading Data in Chunks using Python Pandas
Lecture 410 Dynamically read files using Python os module
Section 36: Data Copier – Database Programming using Pandas
Lecture 411 Database Programming using Python Pandas – Introduction
Lecture 412 Validate Postgres Setup using Docker
Lecture 413 Add required dependencies for database programming using Python pandas
Lecture 414 Create users table in retail_db Database
Lecture 415 Populating Sample Data into users table
Lecture 416 Reading data from table using Python Pandas
Lecture 417 Truncate users Postgres Database Table
Lecture 418 Writing Python Pandas Dataframe to table
Lecture 419 Validating users data in Postgres Database Table
Lecture 420 Drop users Postgres Database Table
Section 37: Data Copier – Loading Data from files to tables
Lecture 421 Loading Data from files to tables – Introduction
Lecture 422 Populating Departments data into table
Lecture 423 Validate departments table
Lecture 424 Populating orders table in chunks using Python Pandas
Lecture 425 Validate orders table in Postgres Database
Lecture 426 Validate orders table using pandas
Section 38: Data Copier – Modularizing the application
Lecture 427 Overview of Python main function
Lecture 428 Overview of Python Environment Variables
Lecture 429 Using Python os module for Environment Variables
Lecture 430 Passing Environment Variables to Python Applications using Pycharm
Lecture 431 Read logic using Python Pandas
Lecture 432 Validate read logic developed using Python Pandas
Lecture 433 Write logic using Python Pandas
Lecture 434 Validate write logic developed using Python Pandas
Lecture 435 Integrate read and write logic using Python
Lecture 436 Validate Integration logic developed using Python
Lecture 437 Develop logic to load multiple tables using Python
Lecture 438 Validate Python logic for table list as run time argument
Lecture 439 Push Python Application Changes to remote git repository
Section 39: Data Copier – Dockerizing the application
Lecture 440 Dockerizing the application – Introduction
Lecture 441 Prepare Database for validation
Lecture 442 Pull and validate appropriate python image
Lecture 443 Create and attach network to database docker container
Lecture 444 Quick recap about Docker containers
Lecture 445 Review Python based Data Copier Application
Lecture 446 Deploying Python application and installing dependencies in the docker container
Lecture 447 Copy source data files into container
Lecture 448 Add Python Data Copier container to custom network
Lecture 449 Installing OS libraries as part of Docker container
Lecture 450 Validate Network Connectivity between Docker Containers
Lecture 451 Running Application from the Docker Container
Lecture 452 Delete Docker Container
Section 40: Data Copier – Using custom Docker Image
Lecture 453 Using Custom Docker Image – Introduction
Lecture 454 Getting started with docker custom image
Lecture 455 Install OS Modules in custom docker image
Lecture 456 Copying Python Source Code to Docker Custom Image
Lecture 457 Adding dependencies to the custom image
Lecture 458 Understanding docker custom image build process
Lecture 459 Mounting Data Folders on to Docker Container
Lecture 460 Passing Environment Variables to Docker Container
Lecture 461 Add Python Data Copier Container to custom network
Lecture 462 Run Python application using Docker
Section 41: Data Copier – Deploy and Validate Application on Remote Server
Lecture 463 Deploy and Validate Python Application on Remote Server – Introduction
Lecture 464 Push Application Changes to GitHub Repository
Lecture 465 Requirements to deploy application on Virtual Machine
Lecture 466 Clone Application on remote machine
Lecture 467 Setup Data Set for Validation
Lecture 468 Setup Network and Database Folder for Database using Docker
Lecture 469 Setup Docker Container for the Database
Lecture 470 Setup Database and Tables as part of Docker based Database Server
Lecture 471 Building Custom Docker Image for application
Lecture 472 Run and Validate Dockerized Application
Section 42: Validate ITVersity Hadoop and Spark Cluster (for ITVersity lab customers)
Lecture 473 Setup Development Environment using VS Code Remote Development Extension Pack
Lecture 474 Review Data Sets Provided as part of Gateway Nodes of Hadoop and Spark Cluster
Lecture 475 Validate HDFS on Multi Node Hadoop and Spark Cluster from Gateway Node
Lecture 476 Validate Hive on Hadoop and Spark Multinode Cluster
Lecture 477 Review Hadoop HDFS and YARN Property Files on Hadoop and Spark Cluster
Lecture 478 Review Hadoop HDFS and YARN Property Files using Visual Studio Code Editor
Lecture 479 Review Hive Property Files on Multinode Hadoop and Spark Cluster
Lecture 480 Review Spark 2 Property Files and Important Properties
Lecture 481 Validate Spark Shell CLI using Spark 2
Lecture 482 Validate Pyspark CLI using Spark 2
Lecture 483 Validate Spark SQL CLI using Spark 2
Lecture 484 Review Spark 3 Property Files and Important Properties
Lecture 485 Validate Spark Shell CLI using Spark 3
Lecture 486 Validate Pyspark CLI using Spark 3
Lecture 487 Validate Spark SQL CLI using Spark 3
Section 43: Setup Single Node Hadoop and Spark Cluster or Lab using Docker
Lecture 488 Setup Single Node Hadoop and Spark Cluster or Lab using Docker
Lecture 489 Pre-requisites to setup Hadoop and Spark Lab
Lecture 490 Configure Docker Desktop
Lecture 491 Update Hadoop and Spark Content
Lecture 492 Clone GitHub Repository to setup and learn Hadoop and Spark
Lecture 493 Cleaning up Docker Containers used for Python and SQL Practice
Lecture 494 Review Hadoop and Spark Lab details in Docker Compose File
Lecture 495 Pull Docker Image for Single Node Hadoop and Spark
Lecture 496 Start Docker Containers related to Hadoop and Spark
Lecture 497 Overview of reviewing Hadoop and Spark Lab setup using Docker
Lecture 498 Connecting to Terminal of Spark and Hadoop Containers
Lecture 499 Review HDFS and YARN on Single Node Hadoop and Spark Cluster
Lecture 500 Review and Validate Hive on Single Node Hadoop and Spark Cluster
Lecture 501 Validate Spark 2 using Pyspark and Spark SQL on Single Node Lab
Lecture 502 Validate Spark 3 using Pyspark and Spark SQL on Single Node Lab
Lecture 503 Validate Hive Metastore used as part of Single Node Hadoop and Spark Cluster
Lecture 504 Access Hadoop and Spark Material using Jupyter lab environment
Lecture 505 Managing Single Node Hadoop and Spark Cluster using Docker
Section 44: Introduction to Hadoop eco system – Overview of HDFS
Lecture 506 Getting help or usage
Lecture 507 Listing HDFS Files
Lecture 508 Managing HDFS Directories
Lecture 509 Copying files from local to HDFS
Lecture 510 Copying files from HDFS to local
Lecture 511 Getting Files Metadata
Lecture 512 Previewing Data in HDFS Files
Lecture 513 HDFS Block Size
Lecture 514 HDFS Replication Factor
Lecture 515 Getting HDFS Storage Usage
Lecture 516 Using HDFS Stat Commands
Lecture 517 HDFS File Permissions
Lecture 518 Overriding Properties of Hadoop or HDFS commands
Section 45: Data Engineering using Spark SQL – Getting Started
Lecture 519 Getting Started – Overview
Lecture 520 Overview of Spark Documentation
Lecture 521 Launching and using Spark SQL CLI
Lecture 522 Overview of Spark SQL Properties
Lecture 523 Running OS Commands using Spark SQL
Lecture 524 Understanding Warehouse Directory
Lecture 525 Managing Spark Metastore Databases
Lecture 526 Managing Spark Metastore Tables
Lecture 527 Retrieve Metadata of Tables
Lecture 528 Role of Spark Metastore or Hive Metastore
Lecture 529 Exercise – Getting Started with Spark SQL
Section 46: Data Engineering using Spark SQL – Basic Transformations
Lecture 530 Basic Transformations – Introduction
Lecture 531 Spark SQL – Overview
Lecture 532 Define Problem Statement
Lecture 533 Prepare Tables
Lecture 534 Projecting Data
Lecture 535 Filtering Data
Lecture 536 Joining Tables – Inner
Lecture 537 Joining Tables – Outer
Lecture 538 Aggregating Data
Lecture 539 Sorting Data
Lecture 540 Conclusion – Final Solution
Section 47: Data Engineering using Spark SQL – Managing Tables – Basic DDL and DML
Lecture 541 Introduction
Lecture 542 Create Spark Metastore Tables
Lecture 543 Overview of Data Types
Lecture 544 Adding Comments
Lecture 545 Loading Data Into Tables – Local
Lecture 546 Loading Data Into Tables – HDFS
Lecture 547 Loading Data – Append and Overwrite
Lecture 548 Creating External Tables
Lecture 549 Managed Tables vs External Tables
Lecture 550 Overview of File Formats
Lecture 551 Drop Tables and Databases
Lecture 552 Truncating Tables
Lecture 553 Exercise – Managed Tables
Section 48: Data Engineering using Spark SQL – Managing Tables – DML and Partitioning
Lecture 554 Introduction – Managing Tables – DML and Partitioning
Lecture 555 Introduction to Partitioning
Lecture 556 Creating Tables using Parquet
Lecture 557 Load vs Insert
Lecture 558 Inserting Data using Stage Table
Lecture 559 Creating Partitioned Tables
Lecture 560 Adding Partitions to Tables
Lecture 561 Loading Data into Partitioned Tables
Lecture 562 Inserting Data into Partitions
Lecture 563 Using Dynamic Partition Mode
Lecture 564 Exercise – Partitioned Tables
Section 49: Data Engineering using Spark SQL – Overview of Spark SQL Functions
Lecture 565 Introduction – Overview of Spark SQL Functions
Lecture 566 Overview of Functions
Lecture 567 Validating Functions
Lecture 568 String Manipulation Functions
Lecture 569 Date Manipulation Functions
Lecture 570 Overview of Numeric Functions
Lecture 571 Data Type Conversion
Lecture 572 Dealing with Nulls
Lecture 573 Using CASE and WHEN
Lecture 574 Query Example – Word Count
Section 50: Data Engineering using Spark SQL – Windowing Functions
Lecture 575 Introduction – Windowing Functions
Lecture 576 Prepare HR Database
Lecture 577 Overview of Windowing Functions
Lecture 578 Aggregations using Windowing Functions
Lecture 579 Using LEAD or LAG
Lecture 580 Getting first and last values
Lecture 581 Ranking using Windowing Functions
Lecture 582 Order of execution of SQL
Lecture 583 Overview of Subqueries
Lecture 584 Filtering Windowing Function Results
Section 51: Apache Spark using Python – Data Processing Overview
Lecture 585 Starting Spark Context – pyspark
Lecture 586 Overview of Spark Read APIs
Lecture 587 Understanding airlines data
Lecture 588 Inferring Schema
Lecture 589 Previewing Airlines Data
Lecture 590 Overview of Data Frame APIs
Lecture 591 Overview of Functions
Lecture 592 Overview of Spark Write APIs
Section 52: Apache Spark using Python – Processing Column Data
Lecture 593 Overview of Predefined Functions in Spark
Lecture 594 Create Dummy Data Frame
Lecture 595 Categories of Functions
Lecture 596 Special Functions – col and lit
Lecture 597 Common String Manipulation Functions
Lecture 598 Extracting Strings using substring
Lecture 599 Extracting Strings using split
Lecture 600 Padding Characters around Strings
Lecture 601 Trimming Characters from Strings
Lecture 602 Date and Time Manipulation Functions
Lecture 603 Date and Time Arithmetic
Lecture 604 Using Date and Time Trunc Functions
Lecture 605 Date and Time Extract Functions
Lecture 606 Using to_date and to_timestamp
Lecture 607 Using date_format Function
Lecture 608 Dealing with Unix Timestamp
Lecture 609 Dealing with Nulls
Lecture 610 Using CASE and WHEN
Section 53: Apache Spark using Python – Basic Transformations
Lecture 611 Overview of Basic Transformations
Lecture 612 Data Frames for basic transformations
Lecture 613 Basic Filtering of Data
Lecture 614 Filtering Example using dates
Lecture 615 Boolean Operators
Lecture 616 Using IN Operator or isin Function
Lecture 617 Using LIKE Operator or like Function
Lecture 618 Using BETWEEN Operator
Lecture 619 Dealing with Nulls while Filtering
Lecture 620 Total Aggregations
Lecture 621 Aggregate data using groupBy
Lecture 622 Aggregate data using rollup
Lecture 623 Aggregate data using cube
Lecture 624 Overview of Sorting Data Frames
Lecture 625 Solution – Problem 1 – Get Total Aggregations
Lecture 626 Solution – Problem 2 – Get Total Aggregations By FlightDate
Section 54: Apache Spark using Python – Joining Data Sets
Lecture 627 Prepare Datasets for Joins
Lecture 628 Analyze Datasets for Joins
Lecture 629 Problem Statements for Joins
Lecture 630 Overview of Joins
Lecture 631 Using Inner Joins
Lecture 632 Left or Right Outer Join
Lecture 633 Solution – Get Flight Count Per US Airport
Lecture 634 Solution – Get Flight Count Per US State
Lecture 635 Solution – Get Dormant US Airports
Lecture 636 Solution – Get Origins without master data
Lecture 637 Solution – Get Count of Flights without master data
Lecture 638 Solution – Get Count of Flights per Airport without master data
Lecture 639 Solution – Get Daily Revenue
Lecture 640 Solution – Get Daily Revenue rolled up till Yearly
Section 55: Apache Spark using Python – Spark Metastore
Lecture 641 Overview of Spark Metastore
Lecture 642 Exploring Spark Catalog
Lecture 643 Creating Metastore Tables using catalog
Lecture 644 Inferring Schema for Tables
Lecture 645 Define Schema for Tables using StructType
Lecture 646 Inserting into Existing Tables
Lecture 647 Read and Process data from Metastore Tables
Lecture 648 Create Partitioned Tables
Lecture 649 Saving as Partitioned Table
Lecture 650 Creating Temporary Views
Lecture 651 Using Spark SQL
Section 56: Getting Started with Semi Structured Data using Spark
Lecture 652 Introduction to Getting Started with Semi Structured Data using Spark
Lecture 653 Create Spark Metastore Table with Special Data Types
Lecture 654 Overview of ARRAY Type in Spark Metastore Table
Lecture 655 Overview of MAP and STRUCT Type in Spark Metastore Table
Lecture 656 Insert Data into Spark Metastore Table with Special Type Columns
Lecture 657 Create Spark Data Frame with Special Data Types
Lecture 658 Create Spark Data Frame with Special Types using Python List
Lecture 659 Insert Spark Data Frame with Special Types into Spark Metastore Table
Lecture 660 Review Data in the JSON File with Special Data Types
Lecture 661 Setup JSON Data Set to explore Spark APIs on Special Data Type Columns
Lecture 662 Read JSON Data with Special Types into Spark Data Frame
Lecture 663 Flatten Array Fields in Spark Data Frames using explode and explode_outer
Lecture 664 Get Size or Length of Array Type Columns in Spark Data Frame
Lecture 665 Concatenate Array Values into Delimited String using Spark APIs
Lecture 666 Convert Delimited Strings from Spark Data Frame Columns to Arrays
Lecture 667 Setup Data Sets to Build Arrays using Spark
Lecture 668 Read JSON Data into Spark Data Frame and Review Aggregate Operations
Lecture 669 Build Arrays from Flattened Rows of Spark Data Frame
Lecture 670 Getting Started with Spark Data Frames with Struct Columns
Lecture 671 Concatenate Struct Column Values in Spark Data Frame
Lecture 672 Filter Data on Struct Column Attributes in Spark Data Frame
Lecture 673 Create Spark Data Frame using Map Type Column
Lecture 674 Project Map Values as Columns using Spark Data Frame APIs
Lecture 675 Conclusion of Getting Started with Semi Structured Data using Spark
Section 57: Process Semi Structured Data using Spark Data Frame APIs
Lecture 676 Introduction to Process Semi Structured Data using Spark Data Frame APIs
Lecture 677 Review the Data Sets to generate denormalized JSON Data using Spark
Lecture 678 Setup JSON Data Sets in HDFS using HDFS Command
Lecture 679 Create Spark Data Frames using Data Frame APIs
Lecture 680 Join Orders and Order Items using Spark Data Frame APIs
Lecture 681 Generate Struct Field for Order Details using Spark
Lecture 682 Generate Array of Struct Field for Order Details using Spark
Lecture 683 Join Data Sets to generate denormalized JSON Data using Spark
Lecture 684 Denormalize Join Results using Spark Data Frame APIs
Lecture 685 Write Denormalized Customer Details to JSON Files using Spark
Lecture 686 Publish JSON Files for downstream applications
Lecture 687 Read Denormalized Data into Spark Data Frame
Lecture 688 Filter Denormalized Data Frame using Spark APIs
Lecture 689 Perform Aggregations on Denormalized Data Frame using Spark
Lecture 690 Flatten Semi Structured Data or Denormalized Data using Spark
Lecture 691 Compute Monthly Customer Revenue using Spark on Denormalized Data
Lecture 692 Conclusion of Processing Semi Structured Data using Spark Data Frame APIs
Section 58: Apache Spark – Development Life Cycle using Python
Lecture 693 Setup Virtual Environment and Install Pyspark
Lecture 694 [Commands] – Setup Virtual Environment and Install Pyspark
Lecture 695 Getting Started with Pycharm
Lecture 696 [Code and Instructions] – Getting Started with Pycharm
Lecture 697 Passing Run Time Arguments
Lecture 698 Accessing OS Environment Variables
Lecture 699 Getting Started with Spark
Lecture 700 Create Function for Spark Session
Lecture 701 [Code and Instructions] – Create Function for Spark Session
Lecture 702 Setup Sample Data
Lecture 703 Read Data from Files
Lecture 704 [Code and Instructions] – Read data from files
Lecture 705 Process Data using Spark APIs
Lecture 706 [Code and Instructions] – Process data using Spark APIs
Lecture 707 Write Data to Files
Lecture 708 [Code and Instructions] – Write data to files
Lecture 709 Validating Writing Data to Files
Lecture 710 Productionizing the Code
Lecture 711 [Code and Instructions] – Productionizing the code
Lecture 712 Setting up Data for Production Validation
Lecture 713 Running Application using YARN
Lecture 714 Detailed Validation of the Application
Section 59: Spark Application Execution Life Cycle and Spark UI
Lecture 715 Deploying and Monitoring Spark Applications – Introduction
Lecture 716 Overview of Types of Spark Cluster Managers
Lecture 717 Setup EMR Cluster with Hadoop and Spark
Lecture 718 Overall Capacity of Big Data Cluster with Hadoop and Spark
Lecture 719 Understanding YARN Capacity of an Enterprise Cluster
Lecture 720 Overview of Hadoop HDFS and YARN Setup on Multi-node Cluster
Lecture 721 Overview of Spark Setup on top of Hadoop
Lecture 722 Setup Data Set for Word Count application
Lecture 723 [Instructions and Commands] Setup Data Set for Word Count Application
Lecture 724 Develop Word Count Application
Lecture 725 [Code] Develop Word Count Application
Lecture 726 Review Deployment Process of Spark Application
Lecture 727 Overview of Spark Submit Command
Lecture 728 Switching between Python Versions to run Spark Apps or launch Pyspark CLI
Lecture 729 Switching between Pyspark Versions to run Spark Apps or launch Pyspark CLI
Lecture 730 Review Spark Configuration Properties at Run Time
Lecture 731 Develop Shell Script to run Spark Application
Lecture 732 [Code] Develop Shell Script to run Spark Application
Lecture 733 Run Spark Application and review default executors
Lecture 734 Overview of Spark History Server UI
Section 60: Setup SSH Proxy to access Spark Application logs
Lecture 735 Setup SSH Proxy to access Spark Application logs – Introduction
Lecture 736 Overview of Private and Public IPs of servers in the cluster
Lecture 737 Overview of SSH Proxy
Lecture 738 Setup sshuttle on Mac or Linux
Lecture 739 Proxy using sshuttle on Mac or Linux
Lecture 740 Accessing Spark Application logs via SSH Proxy using sshuttle on Mac or Linux
Lecture 741 Side effects of using SSH Proxy to access Spark Application Logs
Lecture 742 Steps to setup SSH Proxy on Windows to access Spark Application Logs
Lecture 743 Setup PuTTY and PuTTYgen on Windows
Lecture 744 Quick Tour of PuTTY on Windows
Lecture 745 Configure Passwordless Login using PuTTYGen Keys on Windows
Lecture 746 Run Spark Application on Gateway Node using PuTTY
Lecture 747 Configure Tunnel to Gateway Node using PuTTY on Windows for SSH Proxy
Lecture 748 Setup Proxy on Windows and validate using Microsoft Edge browser
Lecture 749 Understanding Proxying Network Traffic overcoming Windows Caveats
Lecture 750 Update Hosts file for worker nodes using private IPs
Lecture 751 Access Spark Application logs using SSH Proxy
Lecture 752 Overview of performing tasks related to Spark Applications using Mac
Section 61: Deployment Modes of Spark Applications
Lecture 753 Deployment Modes of Spark Applications – Introduction
Lecture 754 Default Execution Master Type for Spark Applications
Lecture 755 Launch Pyspark using local mode
Lecture 756 Running Spark Applications using Local Mode
Lecture 757 Overview of Spark CLI Commands such as Pyspark
Lecture 758 Accessing Local Files using Spark CLI or Spark Applications
Lecture 759 Overview of submitting spark application using client deployment mode
Lecture 760 Overview of submitting spark application using cluster deployment mode
Lecture 761 Review the default logging while submitting Spark Applications
Lecture 762 Changing Spark Application Log Level using custom log4j properties
Lecture 763 Submit Spark Application using client mode with log level info
Lecture 764 Submit Spark Application using cluster mode with log level info
Lecture 765 Submit Spark Applications using SPARK_CONF_DIR with custom properties files
Lecture 766 Submit Spark Applications using Properties File
Who this course is for
Computer Science or IT Students or other graduates with passion to get into IT
Data Warehouse Developers who want to transition to Data Engineering roles
ETL Developers who want to transition to Data Engineering roles
Database or PL/SQL Developers who want to transition to Data Engineering roles
BI Developers who want to transition to Data Engineering roles
QA Engineers to learn about Data Engineering
Application Developers to gain Data Engineering Skills
Course Information:
Udemy | English | 65h 57m | 31.29 GB
Created by: Durga Viswanatha Raju Gadiraju