Pentaho for ETL Data Integration Masterclass 2023 PDI 9

Use the Pentaho Data Integration tool for ETL and data warehousing. Do ETL development using PDI 9.0 without a coding background.
File size: 3.46 GB
Total length: 9h 22m



Start-Tech Academy



What you’ll learn

Understanding of the entire data integration process using PDI
Extracting data from all popular data sources including Excel, JSON, Zipped files, TXT files and even cloud storage
Cleaning the data using Pentaho Data Integration
Applying business rules on the data in PDI
Different types of Data transformations
Loading the data into different formats
Managing SQL database using PDI
Metadata Injection – a powerful tool offered by PDI
Understanding of the concepts of data marts and data warehouse
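
The outcomes listed above follow the usual extract, clean/transform, load sequence. PDI builds this flow graphically in Spoon, so the course itself needs no code; purely as an illustration of the same idea, here is a minimal Python sketch using only the standard library. The sample data, the `sales` table and all function names are hypothetical and are not taken from the course material.

```python
import csv
import io
import sqlite3

# Hypothetical sample input standing in for a CSV source file.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,not_a_number
3,carol,7.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Clean and validate: tidy customer names, drop rows with a bad amount."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            # A real pipeline would route this row to an error stream instead.
            continue
        name = row["customer"].strip().title()
        clean.append((int(row["order_id"]), name, amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a (hypothetical) sales table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract -> transform -> load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    query = "SELECT order_id, customer, amount FROM sales ORDER BY order_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for record in run_pipeline():
        print(record)
```

In this sketch the row with the unparseable amount is silently skipped; the course's error-handling section covers correcting such rows or writing them to a log or separate file instead.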



A basic understanding of data storage concepts will be helpful. A coding background is NOT required for this course.


What is ETL?

The ETL (extract, transform, load) process is the most popular method of collecting data from multiple sources and loading it into a centralized data warehouse. ETL is an essential component of data warehousing and analytics.

Why Pentaho for ETL?

Pentaho has phenomenal ETL, data analysis, metadata management and reporting capabilities. Pentaho is faster than other ETL tools (including Talend). Pentaho has a user-friendly GUI which is easier and takes less time to learn. Pentaho is great for beginners. Also, Pentaho Data Integration (PDI) is an important skill in the data analytics field.

How much can I earn?

In the US, the median salary of an ETL developer is $74,835, and in India the average salary is Rs. 7,06,902 per year. Accenture, Tata Consultancy Services, Cognizant Technology Solutions, Capgemini, IBM, Infosys and others are major recruiters for people skilled in ETL tools; Pentaho ETL is one of the most sought-after skills that recruiters look for. Demand for Pentaho Data Integration (PDI) techniques is increasing day by day.

What makes us qualified to teach you?

The course is taught by Abhishek and Pukhraj. The instructors have been teaching Data Science and Machine Learning for over a decade. We have experience in teaching and implementing Pentaho ETL and Pentaho Data Integration (PDI) for data mining and data analysis purposes.

We are also the creators of some of the most popular online courses, with over 150,000 enrollments and thousands of 5-star reviews like these:

I had an awesome moment taking this course. It broaden my knowledge more on the power use of Excel as an analytical tools. Kudos to the instructor! – Sikiru

Very insightful, learning very nifty tricks and enough detail to make it stick in your mind. – Armand

Our Promise

Teaching our students is our job and we are committed to it. If you have any questions about the course content on Pentaho, ETL, the practice sheets or anything related to any topic, you can always post a question in the course or send us a direct message.

Download Practice files, take Quizzes, and complete Assignments

With each lecture, there is a practice sheet attached for you to follow along. You can also take quizzes to check your understanding of the concepts of Pentaho, ETL and Pentaho Data Integration. Each section contains a practice assignment so that you can apply what you have learned. The solution to each assignment is also shared so that you can review your performance.

By the end of this course, your confidence in using Pentaho ETL and Pentaho Data Integration (PDI) will soar. You’ll have a thorough understanding of how to use Pentaho for ETL and Pentaho Data Integration (PDI) techniques for study or as a career opportunity.

Go ahead and click the enroll button, and I’ll see you in lesson 1 of this Pentaho ETL course!

Cheers
Start-Tech Academy


Section 1: Introduction

Lecture 1 Welcome to the course

Lecture 2 Course resources

Section 2: Pentaho Data Integration (PDI) Installation and Setup

Lecture 3 Setting up environment and installing PDI

Lecture 4 This is a milestone!

Lecture 5 Opening Spoon – The Graphical UI

Section 3: A Simple ETL Demonstration

Lecture 6 The example problem statement

Lecture 7 Demonstration of a PDI transformation

Lecture 8 Demonstration of a PDI Job

Section 4: Basic concepts – Theory for foundational understanding

Lecture 9 What is ETL?

Lecture 10 Data Warehouse, Ops Database and Data mart

Lecture 11 Inmon vs Kimball Architecture

Lecture 12 ETL vs ELT

Section 5: The ETL process: The practical part begins here

Lecture 13 Data and the ETL process

Section 6: DATA EXTRACTION: Extracting tabular data

Lecture 14 Manually entering data into PDI

Lecture 15 Inputting Data from a TXT (text) file

Lecture 16 Input from multiple CSV files at the same time

Lecture 17 Inputting Data from an Excel file

Lecture 18 Extracting Data from Zipped files

Section 7: DATA EXTRACTION: Extracting non-tabular data

Lecture 19 Extracting from XML

Lecture 20 Extracting from JSON

Section 8: Extracting from an SQL table

Lecture 21 Plan for importing sales data

Lecture 22 Installing PostgreSQL and pgAdmin on your PC

Lecture 23 Creating Sales table in SQL

Lecture 24 Extracting from an SQL table

Section 9: Storing and Retrieving Data from Cloud storage

Lecture 25 Storing Data on AWS S3

Lecture 26 Reading data from AWS S3

Section 10: Merging Data Streams

Lecture 27 Concepts: Merging Data Streams

Lecture 28 Sorted Merge Step – Merging customer data

Lecture 29 Merging product data

Lecture 30 Append data stream – merging sales data

Section 11: Data Cleansing

Lecture 31 Introduction to Data Cleansing

Lecture 32 Value Mapper Step

Lecture 33 Replace in String Step

Lecture 34 Fuzzy Match concepts

Lecture 35 Fuzzy Match Step in PDI

Lecture 36 Fuzzy Match Algorithms

Lecture 37 Formula Step and changing data format

Lecture 38 Common Data Cleaning Steps

Section 12: Data Validation

Lecture 39 Introduction to Data validation

Lecture 40 Data validation 1 – String-to-Int and integer range validations

Lecture 41 Data validation 2 – Checking Reference Values using stream look-up

Lecture 42 Data validation 3 – Order date < shipping date using calculator step

Lecture 43 Common Data Validation steps

Section 13: Error Handling

Lecture 44 Correcting the errors and merging with main stream

Lecture 45 Writing the errors to the log

Lecture 46 Writing the errors to a separate file

Section 14: Transformation and Analytics steps

Lecture 47 Concatenating Address Fields

Lecture 48 Data Aggregation using Group-by

Lecture 49 Normalization and Denormalization

Lecture 50 Number Range Step

Section 15: PDI SQL Connection

Lecture 51 Introduction to PDI – SQL connection

Lecture 52 Reading and filtering data from DB into PDI

Lecture 53 Updating and Inserting data into DB from PDI

Lecture 54 Deleting data from SQL DB using PDI

Section 16: Conceptual understanding for Loading Data

Lecture 55 Facts and Dimensions tables

Lecture 56 Surrogate Keys in Dimension tables

Lecture 57 Type 1 & 2 Slowly Changing Dimensions

Lecture 58 Schemas

Section 17: Loading the data into a Data Mart

Lecture 59 Creating tables in DB

Lecture 60 Loading Customer Data using combination lookup/ update step

Lecture 61 Loading product data using dimension lookup step

Lecture 62 Loading sales data after database lookup steps

Section 18: Running Java and JavaScript

Lecture 63 Scripting Steps

Section 19: PDI Jobs

Lecture 64 PDI Jobs vs Transformation

Lecture 65 Controlling the flow of execution

Lecture 66 Setting variables using set variables step

Lecture 67 File and Folder Management

Lecture 68 Sending Email Step

Lecture 69 Abort Job Step

Section 20: Scheduling a job for production environment

Lecture 70 Running using command prompt and scheduling

Section 21: Metadata injection

Lecture 71 Metadata injection

Section 22: Regex Notation

Lecture 72 Regular Expressions for advanced String Matching

Section 23: Congratulations and about your certificate

Lecture 73 Alternative to Pentaho

Lecture 74 The final milestone!

Lecture 75 Bonus Lecture

Who this course is for

Students who want to have a career in the field of Data warehouse/ETL developer
ETL developers and data process automation developers
Business managers who want to understand the entire ETL process and become capable of implementing it

Course Information:

Udemy | English | 9h 22m | 3.46 GB
Created by: Start-Tech Academy
