Machine Learning with Imbalanced Data

Learn to over-sample and under-sample your data, apply SMOTE, ensemble methods, and cost-sensitive learning.
File size: 4.39 GB
Total length: 11h 24m



Soledad Galli







What you’ll learn

Apply random under-sampling to remove observations from majority classes
Perform under-sampling by removing observations that are hard to classify
Carry out under-sampling by retaining observations at the boundary of class separation
Apply random over-sampling to augment the minority class
Create synthetic data to increase the number of minority-class examples
Implement SMOTE and its variants to synthetically generate data
Use ensemble methods with sampling techniques to improve model performance
Change the misclassification cost optimized by the models to accommodate minority classes
Evaluate model performance with the metrics best suited to imbalanced datasets
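The SMOTE idea listed above, creating synthetic minority examples by interpolating between a minority observation and one of its nearest minority-class neighbours, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the imbalanced-learn implementation used in the course: the function name `smote`, its parameters, and the brute-force neighbour search are hypothetical choices made for readability. In practice you would typically use `imblearn.over_sampling.SMOTE` and its `fit_resample(X, y)` method.

```python
import math
import random


# Hypothetical helper for illustration only; not the imbalanced-learn API.
def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length tuples of floats (minority class only)
    k:        number of nearest minority neighbours to sample from
    seed:     seed for a local RNG, so results are reproducible
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        # 1. Pick a random minority sample as the base point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # 2. Find its k nearest minority neighbours (brute force, Euclidean).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        # 3. Pick one neighbour and a random gap in [0, 1).
        neighbour = rng.choice(neighbours)
        gap = rng.random()
        # 4. New point = base + gap * (neighbour - base), feature by feature.
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
print(len(new_points))  # 6
```

Because every synthetic point lies on a segment between two existing minority samples, SMOTE fills in the minority region instead of duplicating observations, which is the key difference from random over-sampling.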

Requirements


Knowledge of basic machine learning algorithms, i.e., regression, decision trees and nearest neighbours
Python programming, including familiarity with NumPy, Pandas and Scikit-learn
A Python and Jupyter notebook installation


Welcome to Machine Learning with Imbalanced Datasets. In this course, you will learn multiple techniques that you can use with imbalanced datasets to improve the performance of your machine learning models.

If you are working with imbalanced datasets right now and want to improve the performance of your models, or you simply want to learn more about how to tackle data imbalance, this course will show you how.

We’ll take you step by step through engaging video tutorials and teach you everything you need to know about working with imbalanced datasets. Throughout this comprehensive course, we cover almost every available methodology for working with imbalanced datasets, discussing their logic, their implementation in Python, their advantages and shortcomings, and the considerations to keep in mind when using each technique. Specifically, you will learn:

Under-sampling methods, either at random or focused on highlighting certain sample populations
Over-sampling methods, either at random or those that create new examples based on existing observations
Ensemble methods that leverage the power of multiple weak learners in conjunction with sampling techniques to boost model performance
Cost-sensitive methods that penalize wrong decisions more severely for minority classes
The appropriate metrics to evaluate model performance on imbalanced datasets

By the end of the course, you will be able to decide which technique is suitable for your dataset, and/or apply and compare the improvement in performance returned by the different methods on multiple datasets.

This comprehensive machine learning course includes over 50 lectures spanning more than 10 hours of video, and ALL topics include hands-on Python code examples that you can use for reference, for practice, and for re-use in your own projects.

In addition, the code is updated regularly to keep up with new trends and new Python library releases.

So what are you waiting for? Enroll today, learn how to work with imbalanced datasets and build better machine learning models.
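Why the choice of metric matters can be shown with a small, dependency-free sketch. The helper names `confusion_counts` and `imbalance_report` and the toy labels below are hypothetical, made up for this illustration rather than taken from the course; in practice scikit-learn's `confusion_matrix`, `recall_score` and `balanced_accuracy_score`, and imbalanced-learn's `geometric_mean_score`, compute these quantities. The example compares a "model" that always predicts the majority class with one that catches most minority cases at the cost of some false positives.

```python
import math


# Hypothetical helpers for illustration; not a library API.
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_report(y_true, y_pred, positive=1):
    """Accuracy next to metrics that are not dominated by the majority class."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    recall = tp / (tp + fn) if tp + fn else 0.0       # true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0    # 0.0 if nothing predicted positive
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "balanced_accuracy": (recall + specificity) / 2,
        "geometric_mean": math.sqrt(recall * specificity),
    }


# Toy labels: 95 majority (0) and 5 minority (1) observations.
y_true = [0] * 95 + [1] * 5

# A "model" that always predicts the majority class.
always_majority = [0] * 100
print(imbalance_report(y_true, always_majority))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'balanced_accuracy': 0.5, 'geometric_mean': 0.0}

# A model that flags 4 of the 5 minority cases plus 10 majority cases:
# accuracy drops to 0.89, but balanced accuracy rises to about 0.847.
better = [1] * 10 + [0] * 85 + [1] * 4 + [0]
print(imbalance_report(y_true, better))
```

Accuracy favours the trivial model, while recall, balanced accuracy and the geometric mean all rank the second model higher, because they weight performance on the minority class explicitly.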


Section 1: Introduction

Lecture 1 Course Curriculum Overview

Lecture 2 Course Material

Lecture 3 Code | Jupyter notebooks

Lecture 4 Presentations covered in the course

Lecture 5 Python package Imbalanced-learn

Lecture 6 Download Datasets

Lecture 7 Additional resources for Machine Learning and Python programming

Section 2: Machine Learning with Imbalanced Data: Overview

Lecture 8 Imbalanced classes – Introduction

Lecture 9 Nature of the imbalanced class

Lecture 10 Approaches to work with imbalanced datasets – Overview

Lecture 11 Additional Reading Resources (Optional)

Section 3: Evaluation Metrics

Lecture 12 Introduction to Performance Metrics

Lecture 13 Accuracy

Lecture 14 Accuracy – Demo

Lecture 15 Precision, Recall and F-measure

Lecture 16 Install Yellowbrick

Lecture 17 Precision, Recall and F-measure – Demo

Lecture 18 Confusion tables, FPR and FNR

Lecture 19 Confusion tables, FPR and FNR – Demo

Lecture 20 Balanced Accuracy

Lecture 21 Balanced accuracy – Demo

Lecture 22 Geometric Mean, Dominance, Index of Imbalanced Accuracy

Lecture 23 Geometric Mean, Dominance, Index of Imbalanced Accuracy – Demo

Lecture 24 ROC-AUC

Lecture 25 ROC-AUC – Demo

Lecture 26 Precision-Recall Curve

Lecture 27 Precision-Recall Curve – Demo

Lecture 28 Comparison of ROC and PR curves – Optional

Lecture 29 Additional reading resources (Optional)

Lecture 30 Probability

Lecture 31 Metrics for Multiclass

Lecture 32 Metrics for Multiclass – Demo

Lecture 33 PR and ROC Curves for Multiclass

Lecture 34 PR Curves in Multiclass – Demo

Lecture 35 ROC Curve in Multiclass – Demo

Section 4: Undersampling

Lecture 36 Under-Sampling Methods – Introduction

Lecture 37 Random Under-Sampling – Intro

Lecture 38 Random Under-Sampling – Demo

Lecture 39 Condensed Nearest Neighbours – Intro

Lecture 40 Condensed Nearest Neighbours – Demo

Lecture 41 Tomek Links – Intro

Lecture 42 Tomek Links – Demo

Lecture 43 One Sided Selection – Intro

Lecture 44 One Sided Selection – Demo

Lecture 45 Edited Nearest Neighbours – Intro

Lecture 46 Edited Nearest Neighbours – Demo

Lecture 47 Repeated Edited Nearest Neighbours – Intro

Lecture 48 Repeated Edited Nearest Neighbours – Demo

Lecture 49 All KNN – Intro

Lecture 50 All KNN – Demo

Lecture 51 Neighbourhood Cleaning Rule – Intro

Lecture 52 Neighbourhood Cleaning Rule – Demo

Lecture 53 NearMiss – Intro

Lecture 54 NearMiss – Demo

Lecture 55 Instance Hardness – Intro

Lecture 56 Instance Hardness Threshold – Demo

Lecture 57 Instance Hardness Threshold Multiclass – Demo

Lecture 58 Undersampling Method Comparison

Lecture 59 Wrapping up the section

Lecture 60 Setting up a classifier with under-sampling and cross-validation

Lecture 61 Summary Table

Section 5: Oversampling

Lecture 62 Over-Sampling Methods – Introduction

Lecture 63 Random Over-Sampling

Lecture 64 Random Over-Sampling – Demo

Lecture 65 ROS with smoothing – Intro

Lecture 66 ROS with smoothing – Demo

Lecture 67 SMOTE

Lecture 68 SMOTE – Demo

Lecture 69 SMOTE-NC

Lecture 70 SMOTE-NC – Demo

Lecture 71 SMOTE-N

Lecture 72 SMOTE-N – Demo

Lecture 73 ADASYN

Lecture 74 ADASYN – Demo

Lecture 75 Borderline SMOTE

Lecture 76 Borderline SMOTE – Demo

Lecture 77 SVM SMOTE

Lecture 78 Resources on SVMs

Lecture 79 SVM SMOTE – Demo

Lecture 80 K-Means SMOTE

Lecture 81 K-Means SMOTE – Demo

Lecture 82 Over-Sampling Method Comparison

Lecture 83 Wrapping up the section

Lecture 84 How to Correctly Set Up a Classifier with Over-sampling

Lecture 85 Setting Up a Classifier – Demo

Lecture 86 Summary Table

Section 6: Over and Undersampling

Lecture 87 Combining Over and Under-sampling – Intro

Lecture 88 Combining Over and Under-sampling – Demo

Lecture 89 Comparison of Over and Under-sampling Methods

Lecture 90 Combine over and under-sampling manually

Lecture 91 Wrapping up

Section 7: Ensemble Methods

Lecture 92 Ensemble methods with Imbalanced Data

Lecture 93 Foundations of Ensemble Learning

Lecture 94 Bagging

Lecture 95 Bagging plus Over- or Under-Sampling

Lecture 96 Boosting

Lecture 97 Boosting plus Re-Sampling

Lecture 98 Hybrid Methods

Lecture 99 Ensemble Methods – Demo

Lecture 100 Wrapping up

Lecture 101 Additional Reading Resources

Section 8: Cost Sensitive Learning

Lecture 102 Cost-sensitive Learning – Intro

Lecture 103 Types of Cost

Lecture 104 Obtaining the Cost

Lecture 105 Cost Sensitive Approaches

Lecture 106 Misclassification Cost in Logistic Regression

Lecture 107 Misclassification Cost in Decision Trees

Lecture 108 Cost Sensitive Learning with Scikit-learn

Lecture 109 Find Optimal Cost with hyperparameter tuning

Lecture 110 Bayes Conditional Risk

Lecture 111 MetaCost

Lecture 112 MetaCost – Demo

Lecture 113 Optional: MetaCost Base Code

Lecture 114 Additional Reading Resources

Section 9: Probability Calibration

Lecture 115 Probability Calibration

Lecture 116 Probability Calibration Curves

Lecture 117 Probability Calibration Curves – Demo

Lecture 118 Brier Score

Lecture 119 Brier Score – Demo

Lecture 120 Under- and Over-sampling and Cost-sensitive learning on Probability Calibration

Lecture 121 Calibrating a Classifier

Lecture 122 Calibrating a Classifier – Demo

Lecture 123 Calibrating a Classifier after SMOTE or Under-sampling

Lecture 124 Calibrating a Classifier with Cost-sensitive Learning

Lecture 125 Probability: Additional reading resources

Section 10: Putting it all together

Lecture 126 Examples

Section 11: Next steps

Lecture 127 Vote for the next course!

Lecture 128 Congratulations

Lecture 129 Bonus Lecture

Who this course is for:

Data scientists and machine learning engineers working with imbalanced datasets
Data scientists who want to improve the performance of models trained on imbalanced datasets
Students who want to learn intermediate content on machine learning
Students working with imbalanced multi-class targets

Course Information:

Udemy | English | 11h 24m | 4.39 GB
Created by: Soledad Galli
