Machine Learning with Imbalanced Data
What you’ll learn
Apply random under-sampling to remove observations from majority classes
Perform under-sampling by removing observations that are hard to classify
Carry out under-sampling by retaining observations at the boundary of class separation
Apply random over-sampling to augment the minority class
Create synthetic data to increase the examples of the minority class
Implement SMOTE and its variants to synthetically generate data
Use ensemble methods with sampling techniques to improve model performance
Change the misclassification cost optimized by the models to accommodate minority classes
Determine model performance with the most suitable metrics for imbalanced datasets
Requirements
Knowledge of basic machine learning algorithms, e.g., regression, decision trees and nearest neighbours
Python programming, including familiarity with NumPy, Pandas and Scikit-learn
A Python and Jupyter notebook installation
Description
Welcome to Machine Learning with Imbalanced Datasets. In this course, you will learn multiple techniques which you can use with imbalanced datasets to improve the performance of your machine learning models.
If you are working with imbalanced datasets right now and want to improve the performance of your models, or you simply want to learn more about how to tackle data imbalance, this course will show you how.
We’ll take you step by step through engaging video tutorials and teach you everything you need to know about working with imbalanced datasets. Throughout this comprehensive course, we cover almost every available methodology for working with imbalanced datasets, discussing their logic, their implementation in Python, their advantages and shortcomings, and the considerations to keep in mind when using each technique. Specifically, you will learn:
Under-sampling methods, either at random or focused on highlighting certain sample populations
Over-sampling methods, both at random and those which create new examples based on existing observations
Ensemble methods that leverage the power of multiple weak learners in conjunction with sampling techniques to boost model performance
Cost-sensitive methods which penalize wrong decisions more severely for minority classes
The appropriate metrics to evaluate model performance on imbalanced datasets
By the end of the course, you will be able to decide which technique is suitable for your dataset, and/or apply and compare the improvement in performance returned by the different methods on multiple datasets.
This comprehensive machine learning course includes over 50 lectures spanning more than 10 hours of video, and ALL topics include hands-on Python code examples which you can use for reference, for practice, and to re-use in your own projects. In addition, the code is updated regularly to keep up with new trends and new Python library releases.
So what are you waiting for? Enroll today, learn how to work with imbalanced datasets and build better machine learning models.
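As a taste of the workflow covered in the course, here is a minimal sketch (not taken from the course notebooks) of over-sampling with SMOTE placed inside an imbalanced-learn pipeline, so that synthetic examples are created only from the training folds during cross-validation. The toy dataset and parameter choices are illustrative only.

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline

    # Toy imbalanced dataset: roughly 95% majority class, 5% minority class.
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

    # The imbalanced-learn pipeline resamples only the training part of each fold,
    # so the validation folds keep their original class distribution.
    pipe = make_pipeline(SMOTE(random_state=0), LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipe, X, y, cv=5, scoring="balanced_accuracy")
    print(scores.mean())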
Overview
Section 1: Introduction
Lecture 1 Course Curriculum Overview
Lecture 2 Course Material
Lecture 3 Code | Jupyter notebooks
Lecture 4 Presentations covered in the course
Lecture 5 Python package Imbalanced-learn
Lecture 6 Download Datasets
Lecture 7 Additional resources for Machine Learning and Python programming
Section 2: Machine Learning with Imbalanced Data: Overview
Lecture 8 Imbalanced classes – Introduction
Lecture 9 Nature of the imbalanced class
Lecture 10 Approaches to work with imbalanced datasets – Overview
Lecture 11 Additional Reading Resources (Optional)
Section 3: Evaluation Metrics
Lecture 12 Introduction to Performance Metrics
Lecture 13 Accuracy
Lecture 14 Accuracy – Demo
Lecture 15 Precision, Recall and F-measure
Lecture 16 Install Yellowbrick
Lecture 17 Precision, Recall and F-measure – Demo
Lecture 18 Confusion tables, FPR and FNR
Lecture 19 Confusion tables, FPR and FNR – Demo
Lecture 20 Balanced Accuracy
Lecture 21 Balanced accuracy – Demo
Lecture 22 Geometric Mean, Dominance, Index of Imbalanced Accuracy
Lecture 23 Geometric Mean, Dominance, Index of Imbalanced Accuracy – Demo
Lecture 24 ROC-AUC
Lecture 25 ROC-AUC – Demo
Lecture 26 Precision-Recall Curve
Lecture 27 Precision-Recall Curve – Demo
Lecture 28 Comparison of ROC and PR curves – Optional
Lecture 29 Additional reading resources (Optional)
Lecture 30 Probability
Lecture 31 Metrics for Multiclass
Lecture 32 Metrics for Multiclass – Demo
Lecture 33 PR and ROC Curves for Multiclass
Lecture 34 PR Curves in Multiclass – Demo
Lecture 35 ROC Curve in Multiclass – Demo
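For orientation, the metrics listed in this section can be computed with scikit-learn and imbalanced-learn; below is a hedged sketch, where y_test, y_pred and y_proba (true labels, predicted labels and predicted probabilities for the positive class) are placeholders assumed to exist, not course code.

    from sklearn.metrics import (balanced_accuracy_score, f1_score,
                                 roc_auc_score, average_precision_score)
    from imblearn.metrics import geometric_mean_score

    # y_test, y_pred and y_proba are assumed placeholders (see note above).
    print("balanced accuracy:", balanced_accuracy_score(y_test, y_pred))
    print("F1:", f1_score(y_test, y_pred))
    print("geometric mean:", geometric_mean_score(y_test, y_pred))
    print("ROC-AUC:", roc_auc_score(y_test, y_proba))
    print("PR-AUC (average precision):", average_precision_score(y_test, y_proba))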
Section 4: Undersampling
Lecture 36 Under-Sampling Methods – Introduction
Lecture 37 Random Under-Sampling – Intro
Lecture 38 Random Under-Sampling – Demo
Lecture 39 Condensed Nearest Neighbours – Intro
Lecture 40 Condensed Nearest Neighbours – Demo
Lecture 41 Tomek Links – Intro
Lecture 42 Tomek Links – Demo
Lecture 43 One Sided Selection – Intro
Lecture 44 One Sided Selection – Demo
Lecture 45 Edited Nearest Neighbours – Intro
Lecture 46 Edited Nearest Neighbours – Demo
Lecture 47 Repeated Edited Nearest Neighbours – Intro
Lecture 48 Repeated Edited Nearest Neighbours – Demo
Lecture 49 All KNN – Intro
Lecture 50 All KNN – Demo
Lecture 51 Neighbourhood Cleaning Rule – Intro
Lecture 52 Neighbourhood Cleaning Rule – Demo
Lecture 53 NearMiss – Intro
Lecture 54 NearMiss – Demo
Lecture 55 Instance Hardness – Intro
Lecture 56 Instance Hardness Threshold – Demo
Lecture 57 Instance Hardness Threshold Multiclass Demo
Lecture 58 Undersampling Method Comparison
Lecture 59 Wrapping up the section
Lecture 60 Setting up a classifier with under-sampling and cross-validation
Lecture 61 Summary Table
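For reference, a minimal sketch of how some of the under-samplers listed in this section are applied with the imbalanced-learn package; X and y stand for any feature matrix and binary target and are assumed to exist.

    from imblearn.under_sampling import (RandomUnderSampler, TomekLinks,
                                         EditedNearestNeighbours, NearMiss)

    # X and y are assumed placeholders; each sampler returns a resampled copy.
    X_rus, y_rus = RandomUnderSampler(random_state=0).fit_resample(X, y)
    X_tl, y_tl = TomekLinks().fit_resample(X, y)                 # drop majority samples that form Tomek links
    X_enn, y_enn = EditedNearestNeighbours().fit_resample(X, y)  # drop observations disagreeing with their neighbours
    X_nm, y_nm = NearMiss(version=1).fit_resample(X, y)          # keep majority samples closest to the minority class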
Section 5: Oversampling
Lecture 62 Over-Sampling Methods – Introduction
Lecture 63 Random Over-Sampling
Lecture 64 Random Over-Sampling – Demo
Lecture 65 ROS with smoothing – Intro
Lecture 66 ROS with smoothing – Demo
Lecture 67 SMOTE
Lecture 68 SMOTE – Demo
Lecture 69 SMOTE-NC
Lecture 70 SMOTE-NC – Demo
Lecture 71 SMOTE-N
Lecture 72 SMOTE-N Demo
Lecture 73 ADASYN
Lecture 74 ADASYN – Demo
Lecture 75 Borderline SMOTE
Lecture 76 Borderline SMOTE – Demo
Lecture 77 SVM SMOTE
Lecture 78 Resources on SVMs
Lecture 79 SVM SMOTE – Demo
Lecture 80 K-Means SMOTE
Lecture 81 K-Means SMOTE – Demo
Lecture 82 Over-Sampling Method Comparison
Lecture 83 Wrapping up the section
Lecture 84 How to Correctly Set Up a Classifier with Over-sampling
Lecture 85 Setting Up a Classifier – Demo
Lecture 86 Summary Table
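Similarly, a hedged sketch of the main over-samplers covered in this section, again with X and y as placeholder feature matrix and target (SMOTE and the variants below expect numerical features).

    from imblearn.over_sampling import (RandomOverSampler, SMOTE, ADASYN,
                                        BorderlineSMOTE)

    # X and y are assumed placeholders.
    X_ros, y_ros = RandomOverSampler(random_state=0).fit_resample(X, y)    # duplicate minority samples
    X_sm, y_sm = SMOTE(k_neighbors=5, random_state=0).fit_resample(X, y)   # interpolate between minority neighbours
    X_ada, y_ada = ADASYN(random_state=0).fit_resample(X, y)               # more synthetic data where the minority is harder to learn
    X_bl, y_bl = BorderlineSMOTE(random_state=0).fit_resample(X, y)        # focus synthesis on the class boundary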
Section 6: Over and Undersampling
Lecture 87 Combining Over and Under-sampling – Intro
Lecture 88 Combining Over and Under-sampling – Demo
Lecture 89 Comparison of Over and Under-sampling Methods
Lecture 90 Combine over and under-sampling manually
Lecture 91 Wrapping up
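A brief sketch, with the same placeholder X and y, of the combined over- and under-sampling methods that imbalanced-learn offers out of the box.

    from imblearn.combine import SMOTEENN, SMOTETomek

    # SMOTE over-sampling followed by cleaning with ENN or Tomek links.
    X_se, y_se = SMOTEENN(random_state=0).fit_resample(X, y)
    X_st, y_st = SMOTETomek(random_state=0).fit_resample(X, y)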
Section 7: Ensemble Methods
Lecture 92 Ensemble methods with Imbalanced Data
Lecture 93 Foundations of Ensemble Learning
Lecture 94 Bagging
Lecture 95 Bagging plus Over- or Under-Sampling
Lecture 96 Boosting
Lecture 97 Boosting plus Re-Sampling
Lecture 98 Hybrid Methods
Lecture 99 Ensemble Methods – Demo
Lecture 100 Wrapping up
Lecture 101 Additional Reading Resources
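A hedged sketch of the imbalanced-learn ensemble estimators that pair bagging or boosting with re-sampling, as discussed in this section; X_train, y_train and X_test are placeholders assumed to exist.

    from imblearn.ensemble import (BalancedBaggingClassifier,
                                   BalancedRandomForestClassifier,
                                   RUSBoostClassifier)

    # BalancedBaggingClassifier and RUSBoostClassifier follow the same fit/predict API;
    # each estimator under-samples the majority class before fitting its base learners.
    clf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)                # X_train / y_train are assumed placeholders
    proba = clf.predict_proba(X_test)[:, 1]  # predicted probability of class 1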
Section 8: Cost Sensitive Learning
Lecture 102 Cost-sensitive Learning – Intro
Lecture 103 Types of Cost
Lecture 104 Obtaining the Cost
Lecture 105 Cost Sensitive Approaches
Lecture 106 Misclassification Cost in Logistic Regression
Lecture 107 Misclassification Cost in Decision Trees
Lecture 108 Cost Sensitive Learning with Scikit-learn
Lecture 109 Find Optimal Cost with hyperparameter tuning
Lecture 110 Bayes Conditional Risk
Lecture 111 MetaCost
Lecture 112 MetaCost – Demo
Lecture 113 Optional: MetaCost Base Code
Lecture 114 Additional Reading Resources
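A short, hedged sketch of cost-sensitive learning via scikit-learn's class_weight parameter, one of the approaches covered in this section; the explicit cost values and the placeholder X_train / y_train are illustrative only.

    from sklearn.linear_model import LogisticRegression
    from sklearn.tree import DecisionTreeClassifier

    # 'balanced' weights errors inversely to class frequency; a dict assigns an
    # explicit, higher misclassification cost to the minority class (here class 1).
    log_reg = LogisticRegression(class_weight="balanced", max_iter=1000)
    tree = DecisionTreeClassifier(class_weight={0: 1, 1: 10}, random_state=0)
    log_reg.fit(X_train, y_train)   # X_train / y_train are assumed placeholders
    tree.fit(X_train, y_train)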
Section 9: Probability Calibration
Lecture 115 Probability Calibration
Lecture 116 Probability Calibration Curves
Lecture 117 Probability Calibration Curves – Demo
Lecture 118 Brier Score
Lecture 119 Brier Score – Demo
Lecture 120 Under- and Over-sampling and Cost-sensitive learning on Probability Calibration
Lecture 121 Calibrating a Classifier
Lecture 122 Calibrating a Classifier – Demo
Lecture 123 Calibrating a Classifier after SMOTE or Under-sampling
Lecture 124 Calibrating a Classifier with Cost-sensitive Learning
Lecture 125 Probability: Additional reading resources
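Finally, a hedged sketch of probability calibration and the Brier score with scikit-learn, matching the topics of this section; the base estimator, cv value and placeholder train/test arrays are assumptions.

    from sklearn.calibration import CalibratedClassifierCV, calibration_curve
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import brier_score_loss

    # Wrap any classifier so that its scores are mapped to calibrated probabilities.
    calibrated = CalibratedClassifierCV(LogisticRegression(max_iter=1000),
                                        method="isotonic", cv=5)
    calibrated.fit(X_train, y_train)               # placeholders assumed to exist
    proba = calibrated.predict_proba(X_test)[:, 1]
    print("Brier score:", brier_score_loss(y_test, proba))  # lower is better
    frac_pos, mean_pred = calibration_curve(y_test, proba, n_bins=10)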
Section 10: Putting it all together
Lecture 126 Examples
Section 11: Next steps
Lecture 127 Vote for the next course!
Lecture 128 Congratulations
Lecture 129 Bonus Lecture
Who this course is for:
Data scientists and machine learning engineers working with imbalanced datasets
Data scientists who want to improve the performance of models trained on imbalanced datasets
Students who want to learn intermediate content on machine learning
Students working with imbalanced multi-class targets
Course Information:
Udemy | English | 11h 24m | 4.39 GB
Created by: Soledad Galli