In many machine learning classification problems, class imbalance is a major issue that results in the algorithm favoring the majority class. This document describes some effective techniques to mitigate this issue.
The given training set comprised 700 class A samples and 2800 class B samples. With this 1:4 class imbalance, many standard algorithms tend to classify some of the class A points as class B. To account for the class imbalance, various approaches were implemented at both the dataset level and the algorithmic level. Oversampling and undersampling are the standard dataset-level approaches, whereas at the algorithmic level there exist modifications to various algorithms that account for class imbalance. Minimal code sketches for each of the techniques below are given at the end of this section.

Dataset level approaches:

Oversampling the minority class: Minority class data points were oversampled using the SMOTE (Synthetic Minority Oversampling Technique) algorithm. In SMOTE, oversampling is implemented by taking each minority class sample and introducing synthetic examples along the line segments joining it to any or all of its k nearest minority class neighbours. A modification of this approach was implemented whereby, instead of points along line segments, centroids of the 3 closest minority class points were introduced. This method showed some success, but did not improve performance by a significant amount.

Undersampling the majority class: Another standard technique to account for class imbalance is undersampling the majority class. The different approaches tried are listed below:

1. Bagging with split datasets: Four training sets were created, and all 700 minority class samples were included in each of them. The majority class was split into four 700-sample subsets, one per training set, resulting in four datasets of 1400 samples each. A classifier was trained on each set individually, and the results from the four classifiers were bagged together to produce the final output. The main problem with this approach was a very high false positive rate (taking the minority class as the positive class).

2. Removing noisy samples using linear regression: A linear regression model was trained, and the distance of each data point from the decision boundary was stored. All majority class points close to the boundary, or on the wrong side of it, were removed. This method helped increase performance.

3. Removing noisy samples using K-NN: In this approach, majority class samples close to a significant fraction of minority class samples were removed. For each majority class data point, the K nearest data points were found; if more than M of these K points belonged to the minority class, the majority class point was removed. The parameters K and M were estimated using cross-validation; in our setting, K = 20 and M = 7 performed best. This method was quite successful in improving performance.

Among these undersampling techniques, removing points using K-NN performed the best.

Algorithmic level approaches: The balanced class weight option available in various scikit-learn classifiers was selected. This balances the class weights before the trees are learnt.
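The following is a minimal sketch of the modified SMOTE variant described above, assuming NumPy feature arrays; the function name, the seed parameter, and the exclusion of the sampled point from its own neighbourhood are our assumptions, not details from the original implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def centroid_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples as centroids of a sampled
    point's k nearest minority-class neighbours (modified-SMOTE variant)."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)  # +1 skips the point itself
    synthetic = []
    for i in rng.integers(len(X_min), size=n_new):
        _, idx = nn.kneighbors(X_min[i:i + 1])
        synthetic.append(X_min[idx[0, 1:]].mean(axis=0))  # centroid of k neighbours
    return np.vstack([X_min, np.asarray(synthetic)])
```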
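A sketch of the split-dataset bagging scheme; the base classifier (a decision tree here) and the simple majority-vote combination rule are assumptions, since the original does not name them:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def split_bagging(X_maj, X_min, base=None, n_splits=4, seed=0):
    """Train one classifier per majority-class split, each paired with the
    full minority class; combine by majority vote (see bagged_predict)."""
    base = base or DecisionTreeClassifier(random_state=seed)
    rng = np.random.default_rng(seed)
    splits = np.array_split(rng.permutation(len(X_maj)), n_splits)
    models = []
    for part in splits:
        X = np.vstack([X_maj[part], X_min])
        y = np.r_[np.zeros(len(part)), np.ones(len(X_min))]  # 1 = minority class
        models.append(clone(base).fit(X, y))
    return models

def bagged_predict(models, X):
    votes = np.array([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)  # simple majority vote
```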
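A sketch of the linear-regression noise filter, assuming -1/+1 targets so that the signed prediction acts as a (scaled) distance from the fitted boundary; the margin threshold is an assumed parameter:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def drop_boundary_majority(X_maj, X_min, margin=0.1):
    """Fit linear regression with targets -1 (majority) / +1 (minority) and
    remove majority points that score close to the boundary (|score| < margin)
    or land on the minority side of it (score > 0)."""
    X = np.vstack([X_maj, X_min])
    y = np.r_[-np.ones(len(X_maj)), np.ones(len(X_min))]
    scores = LinearRegression().fit(X, y).predict(X_maj)
    return X_maj[scores < -margin]  # keep only confidently-majority points
```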
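A sketch of the K-NN noise removal with the cross-validated values K = 20 and M = 7, assuming neighbours are computed over the combined dataset:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_undersample(X_maj, X_min, k=20, m=7):
    """Remove every majority point with more than m minority points among its
    k nearest neighbours (k=20, m=7 were the cross-validated values above)."""
    X = np.vstack([X_maj, X_min])
    is_minority = np.r_[np.zeros(len(X_maj), bool), np.ones(len(X_min), bool)]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 skips the query point
    _, idx = nn.kneighbors(X_maj)
    counts = is_minority[idx[:, 1:]].sum(axis=1)  # minority neighbours per point
    return X_maj[counts <= m]
```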
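In current scikit-learn, the balanced class weight option is the class_weight="balanced" constructor argument, which reweights each class inversely to its frequency so minority-class errors cost more during training; for example:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# class_weight="balanced" reweights classes inversely to their frequencies.
forest = RandomForestClassifier(class_weight="balanced", random_state=0)
svm = SVC(class_weight="balanced")
```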
Final approach used:
Our final model combined the undersampling and oversampling approaches. The number of minority class samples was increased from 700 to 1000 using the modified SMOTE algorithm, and 500 noisy majority class points were removed using the K-NN noise removal. The resulting dataset gave a good increase in performance with many of the standard classifiers (see the sketch below).
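A hedged sketch of this final pipeline, reusing the hypothetical helper functions sketched earlier and assuming X_maj and X_min already hold the majority and minority feature arrays; note that the K-NN filter removes however many points exceed the M cutoff (about 500 in our data), not a fixed count:

```python
import numpy as np

# Oversample the minority class 700 -> 1000 with the modified SMOTE variant.
X_min_aug = centroid_oversample(X_min, n_new=1000 - len(X_min))
# Drop noisy majority points via the K-NN filter (removed ~500 points here).
X_maj_clean = knn_undersample(X_maj, X_min, k=20, m=7)

X_train = np.vstack([X_maj_clean, X_min_aug])
y_train = np.r_[np.zeros(len(X_maj_clean)), np.ones(len(X_min_aug))]
```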