An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets
Bee Wah Yap (1); Khatijahhusna Abd Rani (1); Hezlin Aryani Abd Rahman (1); Simon Fong (2); Zuraida Khairudin (1); Nik Nik Abdullah (3)
2014
Conference Name: First International Conference on Advanced Data and Information Engineering (DaEng-2013)
Source Publication: Lecture Notes in Electrical Engineering: Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013)
Volume: 285 LNEE
Pages: 13-22
Conference Date: Dec 16, 2013 - Dec 18, 2013
Conference Place: Kuala Lumpur, Malaysia
Abstract

Most classifiers work well when the class distribution of the response variable is well balanced. Problems arise when the dataset is imbalanced. This paper applies four methods (oversampling, undersampling, bagging and boosting) to handle imbalanced datasets. The cardiac surgery dataset has a binary response variable (1 = Died, 0 = Alive). The sample size is 4976 cases, of which 4.2% are Died and 95.8% are Alive. CART, C5 and CHAID were chosen as the classifiers. In classification problems, the accuracy rate of a predictive model is not an appropriate measure when the data are imbalanced, because it is biased towards the majority class. Thus, the performance of each classifier is measured using sensitivity and precision. Oversampling and undersampling are found to work well in improving classification of the imbalanced dataset using decision trees. Meanwhile, boosting and bagging did not improve decision tree performance.
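The point about accuracy being biased towards the majority class can be illustrated with a minimal Python sketch. It is not the authors' code: the count of 209 deaths is an assumption (4.2% of 4976, rounded), the helper names are hypothetical, and plain random resampling is assumed because the abstract does not say which oversampling or undersampling variant was used.

```python
import random


def sensitivity_precision(y_true, y_pred, positive=1):
    """Return (sensitivity, precision) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, prec


def random_oversample(labels, minority=1, seed=0):
    """Indices after duplicating minority cases (with replacement) to match the majority.

    Assumes the minority class is the smaller one.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    extra = rng.choices(minority_idx, k=len(majority_idx) - len(minority_idx))
    return majority_idx + minority_idx + extra


def random_undersample(labels, minority=1, seed=0):
    """Indices after discarding majority cases at random to match the minority."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    kept = rng.sample(majority_idx, k=len(minority_idx))
    return kept + minority_idx


# Assumed class counts: roughly 4.2% of 4976 cases are Died (label 1).
N_DIED = 209
N_ALIVE = 4976 - N_DIED
y_true = [1] * N_DIED + [0] * N_ALIVE

# A trivial classifier that always predicts the majority class (Alive).
y_pred = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sens, prec = sensitivity_precision(y_true, y_pred)
print(f"accuracy={accuracy:.3f} sensitivity={sens:.3f} precision={prec:.3f}")

# Both resampling schemes yield a training index set with equal class counts.
over = random_oversample(y_true)
under = random_undersample(y_true)
print(len(over), sum(y_true[i] for i in over))
print(len(under), sum(y_true[i] for i in under))
```

Under these assumed counts, the always-Alive classifier is correct on about 95.8% of cases yet identifies none of the deaths, which is why sensitivity and precision are the more informative measures here. Resampling would normally be applied to the training partition only, so that evaluation still reflects the original class distribution.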

Keyword: Bagging; Boosting; Imbalanced Data; Oversampling; Undersampling
DOI: https://doi.org/10.1007/978-981-4585-18-7_2
Indexed By: Other
Language: English
Document Type: Conference paper
Collection: DEPARTMENT OF COMPUTER AND INFORMATION SCIENCE
Affiliation:
1. Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Selangor, Malaysia
2. Faculty of Science and Technology, University of Macau, China
3. Faculty of Medicine, Universiti Teknologi MARA, Selangor, Malaysia
Recommended Citation
GB/T 7714
Bee Wah Yap,Khatijahhusna Abd Rani,Hezlin Aryani Abd Rahman,et al. An Application of Oversampling, Undersampling, Bagging and Boosting in Handling Imbalanced Datasets[C],2014:13-22.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.