Exploiting Thread-Level Parallelism to Build Decision Trees (2008)
Karsten Steinhaeuser, Nitesh V. Chawla, Peter M. Kogge
Abstract. Classification is an important data mining task, and decision trees have emerged as a popular classifier due to their simplicity and relatively low computational complexity. However, as...
Community Detection in a Large Real-World Social Network (2008)
Karsten Steinhaeuser, Nitesh V. Chawla
Abstract. Identifying meaningful community structure in social networks is a hard problem, and extreme network size or sparseness of the network compound the difficulty of the task. With a...
Detecting Fractures in Classifier Performance (2008)
David A. Cieslak, Nitesh V. Chawla
A fundamental tenet assumed by many classification algorithms is the presumption that both training and testing samples are drawn from the same distribution of data – this is the stationary...
Towards Learning-based Sensor Management (2008)
Karsten Steinhaeuser, Nitesh V. Chawla, Christian Poellabauer
The management of wireless sensor networks in the presence of multiple constraints is an open problem in systems research. Existing methods perform well when optimized for a single parameter (such as...
Short Paper: Troubleshooting Distributed Systems via Data Mining (2008)
David A. Cieslak, Douglas Thain, Nitesh V. Chawla
Through massive parallelism, distributed systems enable the multiplication of productivity. Unfortunately, increasing the scale of available machines to users will also multiply debugging when...
Exploiting Diversity in Ensembles: Improving the Performance on Unbalanced Datasets (2008)
Nitesh V. Chawla, Jared Sylvester
Abstract. Ensembles are often capable of greater predictive performance than any of their individual classifiers. Despite the need for classifiers to make different kinds of errors, the majority...
Activity Mining in Open Source Software (2008)
Daniel Mack, Nitesh V. Chawla, Greg Madey
An open source software repository is a collection of various projects with varying levels of activity, participation, and downloads. In this paper, we attempt to categorize and mine activity, thus...
Learning ensembles from bites: A scalable and accurate (2007)
Nitesh V. Chawla, Kevin W. Bowyer, W. Philip Kegelmeyer, Pack Kaelbling
Bagging and boosting are two popular ensemble methods that typically achieve better accuracy than a single classifier. These techniques have limitations on massive datasets, as the size of the...
Gregory R. Madey, Albert-lászló Barabási, Nitesh V. Chawla, Marta Gonzalez, David Hachen, Brett Lantz, ...
Abstract. We describe a prototype emergency and disaster information system designed and implemented using DDDAS concepts. The system is designed to use real-time cell phone calling data from a...
Anomaly Detection in a Mobile Communication Network (2006)
Alec Pawling, Nitesh V. Chawla, Greg Madey
Cell phone networks produce a massive volume of service usage data which, when combined with location data, can be used to pinpoint emergency situations that cause changes in network usage. Such a...
Anomaly Detection in a Mobile Communication Network (2006)
Alec Pawling, Nitesh V. Chawla, Greg Madey
Mobile communication networks produce massive amounts of data which may be useful in identifying the location of an emergency situation and the area it affects. We propose a one pass clustering...
Evaluation of summarization schemes for learning in streams (2006)
Alec Pawling, Nitesh V. Chawla, Amitabh Chaudhary
Abstract. Traditional discretization techniques for machine learning, from examples with continuous feature spaces, are not efficient when the data is in the form of a stream from an unknown,...
A Study in Machine Learning from Imbalanced Data for Sentence Boundary Detection in Speech (2006)
Yang Liu, Nitesh V. Chawla, Mary P. Harper, Elizabeth Shriberg, Andreas Stolcke
Enriching speech recognition output with sentence boundaries improves its human readability and enables further processing by downstream language processing modules. We have constructed a hidden...
Wrapper-based computation and evaluation of sampling methods for imbalanced datasets (2005)
Learning from imbalanced datasets presents an interesting problem both from modeling and economy standpoints. When the imbalance is large, classification accuracy on the smaller class(es) tends to be...
Learning from labeled and unlabeled data: An empirical study across techniques and domains (2005)
Nitesh V. Chawla, Grigoris Karakoulas
There has been increased interest in devising learning techniques that combine unlabeled data with labeled data – i.e. semi-supervised learning....
Editorial: Special Issue on Learning from Imbalanced Data Sets (2004)
Nitesh V. Chawla, Nathalie Japkowicz
The class imbalance problem is one of the (relatively) new problems that emerged when machine learning matured from an embryonic science to an applied technology, amply used in the worlds of...
Imbalanced data sets are becoming ubiquitous, as many applications have very few instances of the “interesting” or “abnormal” class. Traditional machine learning algorithms can be biased...
SMOTEBoost: improving prediction of the minority class in boosting (2003)
Nitesh V. Chawla, Aleksandar Lazarevic, Lawrence O. Hall, Kevin W. Bowyer
Abstract. Many real world data mining applications involve learning from imbalanced data sets, where the particular events of interest may be very few when compared to the other classes. Learning...
SMOTEBoost: improving prediction of the minority class in boosting (2003)
Nitesh V. Chawla, Aleksandar Lazarevic, Lawrence O. Hall, Kevin W. Bowyer
Abstract. Many real world data mining applications involve learning from imbalanced data sets. Learning from data sets that contain very few instances of the minority (or interesting) class usually...
SMOTE: Synthetic Minority Over-sampling Technique (2002)
Nitesh V. Chawla, Kevin W. Bowyer, W. Philip Kegelmeyer
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often...
Generalization Methods in Bioinformatics (2002)
Steven Eschrich, Nitesh V. Chawla, Lawrence O. Hall
Protein secondary structure prediction and high-throughput drug screen data mining are two important applications in bioinformatics. The data is represented in sparse feature spaces and can be...
Ride: rule-learning in a distributed environment (2000)
Thesis (M.S.C.S.)--University of South Florida, 2000.
The power of learning, in (1996)
Vince Thomas, Nitesh V. Chawla, Kevin W. Bowyer, Patrick J. Flynn
to predict gender from iris images
Modeling the Product Space as a Network
In the market basket setting, we are given a series of transactions each composed of one or more items and the goal is to find relationships between items, usually sets of items that tend to occur in...