A-Optimality for Active Learning of Logistic Regression Classifiers (2004)

Abstract
Over the last decade there has been growing interest in pool-based active learning techniques, where instead of receiving an i.i.d. sample from a pool of unlabeled data, a learner may take an active role in selecting examples from the pool. Queries to an oracle (a human annotator in most applications) provide label information for the selected observations, but at a cost. The challenge is to end up with a model that provides the best possible generalization error at the least cost. Popular methods such as uncertainty sampling often work well, but sometimes fail badly. We take the A-optimality criterion used in optimal experimental design, and extend it so that it can be used for pool-based active learning of logistic regression classifiers. A-optimality has attractive theoretical properties, and empirical evaluation confirms that it offers a more robust approach to active learning for logistic regression than alternatives.
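The two selection strategies contrasted in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it uses the classical A-optimality criterion from experimental design (minimize the trace of the inverse Fisher information), whereas the paper extends that criterion for pool-based learning. The function names, the `ridge` regularizer, and the toy data in the usage note are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def fisher_information(X, w, ridge=1e-3):
    """Fisher information of logistic regression at parameters w.

    For logistic regression this is sum_i p_i (1 - p_i) x_i x_i^T.
    The ridge term is an assumption added to keep the matrix invertible.
    """
    p = sigmoid(X @ w)
    weights = p * (1.0 - p)
    return (X * weights[:, None]).T @ X + ridge * np.eye(X.shape[1])


def uncertainty_query(pool, w):
    """Uncertainty sampling: pick the pool point whose predicted
    probability is closest to 0.5."""
    p = sigmoid(pool @ w)
    return int(np.argmin(np.abs(p - 0.5)))


def a_optimal_query(labeled_X, pool, w, ridge=1e-3):
    """Classical A-optimal choice: pick the pool point whose addition
    minimizes trace(F^{-1}).

    The Fisher information of logistic regression does not depend on the
    label, so each candidate can be scored before it is labeled.
    """
    base = fisher_information(labeled_X, w, ridge)
    p = sigmoid(pool @ w)
    scores = []
    for x, p_i in zip(pool, p):
        F = base + p_i * (1.0 - p_i) * np.outer(x, x)
        scores.append(np.trace(np.linalg.inv(F)))
    return int(np.argmin(scores))
```

As a usage example under these assumptions, with labeled points `[[1, 0], [1, 0]]`, a pool `[[1, 0], [0, 1], [0.1, 0]]`, and `w = 0`, the A-optimal rule picks the candidate along the unexplored second feature, because that is the direction in which the parameter variance is largest.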

Publication details
Download: http://repository.upenn.edu/cis_reports/2
Publisher: ScholarlyCommons@Penn
Repository: ScholarlyCommons@Penn (United States)
Keywords: computer science, active learning, logistic regression
Type: text

Cited publications (4)
Bayesian Experimental Design: A Review (1996)
Toward Optimal Active Learning through Sampling Estimation of Error Reduction (2001)
Less is More: Active Learning with Support Vector Machines (2000)
Active Learning for Structure in Bayesian Networks (2001)