Eibe Frank

Combining Naive Bayes and Decision Tables (2009)

Mark Hall, Eibe Frank

We investigate a simple semi-naive Bayesian ranking method that combines naive Bayes with induction of decision tables. Naive Bayes and decision tables can both be trained efficiently, and the same...

One-class Classification by Combining Density and Class Probability Estimation (2009)

Kathryn Hempstalk, Eibe Frank, Ian H. Witten

Abstract. One-class classification has important applications such as outlier and novelty detection. It is commonly tackled using density estimation techniques or by adapting a standard...

Clustering Documents with Active Learning using Wikipedia (2009)

Anna Huang, David Milne, Eibe Frank, Ian H. Witten

Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to...

Analysing chromatographic data using data mining to monitor petroleum content in water (2009)

Holmes, Geoffrey, Fletcher, Dale, Reutemann, Peter, Frank, Eibe

Chromatography is an important analytical technique that has widespread use in environmental applications. A typical application is the monitoring of water samples to determine if they contain...

Large-scale attribute selection using wrappers (2009)

Gutlein, Martin, Frank, Eibe, Hall, Mark A., Karwath, Andreas

Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the...

Classifier chains for multi-label classification (2009)

Read, Jesse, Pfahringer, Bernhard, Holmes, Geoffrey, Frank, Eibe

The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has been sidelined in the literature due to the perceived...

Improving on Bagging with Input Smearing (2008)

Eibe Frank, Bernhard Pfahringer

Abstract. Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to...

Element Knowledge (2008)

Eibe Frank, Morgan Kaufmann, Vladimir Estivill-Castro, Unsupervised Learning

A first introduction to the world of machine learning and how it relates to KDDM. References: ©Vladimir Estivill-Castro

Unsupervised Discretization using Tree-based Density Estimation (2008)

Gabi Schmidberger, Eibe Frank

Abstract. This paper presents an unsupervised discretization method that performs density estimation for univariate data. The subintervals that the discretization produces can be used as the bins of...

WEKA: A Machine Learning Workbench for Data Mining (2008)

Eibe Frank, Mark Hall, Geoffrey Holmes, Richard Kirkby, Ian H. Witten

Abstract. The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking...

Technical Note: Using Model Trees for Classification (2008)

Eibe Frank, Yong Wang, Stuart Inglis, Geoffrey Holmes, Ian H. Witten

Abstract. Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They...

Ensembles of Balanced Nested Dichotomies for Multi-Class Problems (2008)

Lin Dong, Eibe Frank, Stefan Kramer

Abstract. A system of nested dichotomies is a hierarchical decomposition of a multi-class problem with c classes into c − 1 two-class problems and can be represented as a tree structure. Ensembles...

Logistic Model Trees (2008)

Niels Landwehr, Mark Hall, Eibe Frank

Abstract. Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and continuous numeric values. For predicting...

Selecting multiway splits in decision trees (2008)

Eibe Frank, Ian H. Witten

Abstract: Decision trees in which numeric attributes are split several ways are more comprehensible than the usual binary trees because attributes rarely appear more than once in any path from root...

One-Class Classification by Combining Density and Class Probability Estimation (2008)

Hempstalk, Kathryn, Frank, Eibe, Witten, Ian H.

One-class classification has important applications such as outlier and novelty detection. It is commonly tackled using density estimation techniques or by adapting a standard classification...

Discriminating Against New Classes: One-class versus Multi-class Classification (2008)

Hempstalk, Kathryn, Frank, Eibe

Many applications require the ability to identify data that is anomalous with respect to a target group of observations, in the sense of belonging to a new, previously unseen ‘attacker’ class....

Revisiting multiple-instance learning via embedded instance selection (2008)

Foulds, James R., Frank, Eibe

Multiple-Instance Learning via Embedded Instance Selection (MILES) is a recently proposed multiple-instance (MI) classification algorithm that applies a single-instance base learner to a...

Additive Regression Applied to a Large-Scale Collaborative Filtering Problem (2008)

Frank, Eibe, Hall, Mark A.

The much-publicized Netflix competition has put the spotlight on the application domain of collaborative filtering and has sparked interest in machine learning algorithms that can be applied to this...

Combining Naive Bayes and Decision Tables (2008)

Hall, Mark A., Frank, Eibe

We investigate a simple semi-naive Bayesian ranking method that combines naive Bayes with induction of decision tables. Naive Bayes and decision tables can both be trained efficiently, and the same...

A study of hierarchical and flat classification of proteins (2008)

Zimek, Arthur, Buchwald, Fabian, Frank, Eibe, Kramer, Stefan

Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined...

Interactive Machine Learning - Letting Users Build Classifiers (2007)

Malcolm Ware, Eibe Frank, Ian H. Witten

According to standard procedure, building a classifier is a fully automated process that follows data preparation by a domain expert. In contrast, interactive machine learning engages users in...

Bottom-up Propositionalization (2007)

Stefan Kramer, Eibe Frank

Abstract. In this paper, we present a new method for propositionalization that works in a bottom-up, data-driven manner. It is tailored for biochemical databases, where the examples are 2-D...

Improving Browsing in Digital Libraries with Keyphrase Indexes (2007)

Carl Gutwin, Gordon Paynter, Ian Witten, Craig Nevill-Manning, Eibe Frank

Browsing accounts for much of people’s interaction with digital libraries, but it is poorly supported by standard search engines. Conventional systems often operate at the wrong level, indexing...

Predicting Library of Congress Classifications From Library of Congress Subject Headings (2007)

Eibe Frank, Gordon W. Paynter

This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCC are organized in a...

Selecting multiway splits in decision trees (2007)

Eibe Frank, Ian H. Witten

Abstract: Decision trees in which numeric attributes are split several ways are more comprehensible than the usual binary trees because attributes rarely appear more than once in any path from root...

A Simple Approach to Ordinal Classification (2007)

Eibe Frank, Mark Hall

Abstract. Machine learning methods for classification problems commonly assume that the class values are unordered. However, in many practical applications the class values do exhibit a natural...

Determining Progression in Glaucoma Using Visual Fields (2007)

Andrew Turpin, Eibe Frank, Ian H. Witten, Chris A. Johnson

Abstract. The standardized visual field assessment, which measures visual function in 76 locations of the central visual area, is an important diagnostic tool in the treatment of the eye disease...

Logistic Model Trees (2007)

Niels Landwehr, Eibe Frank

Abstract. Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and continuous numeric values. For predicting...

An Empirical Comparison of Exact Nearest Neighbour Algorithms (2007)

Frank, Eibe, Kibriya, Ashraf M.

Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an...

Improving on bagging with input smearing (2006)

Frank, Eibe, Pfahringer, Bernhard

Bagging is an ensemble learning method that has proved to be a useful tool in the arsenal of machine learning practitioners. Commonly applied in conjunction with decision tree learners to build an...

Naïve Bayes for text classification with unbalanced classes (2006)

Frank, Eibe, Bouckaert, Remco R.

Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been established that...

Naive Bayes for text classification with unbalanced classes (2006)

Eibe Frank, Remco R. Bouckaert

Abstract. Multinomial naive Bayes (MNB) is a popular method for document classification due to its computational efficiency and relatively good predictive performance. It has recently been...

Unsupervised discretization using tree-based density estimation (2005)

Schmidberger, Gabi, Frank, Eibe

This paper presents an unsupervised discretization method that performs density estimation for univariate data. The subintervals that the discretization produces can be used as the bins of a...

Logistic model trees (2005)

Landwehr, Niels, Hall, Mark A., Frank, Eibe

Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there...

Speeding up logistic model tree induction (2005)

Sumner, Marc, Frank, Eibe, Hall, Mark A.

Logistic Model Trees have been shown to be very accurate and compact classifiers [8]. Their greatest disadvantage is the computational complexity of inducing the logistic regression models in the...

A Toolbox for Learning from Relational Data with Propositional and Multi-instance Learners (2005)

Reutemann, Peter, Pfahringer, Bernhard, Frank, Eibe

Most databases employ the relational model for data storage. To use this data in a propositional learner, a propositionalization step has to take place. Similarly, the data has to be transformed to...

Multinomial naive Bayes for text categorization revisited (2005)

Kibriya, Ashraf M., Frank, Eibe, Pfahringer, Bernhard, Holmes, Geoffrey

This paper presents empirical results for several versions of the multinomial naive Bayes classifier on four text categorization problems, and a way of improving it using locally weighted learning....

Using classification to evaluate the output of confidence-based association rule mining (2005)

Mutter, Stefan, Hall, Mark A., Frank, Eibe

Association rule mining is a data mining technique that reveals interesting relationships in a database. Existing approaches employ different parameters to search for interesting rules. This fact and...

Weka: A machine learning workbench for data mining (2005)

Frank, Eibe, Hall, Mark A., Holmes, Geoffrey, Kirkby, Richard, Pfahringer, Bernhard, Witten, Ian H.

The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from...

Ensembles of nested dichotomies for multi-class problems (2004)

Frank, Eibe, Kramer, Stefan

Nested dichotomies are a standard statistical technique for tackling certain polytomous classification problems with logistic regression. They can be represented as binary trees that recursively...

Data mining in bioinformatics using Weka (2004)

Frank, Eibe, Hall, Mark A., Trigg, Len, Holmes, Geoffrey, Witten, Ian H.

The Weka machine learning workbench provides a general purpose environment for automatic classification, regression, clustering and feature selection-common data mining problems in bioinformatics...

Logistic regression and boosting for labeled bags of instances (2004)

Xu, Xin, Frank, Eibe

In this paper we upgrade linear logistic regression and boosting to multi-instance data, where each example consists of a labeled bag of instances. This is done by connecting predictions for...

Evaluating the replicability of significance tests for comparing learning algorithms (2004)

Bouckaert, Remco R., Frank, Eibe

Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test...

Data mining in bioinformatics using Weka (2004)

Frank, Eibe, Hall, Mark A., Trigg, Leonard E., Holmes, Geoffrey, Witten, Ian H.

The Weka machine learning workbench provides a general purpose environment for automatic classification, regression, clustering and feature selection-common data mining problems in bioinformatics...

Logistic regression and boosting for labeled bags of instances (2004)

Xin Xu, Eibe Frank

Abstract. In this paper we upgrade linear logistic regression and boosting to multi-instance data, where each example consists of a labeled bag of instances. This is done by connecting predictions...

Evaluating the replicability of significance tests for comparing learning algorithms (2004)

Remco R. Bouckaert, Eibe Frank

Abstract. Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the...

Ensembles of nested dichotomies for multi-class problems (2004)

Eibe Frank, Stefan Kramer

Nested dichotomies are a standard statistical technique for tackling certain polytomous classification problems with logistic regression. They can be represented as binary trees that recursively split...

Ensembles of Nested Dichotomies for Multi-Class Problems (2004)

Eibe Frank, Stefan Kramer

Nested dichotomies are a standard statistical technique for tackling certain polytomous classification problems with logistic regression. They can be represented as binary trees that recursively split...

Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms (2004)

Remco R. Bouckaert, Eibe Frank

Empirical research in learning algorithms for classification tasks generally requires the use of significance tests. The quality of a test is typically judged on Type I error (how often the test...

Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004)

Eibe Frank, Gordon W. Paynter

This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCC are organized in a...

Data mining in bioinformatics using Weka (2004)

Frank, Eibe, Hall, Mark, Trigg, Len, Holmes, Geoffrey, Witten, Ian H.

Summary: The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering, and feature selection--common data mining problems in...

Applying propositional learning algorithms to multi-instance data (2003)

Frank, Eibe, Xu, Xin

Multi-instance learning is commonly tackled using special-purpose algorithms. Development of these algorithms has started because early experiments with standard propositional learners have failed to...

Locally weighted naive Bayes (2003)

Frank, Eibe, Hall, Mark A., Pfahringer, Bernhard

Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results,...

Visualizing class probability estimators (2003)

Frank, Eibe, Hall, Mark A.

Inducing classifiers that make accurate predictions on future data is a driving force for research in inductive learning. However, also of importance to the users is how to gain information from the...

Predicting Library of Congress Classifications from Library of Congress Subject Headings (2003)

Frank, Eibe, Paynter, Gordon W.

This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCC are organized in a...

A two-level learning method for generalized multi-instance problems (2003)

Weidmann, Nils, Frank, Eibe, Pfahringer, Bernhard

In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag's class label depends on the labels of the...

Visualizing class probability estimation (2003)

Eibe Frank, Mark Hall

Abstract. Inducing classifiers that make accurate predictions on future data is a driving force for research in inductive learning. However, also of importance to the users is how to gain information...

Locally weighted naive Bayes (2003)

Eibe Frank, Mark Hall, Bernhard Pfahringer

Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers...

Visualizing class probability estimation (2003)

Eibe Frank, Mark Hall

Inducing classifiers that make accurate predictions on future data is a driving force for research in inductive learning. However, also of importance to the users is how to gain information from the...

Applying propositional learning algorithms to multi-instance data (2003)

Eibe Frank, Xin Xu

Multi-instance learning is commonly tackled using special-purpose algorithms.

Logistic model trees (2003)

Niels Landwehr, Mark Hall, Eibe Frank

Abstract. Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric...

A Two-level Learning Method for Generalized Multi-instance Problems (2003)

Nils Weidmann, Eibe Frank, Bernhard Pfahringer

In traditional multi-instance (MI) learning, a single positive instance in a bag produces a positive class label. Hence, the learner knows how the bag's class label depends on the labels of the...

Racing committees for large datasets (2002)

Frank, Eibe, Holmes, Geoffrey, Kirkby, Richard, Hall, Mark A.

This paper proposes a method for generating classifiers from large datasets by building a committee of simple base classifiers using a standard boosting algorithm. It allows the processing of large...

A logistic boosting approach to inducing multiclass alternating decision trees (2002)

Holmes, Geoffrey, Pfahringer, Bernhard, Kirkby, Richard, Frank, Eibe, Hall, Mark A.

The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules....

Multiclass alternating decision trees (2002)

Holmes, Geoffrey, Pfahringer, Bernhard, Kirkby, Richard, Frank, Eibe, Hall, Mark A.

The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules....

Multiclass alternating decision trees (2002)

Bernhard Pfahringer, Richard Kirkby, Eibe Frank, Mark Hall

Abstract. The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification...

Racing committees for large datasets (2002)

Eibe Frank, Richard Kirkby, Mark Hall

Abstract. This paper proposes a method for generating classifiers from large datasets by building a committee of simple base classifiers using a standard boosting algorithm. It permits the processing of...

Multiclass Alternating Decision Trees (2002)

Geoffrey Holmes, Bernhard Pfahringer, Richard Kirkby, Eibe Frank, Mark Hall

The alternating decision tree (ADTree) is a successful classification technique that combines decision trees with the predictive accuracy of boosting into a set of interpretable classification rules....

A simple approach to ordinal classification (2001)

Frank, Eibe, Hall, Mark A.

Machine learning methods for classification problems commonly assume that the class values are unordered. However, in many practical applications the class values do exhibit a natural order, for...

Determining progression in glaucoma using visual fields (2001)

Turpin, Andrew, Frank, Eibe, Hall, Mark A., Witten, Ian H., Johnson, Chris A.

The standardized visual field assessment, which measures visual function in 76 locations of the central visual area, is an important diagnostic tool in the treatment of the eye disease glaucoma. It...

Interactive machine learning–letting users build classifiers (2000)

Ware, Malcolm, Frank, Eibe, Holmes, Geoffrey, Hall, Mark A., Witten, Ian H.

According to standard procedure, building a classifier is a fully automated process that follows data preparation by a domain expert. In contrast, interactive machine learning engages users in...

KEA: Practical automatic keyphrase extraction (2000)

Witten, Ian H., Paynter, Gordon W., Frank, Eibe, Gutwin, Carl, Nevill-Manning, Craig G.

Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate...

Text categorization using compression models (2000)

Frank, Eibe, Chui, Chang, Witten, Ian H.

Text categorization, or the assignment of natural language texts to predefined categories based on their content, is of growing importance as the volume of information available on the internet...

Pruning decision trees and lists (2000)

Frank, Eibe.

Thesis (Ph.D.)--University of Waikato, 2000.

Naive Bayes for regression (2000)

Eibe Frank, Leonard Trigg, Geoffrey Holmes, Ian H. Witten, W. Aha

Abstract. Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the...

Interactive machine learning - letting users build classifiers (2000)

Malcolm Ware, Eibe Frank, Geoffrey Holmes, Mark Hall, Ian H. Witten

According to standard procedure, building a classifier using machine learning is a fully automated process that follows the preparation of training data by a domain expert. In contrast, interactive...

Naive Bayes for regression (2000)

Eibe Frank, Leonard Trigg, Geoffrey Holmes, Ian H. Witten

Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability...

Text Categorization Using Compression Models (2000)

Eibe Frank, Chang Chui, Ian H. Witten

this article is more corn-like (one hesitates to say "corny") than most corn articles. A significant number of articles belong to both categories. Figures 3.2i and 3.2j show an extreme situation...

KEA: Practical Automatic Keyphrase Extraction (2000)

Ian H. Witten, Gordon W. Paynter, Eibe Frank, Carl Gutwin

Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate...

Pruning Decision Trees and Lists (2000)

Eibe Frank

Machine learning algorithms are techniques that automatically build models describing the structure at the heart of a set of data. Ideally, such models can be used to predict properties of future...

Weka: Practical machine learning tools and techniques with Java implementations (1999)

Witten, Ian H., Frank, Eibe, Trigg, Len, Hall, Mark A., Holmes, Geoffrey, Cunningham, Sally Jo

The Waikato Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java class libraries that implement many state-of-the-art machine learning and data mining algorithms. Weka is freely...

Weka: Practical machine learning tools and techniques with Java implementations (1999)

Witten, Ian H., Frank, Eibe, Trigg, Leonard E., Hall, Mark A., Holmes, Geoffrey, Cunningham, Sally Jo

The Waikato Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java class libraries that implement many state-of-the-art machine learning and data mining algorithms. Weka is freely...

Reduced-error pruning with significance tests (1999)

Frank, Eibe, Witten, Ian H.

When building classification models, it is common practice to prune them to counter spurious effects of the training data: this often improves performance and reduces model size. “Reduced-error...

Generating rule sets from model trees (1999)

Holmes, Geoffrey, Hall, Mark A., Frank, Eibe

Knowledge discovered in a database must be represented in a form that is easy to understand. Small, easy to interpret nuggets of knowledge from data are one requirement and the ability to induce them...

KEA: Practical Automatic Keyphrase Extraction (1999)

Witten, Ian H., Paynter, Gordon W., Frank, Eibe, Gutwin, Carl, Nevill-Manning, Craig G.

Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate...

Improving browsing in digital libraries with keyphrase indexes (1999)

Gutwin, Carl, Paynter, Gordon, Witten, Ian H., Nevill-Manning, Craig G., Frank, Eibe

Browsing accounts for much of people's interaction with digital libraries, but it is poorly supported by standard search engines. Conventional systems often operate at the wrong level, indexing words...

Market basket analysis of library circulation data (1999)

Cunningham, Sally Jo, Frank, Eibe

“Market Basket Analysis” algorithms have recently seen widespread use in analyzing consumer purchasing patterns, specifically in detecting products that are frequently purchased together. We...

Making better use of global discretization (1999)

Frank, Eibe, Witten, Ian H.

Before applying learning algorithms to datasets, practitioners often globally discretize any numeric attributes. If the algorithm cannot handle numeric attributes directly, prior discretization is...

Domain-specific keyphrase extraction (1999)

Frank, Eibe, Paynter, Gordon W., Witten, Ian H., Gutwin, Carl, Nevill-Manning, Craig G.

Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to...

Improving browsing in digital libraries with keyphrase indexes (1999)

Gutwin, Carl, Paynter, Gordon W., Witten, Ian H., Nevill-Manning, Craig G., Frank, Eibe

Browsing accounts for much of people's interaction with digital libraries, but it is poorly supported by standard search engines. Conventional systems often operate at the wrong level, indexing words...

Generating rule sets from model trees (1999)

Mark Hall, Eibe Frank

Abstract. Model trees (decision trees with linear models at the leaf nodes) have recently emerged as an accurate method for numeric prediction that produces understandable models. However, it is known...

Market basket analysis of library circulation data (1999)

Sally Jo Cunningham, Eibe Frank

Abstract: "Market Basket Analysis" algorithms have recently seen widespread use in analyzing consumer purchasing patterns--specifically, in detecting products that are frequently...

KEA: Practical Automatic Keyphrase Extraction (1999)

Ian H. Witten, Gordon W. Paynter, Eibe Frank, Carl Gutwin

Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate...

Domain-Specific Keyphrase Extraction (1999)

Eibe Frank, Gordon W. Paynter, Ian H. Witten, Carl Gutwin

Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to...

Domain-Specific Keyphrase Extraction (1999)

Eibe Frank, Gordon W. Paynter, Ian H. Witten, Carl Gutwin

Keyphrases are an important means of document summarization, clustering, and topic search. Only a small minority of documents have author-assigned keyphrases, and manually assigning keyphrases to...

Weka: Practical Machine Learning Tools and Techniques with Java Implementations (1999)

Ian H. Witten, Eibe Frank, Len Trigg, Mark Hall, Geoffrey Holmes, Sally Jo Cunningham

Introduction. The Waikato Environment for Knowledge Analysis (Weka) is a comprehensive suite of Java class libraries that implement many state-of-the-art machine...

Making Better Use of Global Discretization (1999)

Eibe Frank, Ian H. Witten

Before applying learning algorithms to datasets, practitioners often globally discretize any numeric attributes. If the algorithm cannot handle numeric attributes directly, prior discretization is...

KEA: Practical Automatic Keyphrase Extraction (1999)

Ian H. Witten, Gordon Paynter, Eibe Frank, Carl Gutwin

Keyphrases provide semantic metadata that summarize and characterize documents. This paper describes Kea, an algorithm for automatically extracting keyphrases from text. Kea identifies candidate...

Naïve Bayes for regression (1998)

Frank, Eibe, Trigg, Leonard E., Holmes, Geoffrey, Witten, Ian H.

Despite its simplicity, the naïve Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability...

Naive Bayes for regression (1998)

Frank, Eibe, Trigg, Leonard E., Holmes, Geoffrey, Witten, Ian H.

Despite its simplicity, the naïve Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability...

Generating accurate rule sets without global optimization (1998)

Frank, Eibe, Witten, Ian H.

The two dominant schemes for rule-learning, C4.5 and RIPPER, both operate in two stages. First they induce an initial rule set and then they refine it using a rather complex optimization stage that...

Using a permutation test for attribute selection in decision trees (1998)

Frank, Eibe, Witten, Ian H.

Most techniques for attribute selection in decision trees are biased towards attributes with many values, and several ad hoc solutions to this problem have appeared in the machine learning...

Improving Browsing in Digital Libraries with Keyphrase Indexes (1998)

Carl Gutwin, Gordon W. Paynter, Ian H. Witten, Craig Nevill-Manning, Eibe Frank

Browsing accounts for much of people's interaction with digital libraries, but it is poorly supported by standard search engines. Conventional systems often operate at the wrong level, indexing...

Reduced-Error Pruning With Significance Tests (1998)

Eibe Frank, Ian H. Witten

When building classification models, it is common practice to prune them to counter spurious effects of the training data: this often improves performance and reduces model size. "Reduced-error...

Generating Accurate Rule Sets Without Global Optimization (1998)

Eibe Frank, Ian H. Witten

The two dominant schemes for rule-learning, C4.5 and RIPPER, both operate in two stages. First they induce an initial rule set and then they refine it using a rather complex optimization stage that...

Technical Note: Using Model Trees for Classification (1998)

Eibe Frank, Yong Wang, Geoffrey Holmes, Ian H. Witten

Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be...

Using a Permutation Test for Attribute Selection in Decision Trees (1998)

Eibe Frank, Ian H. Witten

Most techniques for attribute selection in decision trees are biased towards attributes with many values, and several ad hoc solutions to this problem have appeared in the machine learning...

Using Model Trees for Classification (1998)

Eibe Frank, Yong Wang, Stuart Inglis, Geoffrey Holmes, Ian H. Witten

Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be...

Generating Accurate Rule Sets Without Global Optimization (1998)

Eibe Frank

The two dominant schemes for rule-learning, C4.5 and RIPPER, both operate in two stages. First they induce an initial rule set and then they refine it using a rather complex optimization stage that...

Using model trees for classification (1997)

Frank, Eibe, Wang, Yong, Inglis, Stuart J., Holmes, Geoffrey, Witten, Ian H.

Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They can be...

Using Model Trees for Classification (1997)

Eibe Frank, Yong Wang, Stuart Inglis, Geoffrey Holmes, Ian H. Witten, Raymond Mooney

Abstract. Model trees, which are a type of decision tree with linear regression functions at the leaves, form the basis of a recent successful technique for predicting continuous numeric values. They...

Selecting multiway splits in decision trees (1996)

Frank, Eibe, Witten, Ian H.

Decision trees in which numeric attributes are split several ways are more comprehensible than the usual binary trees because attributes rarely appear more than once in any path from root to leaf....

Active learning of soft rules for system modelling (1996)

Frank, Eibe, Huber, Klaus-Peter

Using rule learning algorithms to model systems has gained considerable interest in the past. The underlying idea of active learning is to let the learning algorithm influence the selection of training...
