Conceptual Equivalence for Contrast Mining in Classification Learning (2009)
Ying Yang, Xindong Wu, Xingquan Zhu
Learning often occurs through comparing. In classification learning, in order to compare data groups, most existing methods compare either raw instances or learned classification rules against each...
Parameter Tuning for Induction Algorithm Oriented Feature Elimination (2009)
Abstract. This paper presents an analysis of parameter tuning for induction algorithm oriented feature elimination (IAOFE), an approach that takes into consideration not only the data and the target...
Error Detection and Impact-Sensitive Instance Ranking in Noisy Datasets (2009)
Xingquan Zhu, Xindong Wu, Ying Yang
Given a noisy dataset, how to locate erroneous instances and attributes and rank suspicious instances based on their impacts on the system performance is an interesting and important research issue....
User-Centered Biological Information Location by Combining User Profiles and Domain Knowledge (2008)
Xindong Wu, Jeffrey E. Stone, Marc Greenblatt
To aid researchers in obtaining, organizing and managing biological data, we have designed an intelligent digital library system that utilizes advanced data mining techniques. Our digital library...
Mining Video Associations for Efficient Database Management (2008)
To support more efficient video database management, this paper explores the concept of video association mining, with which the association patterns are characterized by sequentially associated...
A Semantic Network for Modeling Biological Knowledge in Multiple Databases (2008)
We have developed a semantic network of biological terminology to aid in the retrieval and integration of biological information from a variety of disparate information sources. Our semantic network...
Efficient string matching with wildcards and length constraints (2008)
Gong Chen, Xindong Wu, Xingquan Zhu
DOI 10.1007/s10115-006-0016-8. Regular paper.
LARGE SCALE DATA MINING BASED ON DATA PARTITIONING (2008)
Shichao Zhang, Xindong Wu, S. Z Hang
Dealing with very large databases is one of the defining challenges in data mining research and development. Some databases are simply too large (e.g., with terabytes of data) to be processed at one...
Parameter Tuning for Induction Algorithm Oriented Feature Elimination (2008)
Abstract. This paper presents an analysis of parameter tuning for induction algorithm oriented feature elimination (IAOFE), an approach that takes into consideration not only the data and the target...
Guojun Mao, Xindong Wu, Xingquan Zhu, Gong Chen, Chunnian Liu
frequent itemsets
ELAPSED TIME IN HUMAN GAIT RECOGNITION: A NEW APPROACH (2008)
Dacheng Tao, Xuelong Li, Xindong Wu, Steve Maybank
Human gait is an effective biometric source for human identification and visual surveillance; human gait recognition has therefore become a hot topic in recent research. However, the elapsed time...
Mining in Anticipation for Concept Change: Proactive-Reactive Prediction in Data Streams (2008)
Ying Yang, Xindong Wu, Xingquan Zhu
Abstract. Prediction in streaming data is an important activity in the modern society. Two major challenges posed by data streams are (1) the data may grow without limit so that it is difficult to...
Xingquan Zhu, Xindong Wu, Ying Yang
Abstract. Recently, mining from data streams has become an important and challenging task for many real-world applications such as credit card fraud protection and sensor networking. One popular...
Guojun Mao, Xindong Wu, Xingquan Zhu, Gong Chen, Chunnian Liu, Guojun Mao, ...
Mining maximal frequent itemsets from data streams
An Empirical Study of the Noise Impact on Cost-Sensitive Learning (2008)
Xingquan Zhu, Xindong Wu, Taghi M. Khoshgoftaar, Yong Shi
In this paper, we perform an empirical study of the impact of noise on cost-sensitive (CS) learning, through observations on how a CS learner reacts to the mislabeled training examples in terms of...
Multi-database Mining, Shichao Zhang, Xindong Wu, Chengqi Zhang, Smart Distance, Information Systems, ...
Intelligence (TCCI) of the IEEE Computer Society deals with tools and systems using biologically and linguistically motivated computational paradigms such as artificial neural
Top 10 algorithms in data mining (2008)
Wu, Xindong, Kumar, Vipin, Quinlan, J. Ross, Ghosh, Joydeep, Yang, Qiang, Motoda, Hiroshi, ...
ICDM workshops 2008. Eight IEEE international conference on data mining workshops (2008)
Bonchi, Francesco, Giannotti, Fosca, Gunopoulos, Dimitrios, Turini, Franco, Zaniolo, Carlo, ...
Locating White Box Reuse via Data Mining (2007)
Margot Postema, Heinz Schmidt, Xindong Wu
A large percentage of white box reuse can occur within software systems. Once these reused components are located, they can be restructured to black box components. In structured systems, the black...
Honghua Dai, Kevin Korb, Chris Wallace, Xindong Wu
Weak causal relationships and small sample size pose two significant difficulties to the automatic discovery of causal models from observational data. This paper examines the influence of weak causal...
Multi-Database Mining (2007)
Shichao Zhang, Xindong Wu, Chengqi Zhang
Abstract — Multi-database mining is an important research area because (1) there is an urgent need for analyzing data in different sources, (2) there are essential differences between mono- and...
Digital Object Identifier (DOI) 10.1007/s00530-003-0076-5, Multimedia Systems, Springer (2003) (2007)
Xingquan Zhu, Jianping Fan, Ahmed K. Elmagarmid, Xindong Wu
Video is increasingly the medium of choice for a variety of communication channels, resulting primarily from increased levels of networked multimedia systems. One way to keep our heads above the video sea is to provide...
Top 10 algorithms in data mining (2007)
Xindong Wu, Vipin Kumar, J. Ross Quinlan, Joydeep Ghosh, Qiang Yang, Hiroshi Motoda, ...
DOI 10.1007/s10115-007-0114-2. Survey paper. This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank,...
Tao, Dacheng, Tang, Xiaoou, Li, Xuelong, Wu, Xindong
Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based relevance feedback is often poor...
Human carrying status in visual surveillance (2006)
Dacheng Tao, Xuelong Li, Xindong Wu, Stephen J. Maybank
A person’s gait changes when he or she is carrying an object such as a bag, suitcase or rucksack. As a result, human identification and tracking are made more difficult because the averaged gait...
Mining sequential patterns across data streams (2005)
Gong Chen, Xindong Wu, Xingquan Zhu
Abstract. There are extensive endeavors toward mining frequent items or itemsets in a single data stream, but rare efforts have been made to explore sequential patterns among literals in different...
Supervised tensor learning (2005)
Dacheng Tao, Xuelong Li, Weiming Hu, Stephen Maybank, Xindong Wu
This paper aims to take general tensors as inputs for supervised learning. A supervised tensor learning (STL) framework is established for convex optimization based learning techniques such as...
Sequential pattern mining in multiple streams (2005)
Gong Chen, Xindong Wu, Xingquan Zhu
In this paper, we deal with mining sequential patterns in multiple data streams. Building on a state-of-the-art sequential pattern mining algorithm PrefixSpan for mining transaction databases, we...
A decremental algorithm for maintaining frequent itemsets in dynamic databases (2005)
Shichao Zhang, Xindong Wu, Jilian Zhang, Chengqi Zhang
Abstract. Data mining and machine learning must confront the problem of pattern maintenance because data updating is a fundamental operation in data management. Most existing data-mining algorithms...
Xingquan Zhu, Xindong Wu, Jianping Fan, Ahmed K. Elmagarmid, Walid G. Aref
Abstract. In this paper, we propose a hierarchical video summarization strategy that explores video content structure to provide the users with a scalable, multilevel video summary. First,...
Dynamic Classifier Selection for Effective Mining from Noisy Data Streams (2004)
Xingquan Zhu, Xindong Wu, Ying Yang
Recently, mining from data streams has become an important and challenging task for many real-world applications such as credit card fraud protection and sensor networking. One popular solution is to...
Dealing with predictive-but-unpredictable attributes in noisy data sources (2004)
Ying Yang, Xindong Wu, Xingquan Zhu
Abstract. Attribute noise can affect classification learning. Previous work in handling attribute noise has focused on those predictable attributes that can be predicted by the class and other...
Cost-guided Class Noise Handling for Effective Cost-sensitive Learning (2004)
Recent research in machine learning, data mining and related areas has produced a wide variety of algorithms for cost-sensitive (CS) classification, where instead of maximizing the classification...
Data mining: How research meets practical development (2003)
Xindong Wu, Philip S. Yu, Gregory Piatetsky-Shapiro, Nick Cercone, T. Y. Lin, Ramamohanarao Kotagiri, ...
Eliminating class noise in large datasets (2003)
Xingquan Zhu, Xindong Wu, Qijun Chen
This paper presents a new approach for identifying and eliminating mislabeled instances in large or distributed datasets. We first partition a dataset into subsets, each of which is small enough to...
Association analysis with one scan of databases (2002)
Hao Huang, Xindong Wu, Richard Relue
Mining frequent patterns with an FP-tree avoids costly candidate generation and repeated occurrence-frequency checking against the support threshold. It therefore achieves better performance and...
Building Intelligent Learning Database Systems (2000)
Induction and deduction are two opposite operations in data mining applications. Induction extracts knowledge in the form of, say, rules or decision trees from existing data, and deduction applies...
Aggregation of Association Rules (1999)
Dealing with very large databases is one of the defining challenges in data mining research and development. Some databases are simply too large (e.g., with terabytes of data) to be processed at one...
Association Analysis with One Scan of Databases (1998)
Huang, Hao, Wu, Xindong, Relue, Richard
Mining frequent patterns with an FP-tree avoids costly candidate generation and repeated occurrence-frequency checking against the support threshold. It therefore achieves better performance and...
Multi-Layer Incremental Induction (1998)
This paper describes a multi-layer incremental induction algorithm, MLII, which is linked to an existing nonincremental induction algorithm to learn incrementally from noisy data. MLII makes use of...
Rule Induction with Extension Matrices (1998)
This paper presents a heuristic, attribute-based, noise-tolerant data mining program, HCV (Version 2.0), based on the newly-developed extension matrix approach. By dividing the positive examples (PE)...
A Decision Support Tool for Tuning Parameters in a Machine Learning Algorithm (1997)
Margot Postema, Tim Menzies, Xindong Wu
Many machine learning algorithms require parameter tuning in order to adapt them to the particulars of a training set. This tuning task can be an expert task in its own right. Based on our...
A Study of Causal Discovery With Weak Links and Small Samples (1997)
Honghua Dai, Kevin Korb, Chris Wallace, Xindong Wu
Weak causal relationships and small sample size pose two significant difficulties to the automatic discovery of causal models from observational data. This paper examines the influence of weak causal...
Object-Oriented Modeling of Rule-Based Programming (1997)
Domain expertise always comprises a set of concepts and the logical relationships between them. In rule-based programming, rules which describe logical relationships are the fundamental knowledge...
The Use and Acquisition of Explicit Ontologies in KEshell (1996)
Schematic descriptions of domain knowledge, called an ontology [van Heijst et al. 96], are very useful in facilitating and formalising the knowledge acquisition process in knowledge-based systems...
Noise Handling with Extension Matrices (1996)
Xindong Wu, Johan Krisár, Petter Mahlén
HCV is a heuristic attribute-based induction algorithm based on the newly-developed extension matrix approach. By dividing the positive examples (PE) of a specific class in a given example set into...
A Tuning Aid to Improve Deduction of Induction Results (1996)
Margot Postema, Xindong Wu, Tim Menzies
This paper examines where a tuning aid can be useful to improve deduction of induction results. Different discretization methods use different strategies to set up the borders for continuous...
Noise Handling With Extension Matrices (1996)
HCV is a heuristic attribute-based induction algorithm based on the newly-developed extension matrix approach. By dividing the positive examples (PE) of a specific class in a given example set into...
A Bayesian Discretizer for Real-Valued Attributes (1996)
Discretization of real-valued attributes into nominal intervals has been an important area for symbolic induction systems because many real world classification tasks involve both symbolic and...
A Comparison of Objects with Frames and OODBs (1995)
Objects and frames are two powerful technologies used in Software Engineering and Artificial Intelligence respectively. Object-oriented databases are currently one of the most important research...
Noise Handling with Extension Matrixes (1995)
Xindong Wu, Johan Krisár, Petter Mahlén
HCV is a heuristic attribute-based induction algorithm based on the newly-developed extension matrix approach. By dividing the positive examples (PE) of a specific class in a given example set into...
Xindong Wu, Sita Ramakrishnan, Heinz Schmidt
Abstraction and encapsulation. Abstraction is the principle of capturing useful information by ignoring all the detailed features of an entity that are not relevant to understanding what it does or what it...
SIKT: A Structured Interactive Knowledge Transfer Program (1995)
Facilitating and formalising the process of specification acquisition in software development is an important problem in the automatic generation of software. This paper presents a structured...
Xindong Wu, Sita Ramakrishnan, Heinz Schmidt
One of the fundamental differences between AI research and conventional computer science (such as software engineering and database technology) is that AI has its own established programming...
Fuzzy Interpretation of Induction Results (1995)
When applying rules induced from training examples to a test example, there are three possible cases which demand different actions: (1) no match, (2) single match, and (3) multiple match. Existing...
Rule Schema + Rule Body: A 2-Level Representation Language (1994)
This paper presents an alternative representation language, rule schema + rule body, to rule-based production systems based on an integration of rule-based and numeric computations. Rule schemata in...
Extracting Rule Schemas from Rules for an Intelligent Learning Database System (1994)
A software module for extracting rule schemas from rules, in the context of an intelligent learning data base system (ILDB), is described. The ILDB system employs a two level knowledge representation...
Knowledge Acquisition from Data Bases (1993)
Knowledge acquisition from databases is a research frontier for both data base technology and machine learning (ML) techniques, and has seen sustained research over recent years. It also acts as a link...
10 CHALLENGING PROBLEMS IN DATA MINING RESEARCH
In October 2005, we took an initiative to identify 10 challenging problems in data mining research, by consulting some of the most active researchers in data mining and machine learning for their...