Jiawei Han

Publication List Details

Period

0000 - 2009

Number

382

Co-Authors

Proc. VLDB 02 Multi-Dimensional Regression Analysis of Time-Series Data Streams (2009)

Yixin Chen, Guozhu Dong, Jiawei Han, Benjamin W. Wah, Jianyong Wang

Real-time production systems and other dynamic environments often generate tremendous (potentially infinite) amount of stream data; the volume of data is too huge to be stored on disks or...

Real-time Knowledge Discovery and Dissemination for Intelligence Analysis (2009)

Bhavani Thuraisingham, Latifur Khan, Murat Kantarcioglu, Sonia Chib, Jiawei Han, Sang Son

This paper describes the issues and challenges for real-time knowledge discovery and then discusses approaches and challenges for real-time data mining and stream mining. Our goal is to extract...

SpaRClus: Spatial Relationship Pattern-Based Hierarchical Clustering (2009)

Sangkyum Kim, Xin Jin, Jiawei Han

For the past decade, the need of multimedia mining has increased tremendously, especially in image data due to inexpensive digital technologies and fast mounting of image data. In this paper, we,...

TraClass: Trajectory Classification Using Hierarchical Region-Based and Trajectory-Based Clustering (2009)

Jae-gil Lee, Jiawei Han, Xiaolei Li, Hector Gonzalez

Trajectory classification, i.e., model construction for predicting the class labels of moving objects based on their trajectories and other features, has many important, real-world applications. A...

Plan Mining by Divide-and-Conquer (2009)

Jiawei Han, Qiang Yang, Edward Kim

Plans or sequences of actions are an important form of data. With the proliferation of database technology, plan databases (or planbases) are increasingly common. Efficient discovery of important...

Data Mining: Concepts and Techniques (2009)

Rohan Sharma, Kalpit Shah, Yeshesvini Shirahatti, Smruti Patel, ...

• Introducing the concept of a warehouse, modeling of data and schemas used.

Mining Frequent Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach (2009)

Hongyan Liu, Jiawei Han, Dong Xin, Zheng Shao

Data sets of very high dimensionality, such as microarray data, pose great challenges on efficient processing to most existing data mining algorithms. Recently, there comes a row-enumeration method...

Semantic Annotation of Frequent Patterns (2008)

Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, Chengxiang Zhai

Using frequent patterns to analyze data has been one of the fundamental approaches in many data mining applications. Research in frequent pattern mining has so far mostly focused on developing...

Regression cubes with lossless compression and aggregation (2008)

Yixin Chen, Guozhu Dong, Jiawei Han, Benjamin W. Wah, ...

Abstract—As OLAP engines are widely used to support multidimensional data analysis, it is desirable to support in data cubes advanced statistical measures, such as regression and filtering, in...

Accelerating DNA Sequencing-by-Hybridization with Noise (2008)

Chen Chen, Dong Xin, Jiawei Han

As a potential alternative to current wet-lab technologies, DNA sequencing-by-hybridization (SBH) has received much attention from different research communities. In order to deal with real...

SOBER: Statistical Model-based Bug Localization (2008)

Chao Liu, Jiawei Han, Xifeng Yan

Automated localization of software bugs is one of the essential issues in debugging aids. Previous studies indicated that the evaluation history of program predicates may disclose important clues...

Warehousing and Mining Massive RFID Data Sets (2008)

Jiawei Han

Themes on Advanced Data Mining Applications • Mining sequences and graphs for biological data analysis • Web mining and social network analysis • Stream and sensor data mining • Mining moving...

of Excellence/Institute for Robotic and Intelligent Systems, and the Research Grants Council of the Hong Kong Special (2008)

Ke Wang, Yuelong Jiang, Jeffrey Xu Yu, Guozhu Dong, Jiawei Han

The iceberg cube mining computes all cells v, corresponding to GROUP BY partitions, that satisfy a given constraint on aggregated behaviors of the tuples in a GROUP BY partition. The number of cells...

gPrune: A Constraint Pushing Framework for Graph Pattern Mining (2008)

Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu

Abstract. In graph mining applications, there has been an increasingly strong urge for imposing user-specified constraints on the mining results. However, unlike most traditional itemset constraints,...

(Currently on leave from Concordia U.) (2008)

Raymond Ng, Jiawei Han, Simon Fraser U

Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting...

Efficient Processing of Ranked Queries with Sweeping Selection (2008)

Wen Jin, Martin Ester, Jiawei Han

Abstract. Existing methods for top-k ranked query employ techniques including sorting, updating thresholds and materializing views. In this paper, we propose two novel index-based techniques for...

SpaRClus: Spatial Relationship Pattern-Based Hierarchical Clustering (2008)

Sangkyum Kim, Xin Jin, Jiawei Han

For the past decade, the need of multimedia mining has increased tremendously, especially in image data due to inexpensive digital technologies and fast mounting of image data. In this paper, we,...

and (2008)

Wei Lu, Jiawei Han, Beng Chin Ooi

Extraction of interesting and general knowledge from large spatial databases is an important task in the development of spatial data- and knowledge-base systems. In this paper, we investigate...

Locality sensitive discriminant analysis (2008)

Deng Cai, Jiawei Han, Xiaofei He, Kun Zhou, Hujun Bao

Linear Discriminant Analysis (LDA) is a popular data-analytic tool for studying the class relationship between data points. A major disadvantage of LDA is that it fails to discover the local...

Near-optimal supervised feature selection among frequent subgraphs (2008)

Marisa Thoma, Hong Cheng, Arthur Gretton, Jiawei Han, Hans-Peter Kriegel, Alex Smola, ...

Graph classification is an increasingly important step in numerous application domains, such as function prediction of molecules and proteins, computerised scene analysis, and anomaly detection in...

Preface (2008)

Jiawei Han, Micheline Kamber

Our capabilities of both generating and collecting data have been increasing rapidly in the last several decades. Contributing factors include the widespread use of bar codes for most commercial...

On Efficient Processing of Subspace Skyline Queries on High Dimensional Data (2008)

Wen Jin, Martin Ester, Jiawei Han

Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focused on pre-materializing a set of skylines points in various subspaces while the...

Approximate Frequent Pattern Mining (2008)

Philip S. Yu, Xifeng Yan, Jiawei Han, Hong Cheng, Feida Zhu

Frequent pattern mining has been a focused theme in data mining research and an important first step in the analysis of data arising in a broad range of applications. The traditional exact model for...

Abstract (2008)

Ming-syan Chen, Jiawei Han, Philip S. Yu

Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an...

Mining for Information Discovery on the Web: Overview and Illustrative Research (2008)

Hwanjo Yu, Anhai Doan, Jiawei Han

Summary. The Web has become a fertile ground for numerous research activities in mining. In this chapter we discuss research on finding targeted information on the Web. First, we briefly survey the...

Bibliographic Notes for Chapter 5 Mining Frequent Patterns, Associations, and Correlations (2008)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

discussed in Section 5.2.1 for frequent itemset mining was presented in Agrawal and Srikant [AS94b]. A variation of the algorithm using a similar pruning heuristic was developed independently by...

On Compressing Frequent Patterns (2008)

Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng

A major challenge in frequent-pattern mining is the sheer size of its mining results. To compress the frequent patterns, we propose to cluster frequent patterns with a tightness measure δ (called...

• Rule Generation • Negative Tuple Sampling • Performance Study (2008)

Xiaoxin Yin, Jiawei Han, Jiong Yang, Philip S. Yu

Classification • Most real-world data are stored in relational databases • A relational database usually contains multiple, semantically inter-connected relations • To classify objects in one...

Feature-based similarity search in graph structures (2008)

Xifeng Yan, Feida Zhu, Philip S. Yu, Jiawei Han

Similarity search of complex structures is an important operation in graph-related applications since exact matching is often too restrictive. In this article, we investigate the issues of...

GeoMiner: A System Prototype for Spatial Data Mining (2008)

Jiawei Han, Krzysztof Koperski, Nebojsa Stefanovic

Spatial data mining is to mine high-level spatial information and knowledge from large spatial databases. A spatial data mining system prototype, GeoMiner, has been designed and developed based on...

Document Clustering Using Locality Preserving Indexing (2008)

Deng Cai, Xiaofei He, Jiawei Han

We propose a novel document clustering method, which aims to cluster the documents into different semantic classes. The document space is generally of high dimensionality, and clustering in such...

Closed Constrained Gradient Mining in Retail Databases (2008)

Jianyong Wang, Jiawei Han, Jian Pei

Abstract—Incorporating constraints into frequent itemset mining not only improves data mining efficiency, but also leads to concise and meaningful results. In this paper, a framework for closed...

Data Cube Computation and Data Generalization (2008)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

Gray, Chaudhuri, Bosworth, et al. [GCB+97] proposed the data cube as a relational aggregation operator generalizing group-by, crosstabs, and subtotals. Harinarayan, Rajaraman, and Ullman [HRU96]...

Bibliographic Notes for Chapter 9 Graph Mining, Social Network Analysis, and Multirelational Data Mining (2008)

Jiawei Han, Micheline Kamber

Research into graph mining has developed many frequent subgraph mining methods. Washio and Motoda [WM03] performed a survey on graph-based data mining. Many well-known pair-wise isomorphism testing...

M (2006): Data Mining: Concepts and Techniques (2nd edition) (2008)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

Thomsen [Tho97]. Chaudhuri and Dayal [CD97] provide a general overview of data warehousing and OLAP technology. A set of research papers on materialized views and data warehouse implementations were...

Extracting Redundancy-Aware Top-K Patterns (2008)

Dong Xin, Hong Cheng, Xifeng Yan, Jiawei Han

Observed in many applications, there is a potential need of extracting a small set of frequent patterns having not only high significance but also low redundancy. The significance is usually defined...

Fisher Analysis (MFA) and Local Discriminant Embedding (2008)

Deng Cai, Xiaofei He, Yuxiao Hu, Jiawei Han, Thomas Huang

Subspace learning based face recognition methods have attracted considerable interest in recent years, including

Exploratory Mining via Constrained Frequent Set Queries (2008)

Raymond Ng, Concordia U, Jiawei Han, Simon Fraser U, Teresa Mah

Although there have been many studies on data mining, to date there have been few research prototypes or commercial systems supporting comprehensive query-driven mining, which encourages interactive...

cross-species microarray (2008)

Fei Pan, Kiran Kamath, Haiyan Hu, Yu Huang, Kangyu Zhang, Min Xu, ...

package for integrative analysis of cross-platform and

Motion-Alert: Automatic Anomaly Detection in Massive Moving Objects (2008)

Xiaolei Li, Jiawei Han, Sangkyum Kim

Abstract. With recent advances in sensory and mobile computing technology, enormous amounts of data about moving objects are being collected. With such data, it becomes possible to automatically...

Discovering Evolutionary Classifier over High Speed Non-static Stream (2008)

Jiong Yang, Xifeng Yan, Jiawei Han, Wei Wang

With the emergence of large-volume and high-speed streaming data, mining data streams has become a focus of increasing interests. The major new challenges in streaming data mining are as follows: (1)...

M (2006): Data Mining: Concepts and Techniques (2nd edition) (2008)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge...

Data Mining: Concepts and Techniques, Slides for Textbook, Chapter 1 (2008)

Jiawei Han, Micheline Kamber

• Motivation: Why data mining? • What is data mining? • Data Mining: On what kind of data? • Data mining functionality

Efficient Classification from Multiple Heterogeneous Databases (2008)

Xiaoxin Yin, Jiawei Han

Abstract. With the fast expansion of computer networks, it is inevitable to study data mining on heterogeneous databases. In this paper we propose MDBM, an accurate and efficient approach for...

Bibliographic Notes for Chapter 2 Data Preprocessing (2008)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

are given below. Methods for descriptive data summarization have been studied in the statistics literature long before the onset of computers. Good summaries of statistical descriptive data mining...

and Retrieval—Relevance feedback (2008)

Deng Cai, Xiaofei He, Jiawei Han

Recently, there have been considerable interests in geometric-based methods for image retrieval. These methods consider the image space as a smooth manifold and apply manifold learning techniques to...

Mining Stream, Time-Series, and Sequence Data (2008)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

Stream data mining research has been active in recent years. Popular surveys on stream data systems and stream data processing include Babu and Widom [BW01], Babcock, Babu, Datar, et al. [BBD+02],...

M (2006): Data Mining: Concepts and Techniques (2nd edition) (2008)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

Mining complex types of data has been a fast developing, popular research field, with many research papers and tutorials appearing in conferences and journals on data mining and database systems....

Making SVMs scalable to large data sets using hierarchical cluster indexing (2008)

Hwanjo Yu, Jiong Yang, Jiawei Han, Xiaolei Li

Support vector machines (SVMs) have been promising methods for classification and regression analysis due to their solid mathematical foundations, which include two desirable properties: margin...

Orthogonal Laplacianfaces for Face Recognition (2008)

Deng Cai, Xiaofei He, Jiawei Han

Following the intuition that the naturally occurring face data may be generated by sampling a probability distribution that has support on or near a sub-manifold of ambient space, we propose an...

Mining Condensed Frequent-Pattern Bases. Knowledge and Information Systems (2004), Springer-Verlag London Ltd., DOI 10.1007/s10115-003-0133-6 (2008)

Jian Pei, Guozhu Dong, Wei Zou, Jiawei Han

Abstract. Frequent-pattern mining has been studied extensively and has many useful applications. However, frequent-pattern mining often generates too many patterns to be truly efficient or effective....

Traffic Density-Based Discovery of Hot Routes in Road Networks (2008)

Xiaolei Li, Jiawei Han, Jae-gil Lee, Hector Gonzalez

Abstract. Finding hot routes (traffic flow patterns) in a road network is an important problem. They are beneficial to city planners, police departments, real estate developers, and many others....

Image Clustering with Tensor Representation (2008)

Xiaofei He, Deng Cai, Haifeng Liu, Jiawei Han

We consider the problem of image representation and clustering. Traditionally, an n1 × n2 image is represented by a vector in the Euclidean space R^{n1×n2}. Some learning algorithms are then applied...

Bibliographic Notes for Chapter 11 (2008)

Jiawei Han, Micheline Kamber

Many books discuss applications of data mining. For financial data analysis and financial modeling, see Benninga and Czaczkes [BC00] and Higgins [Hig03]. For retail data mining and customer...

Abstract (2008)

Chao Liu, Jiawei Han, Yu Zhang, Xiangyu Zhang, Bharat K. Bhargava

Recent software systems usually feature an automated failure reporting component, with which a huge number of failures are collected from software end-users. With a proper support of failure...

Data Mining: Concepts and Techniques, Slides for Textbook, Chapter 5 (2008)

Jiawei Han, Micheline Kamber

• Descriptive vs. predictive data mining • Descriptive mining: describes concepts or task-relevant data sets in concise, summarative, informative, discriminative forms • Predictive mining:...

Frequent Closed Sequence Mining without Candidate Maintenance (2008)

Jianyong Wang, Jiawei Han, Chun Li

Abstract—Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not...

The Multi-Relational Skyline Operator (2008)

Wen Jin, Martin Ester, Zengjian Hu, Jiawei Han

Most of the existing work on skyline query has been extensively used in decision support, recommending systems etc, and mainly focuses on the efficiency issue for a single table. However the data...

Data Management and Exploration (2008)

Jiawei Han, Micheline Kamber, Thomas Seidl, ...

• Finding all the patterns autonomously in a database? Unrealistic, because the patterns could be too many but uninteresting • Data mining should be an interactive process • User directs...

Bibliographic Notes for Chapter 6 Classification and Prediction (2008)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

books describe each of the basic methods of classification discussed in this chapter, as well as practical techniques for the evaluation of classifier performance. Edited collections containing...

gApprox: Mining Frequent Approximate Patterns from a Massive Network (2008)

Chen Chen, Xifeng Yan, Feida Zhu, Jiawei Han

Recently, there arise a large number of graphs with massive sizes and complex structures in many new applications, such as biological networks, social networks, and the Web, demanding powerful data...

On Appropriate Assumptions to Mine Data Streams: Analysis and Practice (2008)

Jing Gao, Wei Fan, Jiawei Han

Recent years have witnessed an increasing number of studies in stream mining, which aim at building an accurate model for continuously arriving data. Somehow most existing work makes the implicit...

Searching Substructures with Superimposed Distance (2008)

Xifeng Yan, Feida Zhu, Jiawei Han, Philip S. Yu

Efficient indexing techniques have been developed for the exact and approximate substructure search in large scale graph databases. Unfortunately, the retrieval problem of structures with categorical...

Mining Behavior Graphs for “Backtrace” of Noncrashing Bugs (2008)

Chao Liu, Xifeng Yan, Hwanjo Yu, Jiawei Han, Philip S. Yu

Analyzing the executions of a buggy software program is essentially a data mining process. Although many interesting methods have been developed to trace crashing bugs (such as memory violation and...

Mining Frequent Patterns from Very High Dimensional Data: A Top-Down Row Enumeration Approach (2008)

Hongyan Liu, Jiawei Han, Dong Xin, Zheng Shao

Data sets of very high dimensionality, such as microarray data, pose great challenges on efficient processing to most existing data mining algorithms. Recently, there comes a row-enumeration method...

Abstract (2008)

Xiaoxin Yin, Jiawei Han

Different people or objects may share identical names in the real world, which causes confusion in many applications. It is a nontrivial task to distinguish those objects, especially when there is...

Trajectory Outlier Detection: A Partition-and-Detect Framework (2008)

Jae-gil Lee, Jiawei Han, Xiaolei Li

Abstract. Outlier detection has been a popular data mining task. However, there is a lack of serious study on outlier detection for trajectory data. Even worse, an existing trajectory outlier...

Mining Evolving Customer-Product Relationships in Multi-Dimensional Space (2008)

Xiaolei Li, Jiawei Han, Xiaoxin Yin, Dong Xin

Previous work on mining transactional database has focused primarily on mining frequent itemsets, association rules, and sequential patterns. However, interesting relationships between customers and...

Association Mining in Large Databases: A Re-Examination of Its Measures (2008)

Tianyi Wu, Yuguo Chen, Jiawei Han

Abstract. In the literature of data mining and statistics, numerous interestingness measures have been proposed to disclose succinct object relationships of association patterns. However, it is still...

Efficient Discovery of Frequent Approximate Sequential Patterns (2008)

Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu

We propose an efficient algorithm for mining frequent approximate sequential patterns under the Hamming distance model. Our algorithm gains its efficiency by adopting a “break-down-and-build-up”...

Efficient Multi-relational Classification by Tuple ID Propagation (2008)

Xiaoxin Yin, Jiawei Han, Jiong Yang

Abstract. Most of today’s structured data is stored in relational databases. In contrast, most classification approaches only apply on single “flat” data relations. And it is usually difficult...

Mining Frequent Itemsets Over Arbitrary Time Intervals in Data Streams (2008)

Chris Giannella, Jiawei Han, Edward Robertson, Chao Liu

Mining frequent itemsets over a stream of transactions presents difficult new challenges over traditional mining in static transaction databases. Stream transactions can only be looked at once...

VLDB'03 Paper ID: 312 (2008)

Framework For Clustering, Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu, Charu C. Aggarwal, ...

The clustering problem is a difficult problem for the data stream domain. This is because the large volumes of data arriving in a stream render most traditional algorithms too inefficient.

MAIDS: Mining Alarming Incidents from Data Streams (2008)

Y. Dora Cai, David Clutter, Greg Pape, Jiawei Han, Michael Welge, Loretta Auvil

Real-time surveillance systems, network and telecommunication systems, and other dynamic processes often generate tremendous (potentially infinite) volume of stream data. Effective analysis of such...

Mining Evolving Customer-Product Relationships in Multi-Dimensional Space (2008)

Xiaolei Li, Jiawei Han, Xiaoxin Yin, Dong Xin

Previous work on mining transactional database has focused primarily on mining frequent itemsets, association rules, and sequential patterns. However, interesting relationships between customers and...

Evaluation of Declarative N-Queens Recursion: A Deductive Database Approach (2008)

Jiawei Han, Ling Liu, Tong Lu

Can we evaluate a logic program declaratively? That is, can a logic program be evaluated correctly and efficiently, independent of query modes and rule/predicate ordering, finding a complete set of...

CISpan: Comprehensive incremental mining algorithms of closed sequential patterns for multi-versional software mining (2008)

Ding Yuan, Kyuhyung Lee, Hong Cheng, Gopal Krishna, Zhenmin Li, Xiao Ma, ...

Recently, frequent sequential pattern mining algorithms have been widely used in software engineering field to mine various source code or specification patterns. In practice, software evolves from...

2 (2007)

Hongjun Lu, Jiawei Han, Ling Feng

for stock movement prediction Among all the data mining problems, discovering association rules from large databases is probably the most significant contribution from the database community to the...

On the Complexity of Mining Quantitative Association Rules (2007)

Raymond Ng, Jiawei Han, Laks Lakshmanan

Abstract. The discovery of quantitative association rules in large databases is considered an interesting and important research problem. Recently, different aspects of the problem have been studied,...

Indexing in Spatial Databases (2007)

Beng Chin Ooi, Ron Sacks-davis, Jiawei Han

Spatial information processing has been a focus of research in the past decade. In spatial databases, data are associated with spatial coordinates and extents, and are retrieved based on spatial...

Discovering Geographic Knowledge in Data-Rich Environments (2007)

Report Of Specialist, Harvey J. Miller, Jiawei Han

this report provides the research statements submitted during the open call for participation. Section 3 provides a summary of the workshop presentations and discussion, including an overall...

Efficient Rule-Based Attribute-Oriented Induction for Data Mining (2007)

David W. Cheung, H. Y. Hwang, Ada W. Fu, Jiawei Han

Data mining has become an important technique which has tremendous potential in many commercial and industrial applications. Attribute-oriented induction is a powerful mining technique and has been...

Mining Inter-Transaction Associations with Templates (2007)

Hongjun Jeffrey Xu, Ling Feng, Hongjun Lu, Jiawei Han

Multi-dimensional, inter-transaction association rules extend the traditional association rules to describe more general associations among items with multiple properties cross transactions....

Osmar R. Zaïane (2007)

Osmar R. Zaïane, Eli Hagen, Jiawei Han

We have designed and implemented MultiMediaMiner, a system prototype to mine high-level multimedia information and knowledge from large multimedia repositories like the WWW. WordNet, a semantic...

Discovering Geographic Knowledge In Data Rich Environments (2007)

Report Of Specialist, Harvey J. Miller, Jiawei Han

this report provides the participant list and contact information. Section 3 provides the research statements submitted during the open call for participation. Section 4 provides a summary of the...

Index Nesting -- an Efficient Approach to Indexing in Object-Oriented Databases (2007)

Beng Chin Ooi, Jiawei Han, Hongjun Lu, Kian Lee Tan

In object-oriented database systems where the concept of the superclass-subclass is supported, an instance of a subclass is also an instance of its superclass. Consequently, the...

Mining Multiple-Level Association Rules in Large Databases (2007)

Jiawei Han, Yongjian Fu

Abstract. A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the Apriori principle. A group of...

CMAR: Accurate and Efficient Classification Based on Multiple Class-association Rules (2007)

Wenmin Li, Jiawei Han, Jian Pei

Previous studies propose that associative classification has high classification accuracy and strong flexibility at handling unstructured data. However, it still suffers from the huge set of mined...

and (2007)

Wei Lu, Jiawei Han, Beng Chin Ooi

Extraction of interesting and general knowledge from large spatial databases is an important task in the development of spatial data- and knowledge-base systems. In this paper, we investigate...

Chapter 3 Mining Frequent Patterns in Data Streams at Multiple Time (2007)

Chris Giannella, Jiawei Han, Jian Pei, Xifeng Yan, Philip S. Yu

Although frequent-pattern mining has been widely studied and used, it is challenging to extend it to data streams. Compared to mining from a static transaction data set, the streaming case has far...

Mining Concept-Drifting Data Streams Using Ensemble Classifiers (2007)

Haixun Wang, Wei Fan, Philip S. Yu, Jiawei Han

Recently, mining data streams with concept drifts for actionable insights has become an important and challenging task for a wide range of applications including credit card fraud protection, target...

Multi-Dimensional Regression Analysis of Time-Series Data Streams (2007)

Yixin Chen, Guozhu Dong, Jiawei Han, Benjamin W. Wah, Jianyong Wang

Real-time production systems and other dynamic environments often generate tremendous (potentially infinite) amount of stream data; the volume of data is too huge to be stored on disks or scanned...

z (2007)

Jiawei Han, Jian Pei, Guozhu Dong, Ke Wang

It is often too expensive to compute and materialize a complete high-dimensional data cube. Computing an iceberg cube, which contains only aggregates above certain thresholds, is an effective way to...

Rule Measures: Support and Confidence (2007)

Jiawei Han, Micheline Kamber

Association rule mining: finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other...

y (2007)

Jiawei Han

Nonlinear recursion is one of the most challenging classes of logic programs for efficient evaluation in logic programming systems. We identify one popular class of nonlinear recursion, regular...

(Currently on leave from Concordia U.) (2007)

Raymond Ng, Jiawei Han, Alex Pang

Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting...

y (2007)

Jiawei Han, Shojiro Nishio, Hiroyuki Kawano, Wei Wang

Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases. With the increasing popularity of object-oriented database systems in advanced...

Efficient and Effective Clustering Methods for Spatial Data Mining (2007)

Raymond T. Ng, Jiawei Han

Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have...

2 (2007)

Hongjun Lu, Jiawei Han, Ling Feng

Most of the previous studies on mining association rules are on mining intra-transaction associations, i.e., the associations among items within the same transaction, where the notion of the...

2 (2007)

Raymond T. Ng, Jiawei Han

Constrained clustering, finding clusters that satisfy user-specified constraints, is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and...

BIDE: Efficient Mining of Frequent Closed Sequences (2007)

Jianyong Wang, Jiawei Han

Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only more...

On Demand Classification of Data Streams (2007)

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu

Current models of the classification problem do not effectively handle bursts of particular classes coming in at different times. In fact, the current model of the classification problem simply...

MAIDS: Mining Alarming Incidents from Data Streams (2007)

Y. Dora Cai, David Clutter, Greg Pape, Jiawei Han, Michael Welge

Many applications exist today that require the analysis of data streams. Data streams are dynamically changing, in high volume, potentially infinite, and require multi-dimensional analysis. These...

Trajectory Clustering: A Partition-and-Group Framework (2007)

Jae-Gil Lee, Jiawei Han, Kyu-Young Whang

Existing trajectory clustering algorithms group similar trajectories as a whole, thus discovering common trajectories. Our key observation is that clustering trajectories as a whole could miss common...

Progressive and selective merge: computing top-k with ad-hoc ranking functions (2007)

Dong Xin, Jiawei Han

The family of threshold algorithm (i.e., TA) has been widely studied for efficiently computing top-k queries. TA uses a sort-merge framework that assumes data lists are pre-sorted, and the ranking...

Exploring the Power of Links in Data Mining (2007)

Jiawei Han, Xiaoxin Yin, Philip S. Yu, ...

Theme: “Knowledge is power, but knowledge is hidden in massive links”

Isometric projection (2007)

Deng Cai, Xiaofei He, Jiawei Han

Recently the problem of dimensionality reduction has received a lot of interests in many fields of information processing, including data mining, information retrieval, and pattern recognition. We...

A general framework for mining concept-drifting data streams with skewed distributions (2007)

Jing Gao, Wei Fan, Jiawei Han, Philip S. Yu

In recent years, there have been some interesting studies on predictive modeling in data streams. However, most such studies assume relatively balanced and stable data streams but cannot handle well...

Truth discovery with multiple conflicting information providers on the web (2007)

Xiaoxin Yin, Jiawei Han

The world-wide web has become the most important information source for most of us. Unfortunately, there is no guarantee for the correctness of information on the web. Moreover, different web sites...

Discriminative frequent pattern analysis for effective classification (2007)

Hong Cheng, Xifeng Yan, Jiawei Han, Chih-wei Hsu

The application of frequent patterns in classification appeared in sporadic studies and achieved initial success in the classification of relational data, text documents and graphs. In this paper, we...

Mining colossal frequent patterns by core pattern fusion (2007)

Feida Zhu, Xifeng Yan, Jiawei Han, Philip S. Yu, Hong Cheng

Extensive research for frequent-pattern mining in the past decade has brought forth a number of pattern mining algorithms that are both effective and efficient. However, the existing frequent-pattern...

Cost-conscious cleaning of massive RFID data sets (2007)

Hector Gonzalez, Jiawei Han, Xuehua Shen

Efficient and accurate data cleaning is an essential task for the successful deployment of RFID systems. Although important advances have been made in tag detection rates, it is still common to see a...

Roam: Rule- and motif-based anomaly detection in massive moving object data sets (2007)

Xiaolei Li, Jiawei Han, Sangkyum Kim, Hector Gonzalez

With recent advances in sensory and mobile computing technology, enormous amounts of data about moving objects are being collected. One important application with such data is automated...

Adaptive fastest path computation on a road network: A traffic mining approach (2007)

Hector Gonzalez, Jiawei Han, Xiaolei Li, Margaret Myslinska, John Paul Sondag

Efficient fastest path computation in the presence of varying speed conditions on a large scale road network is an essential problem in modern navigation systems. Factors affecting road speed, such...

Mining approximate top-k subspace anomalies in multi-dimensional time-series data (2007)

Xiaolei Li, Jiawei Han

Market analysis is a representative data analysis process with many applications. In such an analysis, critical numerical measures, such as profit and sales, fluctuate over time and form time-series...

Locality sensitive discriminant analysis (2007)

Deng Cai, Jiawei Han

Linear Discriminant Analysis (LDA) is a popular data-analytic tool for studying the class relationship between data points. A major disadvantage of LDA is that it fails to discover the local...

Regularized locality preserving indexing via spectral regression (2007)

Deng Cai, Xiaofei He, Wei Vivian Zhang, Jiawei Han

We consider the problem of document indexing and representation. Recently, Locality Preserving Indexing (LPI) was proposed for learning a compact document subspace. Different from Latent Semantic...

LinkClus: Efficient clustering via heterogeneous semantic links (2006)

Xiaoxin Yin, Jiawei Han

Data objects in a relational database are cross-linked with each other via multi-typed links. Links contain rich semantic information that may indicate important relationships among objects. Most...

Statistical debugging: A hypothesis testing-based approach (2006)

Chao Liu, Long Fei, Xifeng Yan, Jiawei Han, Samuel P. Midkiff

Manual debugging is tedious, as well as costly. The high cost has motivated the development of fault localization techniques, which help developers search for fault locations. In this...

GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis (2006)

Chao Liu, Chen Chen, Jiawei Han, Philip S. Yu

Along with the blossom of open source projects comes the convenience for software plagiarism. A company, if less self-disciplined, may be tempted to plagiarize some open source projects for its own...

Ranking Objects by Exploiting Relationships: Computing Top-K over Aggregation. SIGMOD (2006)

Kaushik Chakrabarti, Venkatesh Ganti, Jiawei Han, Dong Xin

In many document collections, documents are related to objects such as document authors, products described in the document, or persons referred to in the document. In many applications, the goal is...

Discovering interesting patterns through user’s interactive feedback (2006)

Dong Xin, Xuehua Shen, Qiaozhu Mei, Jiawei Han

In this paper, we study the problem of discovering interesting patterns through user’s interactive feedback. We assume a set of candidate patterns (i.e., frequent patterns) has already been mined....

On compressing frequent patterns (2006)

Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng

A major challenge in frequent-pattern mining is the sheer size of its mining results. To compress the frequent patterns, we propose to cluster frequent patterns with a tightness measure δ (called...

Generating semantic annotations for frequent patterns with context analysis (2006)

Qiaozhu Mei, Dong Xin, Hong Cheng, Jiawei Han, Chengxiang Zhai

As a fundamental data mining task, frequent pattern mining has widespread applications in many different domains. Research in frequent pattern mining has so far mostly focused on developing efficient...

Ranking outliers using symmetric neighborhood relationship (2006)

Wen Jin, Jiawei Han, Wei Wang

Mining outliers in database is to find exceptional objects that deviate from the rest of the data set. Besides classical outlier analysis algorithms, recent studies have focused on mining...

FlowCube: Constructing RFID FlowCubes for Multi-Dimensional Analysis of Commodity Flows (2006)

Hector Gonzalez, Jiawei Han, Xiaolei Li

With the advent of RFID (Radio Frequency Identification) technology, manufacturers, distributors, and retailers will be able to track the movement of individual objects throughout the supply chain....

C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking (2006)

Dong Xin, Zheng Shao, Jiawei Han, Hongyan Liu

It is well recognized that data cubing often produces huge outputs. Two popular efforts devoted to this problem are (1) iceberg cube, where only significant cells are kept, and (2) closed cube, where...

AC-Close: efficiently mining approximate closed itemsets by core pattern recovery (2006)

Hong Cheng, Philip S. Yu, Jiawei Han

Recent studies have proposed methods to discover approximate frequent itemsets in the presence of random noise. By relaxing the rigid requirement of exact frequent pattern mining, some interesting...

Answering top-k queries with multi-dimensional selections: The ranking cube approach (2006)

Dong Xin, Jiawei Han, Hong Cheng, Xiaolei Li

Observed in many real applications, a top-k query often consists of two components to reflect a user’s preference: a selection condition and a ranking function. A user may not only propose ad hoc...

FlowCube: Constructing RFID FlowCubes for Multi-Dimensional Analysis of Commodity Flows (2006)

Hector Gonzalez, Jiawei Han, Xiaolei Li

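With the advent of RFID (Radio Frequency Identification) technology, manufacturers, distributors, and retailers will be able to track the movement of individual objects throughout the supply chain. The volume of data generated by a typical RFID application will be enormous as each item will generate a complete...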

Warehousing and analyzing massive RFID data sets (2006)

Hector Gonzalez, Jiawei Han, Xiaolei Li, Diego Klabjan

Radio Frequency Identification (RFID) applications are set to play an essential role in object tracking and supply chain management systems. In the near future, it is expected that every major...

Error-adaptive and time-aware maintenance of frequency counts over data streams (2006)

Hongyan Liu, Ying Lu, Jiawei Han, Jun He

Abstract. Maintaining frequency counts for items over data stream has a wide range of applications such as web advertisement fraud detection. Study of this problem has attracted great attention from...

Semi-Supervised Regression using Spectral Techniques (2006)

Deng Cai, Xiaofei He, Jiawei Han

Graph-based approaches for semi-supervised learning have received increasing amount of interest in recent years. Despite their good performance, many pure graph based algorithms do not have explicit...

Tensor space model for document analysis (2006)

Deng Cai, Xiaofei He, Jiawei Han

Vector Space Model (VSM) has been at the core of information retrieval for the past decades. VSM considers the documents as vectors in high dimensional space. In such a vector space, techniques like...

Thesis: Scalable Mining Across Multiple Database Relations (2006)

Xiaoxin Yin (advisors: Prof. Jiawei Han, Prof. Xiaoyan Zhu)

• Five years of intensive research experiences on data mining in the most prestigious data mining research group in North America. • Generated pioneer work in applying data mining and machine...

Integrative Array Analyzer: a software package for analysis of cross-platform and cross-species microarray data (2006)

Fei Pan, Kiran Kamath, Kangyu Zhang, Sudip Pulapura, Avinash Achar, Juan Nunez-Iglesias, ...

Summary: The rapid accumulation of microarray data translates into an urgent need for tools to perform integrative microarray analysis. Integrative Array Analyzer is a comprehensive analysis and...

Mining hidden community in heterogeneous social networks (2005)

Deng Cai, Zheng Shao, Xiaofei He, Xifeng Yan, Jiawei Han

Social network analysis has attracted much attention in recent years. Community mining is one of the major directions in social network analysis. Most of the existing methods on community mining...

Mining compressed frequent-pattern sets (2005)

Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng

A major challenge in frequent-pattern mining is the sheer size of its mining results. In many cases, a high min_sup threshold may discover only commonsense patterns but a low one may generate an...

TFP: An Efficient Algorithm for Mining Top-K Frequent Closed Itemsets (2005)

Jianyong Wang, Jiawei Han, Ying Lu, Petre Tzvetkov

Frequent itemset mining has been studied extensively in literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent...

Substructure similarity search in graph databases (2005)

Xifeng Yan, Philip S. Yu, Jiawei Han

Advanced database systems face a great challenge raised by the emergence of massive, complex structural data in bioinformatics, chem-informatics, and many other applications. The most fundamental...

Efficient Classification from Multiple Heterogeneous Databases (2005)

Xiaoxin Yin, Jiawei Han

With the fast expansion of computer networks, it is inevitable to study data mining on heterogeneous databases. In this paper we propose MDBM, an accurate and efficient approach for classification on...

Searching for Related Objects in Relational Databases (2005)

Xiaoxin Yin, Jiawei Han, Jiong Yang

To discover knowledge or retrieve information from a relational database, a user often needs to find objects related to certain source objects. There are two main challenges in building an effective...

Parallel mining of closed sequential patterns (2005)

Shengnan Cong, Jiawei Han, David Padua

Discovery of sequential patterns is an essential data mining task with broad applications. Among several variations of sequential patterns, closed sequential pattern is the most useful one since it...

Summarizing itemset patterns: a profile-based approach (2005)

Xifeng Yan, Hong Cheng, Jiawei Han, Dong Xin

Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern...

A sampling-based framework for parallel data mining (2005)

Shengnan Cong, Jiawei Han, Jay Hoeflinger, David Padua

The goal of data mining algorithm is to discover useful information embedded in large databases. Frequent itemset mining and sequential pattern mining are two important data mining problems with...

Constraint-based sequential pattern mining: the pattern-growth methods (2005)

Jian Pei, Jiawei Han, Wei Wang

Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this...

Cross-relational clustering with user’s guidance (2005)

Xiaoxin Yin, Jiawei Han

Clustering is an essential data mining task with numerous applications. However, data in most real-life applications are high-dimensional in nature, and the related information often spreads across...

Community mining from multi-relational networks (2005)

Deng Cai, Zheng Shao, Xiaofei He, Xifeng Yan, Jiawei Han

Social network analysis has attracted much attention in recent years. Community mining is one of the major directions in social network analysis. Most of the existing methods on community...

Mining coherent dense subgraphs across massive biological networks for functional discovery (2005)

Haiyan Hu, Xifeng Yan, Yu Huang, Jiawei Han, Xianghong Jasmine Zhou

Motivation: The rapid accumulation of biological network data translates into an urgent need for computational methods for graph pattern mining. One important problem is to identify recurrent...

Pebl: web page classification without negative examples (2004)

Hwanjo Yu, Jiawei Han

Web page classification is one of the essential techniques for Web mining because classifying Web pages of an interesting class is often the first step of mining the Web. However,...

High-dimensional OLAP: A minimal cubing approach (2004)

Xiaolei Li, Jiawei Han, Hector Gonzalez

Data cube has been playing an essential role in fast OLAP (online analytical processing) in many multi-dimensional data warehouses. However, there exist data sets in applications like bioinformatics,...

Dissertation: Mining, Indexing, and Similarity Search in Large Graph Datasets (2004)

Xifeng Yan (advisor: Prof. Jiawei Han)

Data mining, data management and machine learning, with emphasis on modeling, managing, and mining large-scale graphs and networks in bioinformatics, social networks, the Web, and computer systems. I...

Crossmine: efficient classification across multiple database relations (2004)

Xiaoxin Yin, Jiawei Han, Jiong Yang, Philip S. Yu

Most of today’s structured data is stored in relational databases. Such a database consists of multiple relations which are linked together conceptually via entity-relationship links in the design...

Mining Constrained Gradients in Large Databases (2004)

Guozhu Dong, Jiawei Han, Jian Pei, ...

Many data analysis tasks can be viewed as search or mining in a multidimensional space (MDS). In such MDSs, dimensions capture potentially important factors for given applications, and...

Crossmine: efficient classification across multiple database relations (2004)

Xiaoxin Yin, Jiawei Han, Jiong Yang, Philip S. Yu

Most of today’s structured data is stored in relational databases. Such a database consists of multiple relations that are linked together conceptually via entity-relationship links in...

Crossmine: efficient classification across multiple database relations (2004)

Xiaoxin Yin, Jiawei Han, Jiong Yang, Philip S. Yu

Relational databases are the most popular repository for structured data, and are thus one of the richest sources of knowledge in the world. In a relational database, multiple relations are...

A framework for projected clustering of high dimensional data streams (2004)

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu, ...

The data stream problem has been studied extensively in recent years, because of the great ease in collection of stream data. The nature of stream data makes it essential to use algorithms which...

Mm-cubing: Computing iceberg cubes by factorizing the lattice space (2004)

Zheng Shao, Jiawei Han, Dong Xin

The data cube and iceberg cube computation problem has been studied by many researchers. There are three major approaches developed in this direction: (1) top-down computation, represented by...

Mining sequential patterns by pattern-growth: The PrefixSpan approach (2004)

Jian Pei, Jiawei Han, Behzad Mortazavi-Asl, Jianyong Wang, Helen Pinto, ...

Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a...

Mining Scale-free Networks Using Geodesic Clustering (2004)

Andrew Y. Wu, Michael Garland, Jiawei Han

Many real-world graphs have been shown to be scale-free: vertex degrees follow power law distributions, vertices tend to cluster, and the average length of all shortest paths is small. We present a...

Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach (2004)

Bin He, Jiawei Han

To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. While complex matchings are common, because...

Mining Constrained Gradients in Large Databases (2004)

Guozhu Dong, Jiawei Han, Joyce Lam, Jian Pei, Ke Wang, Wei Zou

Many data analysis tasks can be viewed as search or mining in a multidimensional space (MDS). In such MDSs, dimensions capture potentially important factors for given applications, and cells...

Incspan: incremental mining of sequential patterns in large database (2004)

Hong Cheng, Xifeng Yan, Jiawei Han

Many real life sequence databases grow incrementally. It is undesirable to mine sequential patterns from scratch each time when a small set of sequences grow, or when some new sequences are added...

Clustering Moving Objects (2004)

Yifan Li, Jiawei Han, Jiong Yang

CCMine: Efficient Mining of Confidence-Closed Correlated Patterns (2004)

Won-young Kim, Young-Koo Lee, Jiawei Han

Correlated pattern mining has become increasingly important recently as an alternative or an augmentation of association rule mining. Though correlated pattern mining...

Mining thick skylines over large databases (2004)

Wen Jin, Jiawei Han, Martin Ester

People recently are interested in a new operator, called skyline [3], which returns the objects that are not dominated by any other objects with regard to certain measures in a...

Clospan: Mining closed sequential patterns in large datasets (2003)

Xifeng Yan, Jiawei Han, Ramin Afshar

Previous sequential pattern mining algorithms mine the full set of frequent subsequences satisfying a min_sup threshold in a sequence database. However, since a frequent long sequence contains a...

A framework for clustering evolving data streams (2003)

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu

The clustering problem is a difficult problem for the data stream domain. This is because the large volumes of data arriving in a stream renders most traditional algorithms too inefficient. In recent...

Star-cubing: Computing iceberg cubes by top-down and bottom-up integration (2003)

Dong Xin, Jiawei Han, Xiaolei Li, Benjamin W. Wah

Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottom-up. The former, represented...

Using data mining for discovering patterns in autonomic storage systems (2003)

Zhenmin Li, Sudarshan M. Srinivasan, Zhifeng Chen, Yuanyuan Zhou, Peter Tzvetkov, Xifeng Yan, ...

In order to be self-tuning, self-managing, self-healing and self-protecting, a storage system needs to be able to automatically characterize access patterns. This paper proposes an approach that uses...

Object matching for information integration: A profiler-based approach (2003)

Anhai Doan, Ying Lu, Yoonkyong Lee, Jiawei Han

Object matching is a fundamental problem that arises in numerous information integration scenarios. Virtually all existing solutions to this problem have assumed that the objects to be matched share...

Pushing Support Constraints Into Association Rules Mining (2003)

Ke Wang, Yu He, Jiawei Han

Interesting patterns often occur at varied levels of support. The classic association mining based on a uniform minimum support, such as Apriori, either misses interesting patterns of low support or...

CoMine: Efficient Mining of Correlated Patterns (2003)

Young-koo Lee, Won-young Kim, Y. Dora Cai, Jiawei Han

Association rule mining often generates a huge number of rules, but a majority of them either are redundant or do not reflect the true correlation relationship among data objects. In this paper, we...

Using Data Mining for Discovering Patterns in Autonomic Storage Systems (2003)

Zhenmin Li, Sudarshan M. Srinivasan, Zhifeng Chen, Yuanyuan Zhou, Peter Tzvetkov, Xifeng Yan, ...

In order to be self-tuning, self-managing, self-healing and self-protecting, a storage system needs to be able to automatically characterize access patterns. This paper proposes an approach that uses...

TSP: Mining Top-K Closed Sequential Patterns (2003)

Petre Tzvetkov, Xifeng Yan, Jiawei Han

Sequential pattern mining has been studied extensively in the data mining community. Most previous studies require the specification of a minimum support threshold to perform the mining. However, it is...

Using Data Mining for Discovering Patterns in Autonomic Storage Systems (2003)

Zhenmin Li, Sudarshan M. Srinivasan, Zhifeng Chen, Yuanyuan Zhou, Peter Tzvetkov, Xifeng Yan, ...

In order to be self-tuning, self-managing, self-healing and self-protecting, a storage system needs to be able to automatically characterize access patterns. This paper proposes an approach that uses...

Text Classification from Positive and Unlabeled Documents (2003)

Hwanjo Yu, ChengXiang Zhai, Jiawei Han

Most existing studies of text classification assume that the training data are completely labeled. In reality, however, many information retrieval problems can be more accurately described as...

Online Mining of Changes from Data Streams: Research Problems and Preliminary Results (2003)

Guozhu Dong, Jiawei Han, Jian Pei, Haixun Wang, ...

As data streams are gaining prominence in a growing number of emerging applications, advanced analysis and mining of data streams is becoming increasingly important. While there are some recent...

Star-cubing: Computing iceberg cubes by top-down and bottom-up integration (2003)

Dong Xin, Jiawei Han, Xiaolei Li, Benjamin W. Wah

Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down vs. bottom-up. The former, represented...

A framework for clustering evolving data streams (2003)

Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu

The clustering problem is a difficult problem for the data stream domain. This is because the large volumes of data arriving in a stream render most traditional algorithms too inefficient. In recent...

Star-cubing: Computing iceberg cubes by top-down and bottom-up integration (2003)

Dong Xin, Jiawei Han, Xiaolei Li, Zheng Shao, ...

Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches, top-down versus bottom-up. The...

Cancer classification using gene expression data (2003)

Ying Lu, Jiawei Han

The classification of different tumor types is of great importance in cancer diagnosis and drug discovery. However, most previous cancer classification studies are clinical-based and have limited...

Mining Constrained Gradients in Large Databases (2003)

Guozhu Dong, Jiawei Han, Joyce Lam, Jian Pei, Ke Wang, Wei Zou

Many data analysis tasks can be viewed as search or mining in a multidimensional space (MDS).

Profile-based object matching for information integration (2003)

Anhai Doan, Ying Lu, Yoonkyong Lee, Jiawei Han

matching methods rely on similarities among shared attributes. Profile-Based Object Matching builds on this approach but also correlates disjoint attributes to improve matching accuracy.

Mining long sequential patterns in a noisy environment (2002)

Jiong Yang, Wei Wang, Philip S. Yu, Jiawei Han

Pattern discovery in long sequences is of great importance in many applications including computational biology study, consumer behavior analysis, system performance analysis, etc. In a noisy...

Profit mining: from patterns to actions (2002)

Ke Wang, Senqiang Zhou, Jiawei Han

A major obstacle in data mining applications is the gap between the statistic-based pattern extraction and the value-based decision making. We present a profit mining approach to reduce...

Data Mining for Web Intelligence (2002)

Jiawei Han

Data mining holds the key to uncovering and cataloging the authoritative links, traversal patterns, and semantic structures that will bring intelligence and direction to our Web interactions. Through...

Online analytical processing stream data: Is it feasible? (2002)

Yixin Chen, Guozhu Dong, Jiawei Han, Jian Pei

Real-time surveillance systems and other dynamic environments often generate tremendous (potentially infinite) volume of stream data: the volume is too huge to be scanned multiple times. However,...

Multi-dimensional regression analysis of time-series data streams (2002)

Yixin Chen, Guozhu Dong, Jiawei Han, Benjamin W. Wah, Jianyong Wang

Real-time production systems and other dynamic environments often generate tremendous (potentially infinite) amount of stream data; the volume of data is too huge to be stored on disks or scanned...

Quotient cube: How to summarize the semantics of a data cube (2002)

Jian Pei, Jiawei Han

Partitioning a data cube into sets of cells with "similar behavior" often better exposes the semantics in the cube. E.g., if we find that average boots sales in the West 10th store...

How Can Data Mining Help Bio-Data Analysis? (Extended Abstract) (2002)

Jiawei Han

Recent progress in data mining research has led to the development of numerous efficient...

Spatial clustering in the presence of obstacles (2001)

Jean Hou, Jiawei Han

Clustering in spatial data mining is to group similar objects based on their distance, connectivity, or their relative density in space. In the real world, there exist many physical obstacles such as...

PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth (2001)

Jian Pei, Jiawei Han, Behzad Mortazavi-asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, ...

Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence...

RecTree: An Efficient Collaborative Filtering Method (2001)

Sonny Han Seng Chee, Jiawei Han, Ke Wang

Many people rely on the recommendations of trusted friends to find restaurants or movies that match their tastes. But what if your friends have not sampled the item of interest?...

Mining multi-dimensional constrained gradients in data cubes (2001)

Guozhu Dong, Jiawei Han, Joyce Lam, Jian Pei, Ke Wang

Constrained gradient analysis (similar to the “cubegrade” problem posed by Imielinski et al. [9]) is to extract pairs of similar cell characteristics associated with big changes in measure in a...

PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth (2001)

Jian Pei, Jiawei Han, Behzad Mortazavi-asl, Helen Pinto, Qiming Chen, Umeshwar Dayal, ...

Sequential pattern mining is an important data mining problem with broad applications. It is challenging since one may need to examine a combinatorially explosive number of possible subsequence...

Mining top-n local outliers in large databases (2001)

Wen Jin, Jiawei Han

Outlier detection is an important task in data mining with numerous applications, including credit card fraud detection, video surveillance, etc. A recent work on outlier detection has introduced a...

Mining multi-dimensional constrained gradients in data cubes (2001)

Guozhu Dong, Jiawei Han, Joyce Lam, Jian Pei, Ke Wang

Constrained gradient analysis (similar to the “cubegrade” problem posed by Imielinski et al. [9]) is to extract pairs of similar cell characteristics associated with big changes in measure in a...

Mining frequent itemsets with convertible constraints (2001)

Jian Pei, Jiawei Han

Recent work has highlighted the importance of the constraint-based mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting...

Mining multi-dimensional constrained gradients in data cubes (2001)

Guozhu Dong, Jiawei Han, Joyce Lam, Jian Pei, Ke Wang

Constrained gradient analysis (similar to the “cubegrade” problem posed by Imielinski et al. [9]) is to extract pairs of similar cell characteristics associated with big changes in measure in a...

RecTree: An Efficient Collaborative Filtering Method (2001)

Sonny Han Seng Chee, Jiawei Han, Ke Wang

Many people rely on the recommendations of trusted friends to find restaurants or movies that match their tastes. But what if your friends have not sampled the item of interest?...

M: Cluster Analysis (2001)

Jiawei Han, Micheline Kamber, Morgan Kaufmann Publishers

Clustering has been studied extensively for more than 40 years and across many disciplines due to its broad applications. Most books on pattern classification and machine learning contain chapters on...

Mining multi-dimensional constrained gradients in data cubes (2001)

Guozhu Dong, Jiawei Han, Joyce Lam, Jian Pei, Ke Wang

In recent years, there has been growing interest in multi-dimensional analysis of relational databases, transactional...

Object-Based Selective Materialization for Efficient Implementation of Spatial Data Cubes (2000)

Nebojsa Stefanovic, Jiawei Han

With a huge amount of data stored in spatial databases and the introduction of spatial components to many relational or object-relational databases, it is important to study the methods for...

CLOSET: An efficient algorithm for mining frequent closed itemsets (2000)

Jian Pei, Jiawei Han, Runying Mao

Association mining may often derive an undesirably large set of frequent itemsets and association rules. Recent studies have proposed an interesting alternative: mining frequent closed itemsets and...

Mining frequent patterns without candidate generation (2000)

Jiawei Han, Runying Mao

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies...

Mining Recurrent Items in Multimedia with Progressive Resolution Refinement (2000)

Osmar R. Zaïane, Jiawei Han, Hua Zhu

Despite the overwhelming amounts of multimedia data recently generated and the significance of such data, very few people have systematically investigated multimedia data mining. With our previous...

Mining Frequent Itemsets Using Support Constraints (2000)

Ke Wang, Yu He, Jiawei Han

Interesting patterns often occur at varied levels of support. The classic association mining based on a uniform minimum support, such as Apriori, either misses interesting patterns of low support or...

Mining Access Patterns Efficiently from Web Logs (2000)

Jian Pei, Jiawei Han, Behzad Mortazavi-asl, Hua Zhu

With the explosive growth of data available on the World Wide Web, discovery and analysis of useful information from the World Wide Web becomes a practical necessity. Web access pattern, which is the...

Mining Recurrent Items in Multimedia with Progressive Resolution Refinement (2000)

Osmar Zaïane, Jiawei Han, Hua Zhu

Despite the overwhelming amounts of multimedia data recently generated and the significance of such data, very few people have systematically investigated multimedia data mining. With our previous...

Efficient mining of partial periodic patterns in time series database (1999)

Jiawei Han

Partial periodicity search, i.e., search for partial periodic patterns in time-series databases, is an interesting data mining problem. Previous studies on periodicity search mainly consider finding...

Plan Mining by Divide-and-Conquer (1999)

Jiawei Han, Qiang Yang, Edward Kim

Plans or sequences of actions are an important form of data. With the proliferation of database technology, plan databases (or planbases) are increasingly common. Efficient discovery of important...

Breaking the barrier of transactions: Mining inter-transaction association rules (1999)

Hongjun Lu, Jiawei Han, Ling Feng

Most of the previous studies on mining association rules are on mining intra-transaction associations, i.e., the associations among items within the same transaction, where the notion of the...

Word taxonomy for online visual asset management and mining. Application of Natural Language to Information Systems (1999)

Osmar R. Zaïane, Eli Hagen, Jiawei Han

We have designed and implemented MultiMediaMiner, a system prototype to mine high-level multimedia information and knowledge from large multimedia repositories like the WWW. WordNet, a semantic...

Constraint-Based Multidimensional Data Mining (1999)

Jiawei Han, Laks V. S. Lakshmanan, Raymond T. Ng

Integrating both constraint-based and multidimensional mining into one framework provides an interactive, exploratory environment for effective and efficient data analysis and mining. Although there...

Word taxonomy for online visual asset management and mining. Application of Natural Language to Information Systems (1999)

Osmar R. Zaïane, Eli Hagen, Jiawei Han

We have designed and implemented MultiMediaMiner, a system prototype to mine high-level multimedia information and knowledge from large multimedia repositories like the WWW. WordNet, a semantic...

Efficient mining of partial periodic patterns in time series database (1999)

Jiawei Han, Guozhu Dong, Yiwen Yin

Partial periodicity search, i.e., search for partial periodic patterns in time-series databases, is an interesting data mining problem. Previous studies on periodicity search mainly consider finding...

Mining Frequent Patterns without Candidate Generation (1999)

Jiawei Han, Jian Pei, Yiwen Yin

Mining frequent patterns in transaction databases, timeseries databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an...

Join Index Hierarchy: An Indexing Structure for Efficient Navigation in Object-Oriented Databases (1999)

Jiawei Han, Zhaohui Xie, Yongjian Fu

A novel indexing structure, join index hierarchy, is proposed to handle the "goto's on disk" problem in object-oriented query processing. The method constructs a hierarchy of join...

Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules (1999)

Anthony Tung, Hongjun Lu, Jiawei Han, Ling Feng

Most of the previous studies on mining association rules are on mining intra-transaction associations, i.e., the associations among items within the same transaction, where the notion of the...

Mining Frequent Patterns without Candidate Generation (1999)

Jiawei Han, Jian Pei, Yiwen Yin

Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an...

Word Taxonomy for On-line Visual Asset Management and Mining (1999)

Osmar R. Zaïane, Eli Hagen, Jiawei Han

We have designed and implemented MultiMediaMiner, a system prototype to mine high-level multimedia information and knowledge from large multimedia repositories like the WWW. WordNet, a semantic...

Plan Mining by Divide-and-Conquer (1999)

Jiawei Han, Qiang Yang, Edward Kim

Plans or sequences of actions are an important form of data. With the proliferation of database technology, plan databases (or planbases) are increasingly common. Efficient discovery of important...

Breaking the Barrier of Transactions: Mining Inter-Transaction Association Rules (1999)

Anthony Tung, Hongjun Lu, Jiawei Han, Ling Feng

Most of the previous studies on mining association rules are on mining intra-transaction associations, i.e., the associations among items within the same transaction, where the notion of the...

Join Index Hierarchy: An Indexing Structure for Efficient Navigation in Object-Oriented Databases (1999)

Jiawei Han, Zhaohui Xie, Yongjian Fu

A novel indexing structure, join index hierarchy, is proposed to handle the "goto's on disk" problem in object-oriented query processing. The method constructs a hierarchy of join...

Exploratory Mining via Constrained Frequent Set Queries (1999)

Raymond Ng, Jiawei Han, Teresa Mah

Although there have been many studies on data mining, to date there have been few research prototypes or commercial systems supporting comprehensive query-driven mining, which encourages interactive...

Efficient Mining of Partial Periodic Patterns in Time Series Database (1999)

Jiawei Han, Guozhu Dong, Yiwen Yin

Partial periodicity search, i.e., search for partial periodic patterns in time-series databases, is an interesting data mining problem. Previous studies on periodicity search mainly consider finding...

Constraint-Based Multidimensional Data Mining (1999)

Jiawei Han, Laks V. S. Lakshmanan, Raymond T. Ng

Integrating both constraint-based and multidimensional mining into one framework provides an interactive, exploratory environment for effective and efficient data analysis and mining. Although many...

Efficient polygon amalgamation methods for spatial OLAP and spatial data mining (1999)

Xiaofang Zhou, David Truffet, Jiawei Han

The polygon amalgamation operation computes the boundary of the union of a set of polygons. This is an important operation for spatial on-line analytical processing and spatial data mining, where...

Selective materialization: An efficient method for spatial data cube construction (1998)

Jiawei Han, Nebojsa Stefanovic, Krzysztof Koperski

On-line analytical processing (OLAP) has gained popularity in the database industry. With a huge amount of data stored in spatial databases and the introduction of spatial components to...

Towards On-Line Analytical Mining in Large Databases (1998)

Jiawei Han

Great efforts have been paid in the Intelligent Database

Generalization-based data mining in object-oriented databases using an object-cube model (1998)

Jiawei Han, Shojiro Nishio, Hiroyuki Kawano, Wei Wang

Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases. With the increasing popularity of object-oriented database systems in advanced...

Stock movement prediction and n-dimensional inter-transaction association rules (1998)

Hongjun Lu, Jiawei Han, Ling Feng

Among all the data mining problems, discovering association rules from large databases is probably the most significant...

Mining knowledge in geographical data (1998)

Krzysztof Koperski, Jiawei Han, Junas Adhikary

Huge amounts of data have been stored in databases, data warehouses, geographic information systems, and other information repositories, and this data is still growing rapidly [4]. Companies are...

Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs (1998)

Osmar R. Zaïane, Man Xin, Jiawei Han

As a confluence of data mining and WWW technologies, it is now possible to perform data mining on web log records collected from the Internet web page access history. The behaviour of the web page...

MultiMediaMiner: A system prototype for multimedia data mining (1998)

Osmar R. Zaïane, Jiawei Han, Ze-nian Li, Sonny H. Chee, Jenny Y. Chiang

Multimedia data mining is the mining of high-level multimedia information and knowledge from large multimedia databases. A multimedia data mining system prototype, MultiMediaMiner, has been designed...

WebML: Querying the world-wide web for resources and knowledge (1998)

Osmar R. Zaïane, Jiawei Han

There is a massive increase of information available on electronic networks. This profusion of resources on the World-Wide Web gave rise to considerable interest in the research community....

Mining segment-wise periodic patterns in time-related databases (1998)

Jiawei Han, Wan Gong, Yiwen Yin

Periodicity search, that is, search for cyclicity in time-related databases, is an interesting data mining problem. Most previous studies have been on finding full-cycle periodicity for all the...

Towards On-Line Analytical Mining in Large Databases (1998)

Jiawei Han

Great efforts have been paid in the Intelligent Database Systems Research Lab for the research and development of efficient data mining methods and construction of on-line analytical data mining...

Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs (1998)

Osmar R. Zaïane, Man Xin, Jiawei Han

As a confluence of data mining and WWW technologies, it is now possible to perform data mining on web log records collected from the Internet web page access history. The behaviour of the web page...

WebML: Querying the World-Wide Web for Resources and Knowledge (1998)

Osmar R. Zaïane, Jiawei Han

There is a massive increase of information available on electronic networks. This profusion of resources on the World-Wide Web gave rise to considerable interest in the research community. Traditional...

Exploratory Mining and Pruning Optimizations of Constrained Associations Rules (1998)

Raymond Ng, Jiawei Han, Alex Pang

From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user...

MultiMediaMiner: A System Prototype for MultiMedia Data Mining (1998)

Osmar R. Zaïane, Jiawei Han, Ze-nian Li, Sonny H. Chee, Jenny Y. Chiang

Multimedia data mining is the mining of high-level multimedia information and knowledge from large multimedia databases. A multimedia data mining system prototype, MultiMediaMiner, has been designed...

Optimization of Constrained Frequent Set Queries with 2-variable Constraints (1998)

Raymond Ng, Jiawei Han, Alex Pang

Currently, there is tremendous interest in providing ad-hoc mining capabilities in database management systems. As a first step towards this goal, in [15] we proposed an architecture for supporting...

Mining Knowledge in Geographical Data (1998)

Krzysztof Koperski, Jiawei Han, Junas Adhikary

In this article, a short overview is provided to summarize recent studies on spatial data mining, including spatial data mining techniques, their strengths and weaknesses, how and when to apply them,...

Stock Movement Prediction And N-Dimensional Inter-Transaction Association Rules (Extended Abstract) (1998)

Hongjun Lu, Jiawei Han, Ling Feng

Hongjun Lu (1), Jiawei Han (2), Ling Feng (3). (1) The Hong Kong University of Science and Technology, China, luhj@cs.ust.hk. (2) Simon Fraser University, Canada, han@cs.sfu.ca. (3) The Hong Kong Polytechnic...

Mining Knowledge in Geographical Data (1998)

Krzysztof Koperski, Jiawei Han, Junas Adhikary

In this article, a short overview is provided to summarize recent studies on spatial data mining, including spatial data mining techniques, their strengths and weaknesses, how and when to apply them,...

Dealing with Semantic Heterogeneity by Generalization-Based Data Mining Techniques (1998)

Jiawei Han, Raymond T. Ng, Yongjian Fu, Son K. Dao

Data mining, or knowledge discovery from databases, may play an important role in the construction of cooperative information systems. A major challenge for building cooperative information systems...

Exploratory Mining and Pruning Optimizations of Constrained Associations Rules (1998)

Raymond T. Ng, Jiawei Han, Alex Pang

From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user...

Selective Materialization: An Efficient Method for Spatial Data Cube Construction (1998)

Jiawei Han, Nebojsa Stefanovic, Krzysztof Koperski

On-line analytical processing (OLAP) has gained popularity in the database industry. With a huge amount of data stored in spatial databases and the introduction of spatial components to many...

Issues for On-Line Analytical Mining of Data Warehouses (Extended Abstract) (1998)

Jiawei Han, Jenny Y. Chiang

Jiawei Han, Sonny H.S. Chee and Jenny Y. Chiang, Intelligent Database Systems Research Laboratory, School of Computing Science, Simon Fraser University, British Columbia, Canada V5A 1S6, {han, schee,...

SeqIndex: Indexing sequences by sequential pattern analysis (1998)

Hong Cheng, Xifeng Yan, Jiawei Han

In this paper, we study the issues related to the design and construction of high-performance sequence index structures in large sequence databases. To build effective indices, a novel method, called...

Issues for On-Line Analytical Mining of Data Warehouses (1998)

Jiawei Han, Jenny Y. Chiang

Data warehouses and OLAP engines are expected to be widely available in the near future. The data in data warehouses has been cleansed, integrated, and preprocessed, and infrastructures have been...

DBMiner: A system for data mining in relational databases and data warehouses (1997)

Jiawei Han, Jenny Y. Chiang, Sonny Chee, Jianping Chen, Qing Chen, Shan Cheng, ...

A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses. The system implements a wide spectrum of data...

GeoMiner: A System Prototype for Spatial Data Mining (1997)

Jiawei Han, Krzysztof Koperski, Nebojsa Stefanovic

Spatial data mining is to mine high-level spatial information and knowledge from large spatial databases. A spatial data mining system prototype, GeoMiner, has been designed and developed based on...

OLAP Mining: An Integration of OLAP with Data Mining (1997)

Jiawei Han

OLAP mining is a mechanism which integrates on-line analytical processing (OLAP) with data mining so that mining can be performed in different portions of databases or data warehouses and at...

Generalization and decision tree induction: Efficient classification in data mining (1997)

Micheline Kamber, Lara Winstone, Wan Gong, Shan Cheng, Jiawei Han

Efficiency and scalability are fundamental issues concerning data mining in large databases. Although classification has been studied extensively, few of the known methods take serious consideration...

Metarule-guided mining of multi-dimensional association rules using data cubes (1997)

Micheline Kamber, Jiawei Han, Jenny Y. Chiang

In this paper, we employ a novel approach to metarule-guided, multi-dimensional association rule mining which explores a data cube structure. We propose algorithms for metarule-guided mining: given a...

DBMiner: A System for Data Mining in Relational Databases and Data Warehouses (1997)

Jiawei Han, Jenny Y. Chiang, Sonny Chee, Jianping Chen, Qing Chen, Shan Cheng, ...

A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses. The system implements a wide spectrum of data...

Mining Multiple-Level Association Rules in Large Databases (1997)

Jiawei Han, Yongjian Fu

A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the Apriori principle. A group of variant...

Mining Multiple-Level Association Rules in Large Databases (1997)

Jiawei Han, Yongjian Fu

A top-down progressive deepening method is developed for efficient mining of multiple-level association rules from large transaction databases based on the Apriori principle. A group of variant...

Generalization and Decision Tree Induction: Efficient Classification in Data Mining (1997)

Micheline Kamber, Lara Winstone, Wan Gong, Shan Cheng, Jiawei Han

Efficiency and scalability are fundamental issues concerning data mining in large databases. Although classification has been studied extensively, few of the known methods take serious consideration...

Using Data Cubes for Metarule-Guided Mining of Multi-Dimensional Association Rules (1997)

Micheline Kamber, Jiawei Han, Jenny Y. Chiang

Metarule-guided mining is an interactive approach to data mining, where users probe the data under analysis by specifying hypotheses in the form of metarules, or pattern templates. Previous methods...

GeoMiner: A System Prototype for Spatial Data Mining (1997)

Jiawei Han, Krzysztof Koperski, Nebojsa Stefanovic

Spatial data mining is to mine high-level spatial information and knowledge from large spatial databases. A spatial data mining system prototype, GeoMiner, has been designed and developed based on...

OLAP Mining: An Integration of OLAP with Data Mining (1997)

Jiawei Han

OLAP mining is a mechanism which integrates on-line analytical processing (OLAP) with data mining so that mining can be performed in different portions of databases or data warehouses and at...

DBMiner: A System for Data Mining in Relational Databases and Data Warehouses (1997)

Jiawei Han, Jenny Y. Chiang, Sonny Chee, Jianping Chen, Qing Chen, Shan Cheng, ...

A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases and data warehouses. The system implements a wide spectrum of data...

Data Mining Methods for the Analysis of Large Geographic Databases (1996)

Krzysztof Koperski, Jiawei Han

Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is an important task for understanding and use of spatial data- and knowledge-bases. Statistical analysis...

Exploration of the power of attribute-oriented induction in data mining (1996)

Jiawei Han, Yongjian Fu

Attribute-oriented induction is a set-oriented database mining method which generalizes the task-relevant subset of data attribute-by-attribute, compresses it into a generalized relation, and...

Intelligent query answering by knowledge discovery techniques (1996)

Jiawei Han, Yue Huang, Nick Cercone, Yongjian Fu

Knowledge discovery facilitates querying database knowledge and intelligent query answering in database systems. In this paper, we investigate the application of discovered knowledge, concept...

Spatial data mining: Progress and challenges (1996)

Krzysztof Koperski, Junas Adhikary, Jiawei Han

Spatial data mining, i.e., mining knowledge from large amounts of spatial data, is a highly demanding field because huge amounts of spatial data have been collected in various applications, ranging...

DBMiner: interactive mining of multiple-level knowledge in relational databases (1996)

Jiawei Han, Yongjian Fu, Wei Wang, Jenny Chiang, Osmar R. Zaïane, Krzysztof Koperski

Based on our years of research, a data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases. The system implements a wide...

DBMiner: A system for mining knowledge in large relational databases (1996)

Jiawei Han, Yongjian Fu, Wei Wang, Jenny Chiang, Wan Gong, Krzysztof Koperski, ...

A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases. The system implements a wide spectrum of data mining functions,...

DMQL: A Data Mining Query Language for Relational Databases (1996)

Jiawei Han, Yongjian Fu, Wei Wang, Krzysztof Koperski, Osmar Zaïane

The emerging data mining tools and systems lead naturally to the demand of a powerful data mining query language, on top of which many interactive and flexible graphical user interfaces can be...

A Fast Distributed Algorithm for Mining Association Rules (1996)

David W. Cheung, Jiawei Han, Vincent T. Ng, Ada W. Fu, Yongjian Fu

With the existence of many large transaction databases, the huge amounts of data, the high scalability of distributed systems, and the easy partition and distribution of a centralized database, it is...

DMQL: A Data Mining Query Language for Relational Databases (1996)

Jiawei Han, Yongjian Fu, Wei Wang, Krzysztof Koperski, Osmar Zaiane

The emerging data mining tools and systems lead naturally to the demand of a powerful data mining query language, on top of which many interactive and flexible graphical user interfaces can be...

Spatial Data Mining: Progress and Challenges - Survey paper (1996)

Krzysztof Koperski, Junas Adhikary, Jiawei Han

Spatial data mining, i.e., mining knowledge from large amounts of spatial data, is a highly demanding field because huge amounts of spatial data have been collected in various applications, ranging...

DBMiner: A System for Mining Knowledge in Large Relational Databases (1996)

Jiawei Han, Yongjian Fu, Wei Wang, Jenny Chiang, Wan Gong, Krzysztof Koperski, ...

A data mining system, DBMiner, has been developed for interactive mining of multiple-level knowledge in large relational databases. The system implements a wide spectrum of data mining functions,...

Data Mining: An Overview from a Database Perspective (1996)

Ming-syan Chen, Jiawei Han, Philip S. Yu

Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an...

Maintenance of Discovered Association Rules in Large Databases: An Incremental Updating Technique (1996)

David W. Cheung, Jiawei Han, Vincent T. Ng, C. Y. Wong

An incremental updating technique is developed for maintenance of the association rules discovered by database mining. There have been many studies on efficient discovery of association rules in...

Data Mining Methods for the Analysis of Large Geographic Databases (1996)

Krzysztof Koperski, Jiawei Han

In this paper, a number of methods based on knowledge discovery techniques for large databases are presented. These methods may overcome some of the weaknesses of statistical analysis. Our study is...

Exploration of the Power of Attribute-Oriented Induction in Data Mining (1996)

Jiawei Han, Yongjian Fu

Attribute-oriented induction is a set-oriented database mining method which generalizes the task-relevant subset of data attribute-by-attribute, compresses it into a generalized relation, and...

Discovery of spatial association rules in geographic information databases (1995)

Krzysztof Koperski, Jiawei Han

Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is an important task for understanding and use of spatial data- and knowledge-bases. In this...

Advances of the DBLearn system for knowledge discovery in large databases (1995)

Jiawei Han, Yongjian Fu, Simon Tang

A prototyped data mining system, DBLearn, was developed in Simon Fraser Univ., which integrates machine learning methodologies with database technologies and efficiently and effectively extracts...

Discovery of multiple-level association rules from large databases (1995)

Jiawei Han, Yongjian Fu

Previous studies on mining association rules find rules at single concept level, however, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete...

Normalization and compilation of deductive and object-oriented database programs for efficient query evaluation (1995)

Zhaohui Xie, Jiawei Han

A normalization process is proposed to serve not only as a preprocessing stage for compilation and evaluation but also as a tool for classifying recursions. Then the query-independent...

Discovery of Multiple-Level Association Rules from Large Databases (1995)

Jiawei Han, Yongjian Fu

Previous studies on mining association rules find rules at single concept level, however, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete...

Discovery of Spatial Association Rules in Geographic Information Databases (1995)

Krzysztof Koperski, Jiawei Han

Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is an important task for understanding and use of spatial data- and knowledge-bases. In this paper, an...

Knowledge Mining in Databases: An Integration of Machine Learning Methodologies with Database Technologies (1995)

Jiawei Han, Yongjian Fu, Krzysztof Koperski, Gabor Melli, Wei Wang, Osmar R. Zaïane

Active research has been conducted on knowledge discovery in databases by the researchers in our group for years, with many interesting results published and a prototyped knowledge discovery system,...

Evaluation of Regular Nonlinear Recursions by Deductive Database Techniques (1995)

Jiawei Han

Nonlinear recursion is one of the most challenging classes of logic programs for efficient evaluation in logic programming systems. We identify one popular class of nonlinear recursion, regular...

Mining Knowledge at Multiple Concept Levels (1995)

Jiawei Han

Most studies on data mining have been focused at mining rules at single concept levels, i.e., either at the primitive level or at a rather high concept level. However, it is often desirable to...

Intelligent Query Answering by Knowledge Discovery Techniques (1995)

Jiawei Han, Yue Huang, Nick Cercone

Knowledge discovery in databases facilitates querying database knowledge, cooperative query answering and semantic query optimization in database systems. In this paper, we investigate the...

Meta-Rule-Guided Mining of Association Rules in Relational Databases (1995)

Yongjian Fu, Jiawei Han

A meta-rule-guided data mining approach is proposed and studied which applies meta-rules as a guidance at finding multiple-level association rules in large relational databases. A meta-rule is a rule...

Resource and Knowledge Discovery in Global Information Systems: A Multiple Layered Database Approach (1995)

Jiawei Han, Osmar R. Zaiane, Yongjian Fu

With huge amounts of information connected to the global information network (Internet), efficient and effective discovery of resource and knowledge from the "global information base" has...

Intelligent Query Answering by Knowledge Discovery Techniques (1995)

Jiawei Han, Yue Huang, Nick Cercone, Yongjian Fu

Knowledge discovery facilitates querying database knowledge and intelligent query answering in database systems. In this paper, we investigate the application of discovered knowledge, concept...

Discovery of Multiple-Level Association Rules from Large Databases (1995)

Jiawei Han, Yongjian Fu

Discovery of association rules from large databases has been a focused topic recently in the research into database mining. Previous studies discover association rules at a single concept level,...

Resource and Knowledge Discovery in Global Information Systems: A Multiple Layered Database Approach (1995)

Jiawei Han, Osmar R. Zaïane, Yongjian Fu

With huge amounts of information connected to the global information network (Internet), efficient and effective discovery of resource and knowledge from the "global information base" has...

Resource and Knowledge Discovery in Global Information Systems: A Preliminary Design and Experiment (1995)

Osmar R. Zaïane, Jiawei Han

With huge amounts of information connected to the global information network (Internet), efficient and effective discovery of resource and knowledge from the "global information base" has...

Chain-Split Evaluation in Deductive Databases (1995)

Jiawei Han

Many popularly studied recursions in deductive databases can be compiled into one or a set of highly regular chain generating paths, each of which consists of one or a set of connected predicates....

Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases (1994)

Jiawei Han, Yongjian Fu

Concept hierarchies organize data and concepts in hierarchical forms or in certain partial order, which helps expressing knowledge and data relationships in databases in concise, high level terms,...

LogicBase: A deductive database system prototype (1994)

Jiawei Han, Ling Liu, Zhaohui Xie

A deductive database system prototype, LogicBase, has been developed, with an emphasis on efficient compilation and query evaluation of application-oriented recursions in deductive databases. The...

Constraint-based query evaluation in deductive databases (1994)

Jiawei Han

Constraints play an important role in the efficient query evaluation in deductive databases. In this paper, constraint-based query evaluation in deductive databases is investigated, with...

Knowledge discovery in databases: A rule-based attribute-oriented approach (1994)

Jiawei Han

An attribute-oriented induction has been developed in the previous study of knowledge discovery in databases. A concept tree ascension technique is applied in concept generalization. In...

How does knowledge discovery cooperate with active database techniques in controlling dynamic environment (1994)

Hiroyuki Kawano, Shojiro Nishio, Jiawei Han, Toshiharu Hasegawa

A dynamic environment, such as a production process, a communication network, highway traffic, etc., may contain a huge amount of information, changing with time, which is a valuable...

Efficient and Effective Clustering Methods for Spatial Data Mining (1994)

Raymond T. Ng, Jiawei Han

Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have a role...

Knowledge Discovery in Object-Oriented and Active Databases (1994)

Jiawei Han, Shojiro Nishio, Hiroyuki Kawano

Knowledge discovery in databases (or data mining), which extracts interesting knowledge from large databases, represents an important direction in the development of data- and knowledge-base...

Discovery of Data Evolution Regularities in Large Databases (1994)

Jiawei Han, Ong Cai, Nick Cercone, Yue Huang

A large volume of concrete data may change over time in a database. It is important to catch the general trend of such changes and find data evolution (changing) regularities in databases in many...

Cooperative Query Answering Using Multiple Layered Databases Research (1994)

Jiawei Han, Yongjian Fu, Raymond T. Ng

How can a real-estate agent respond to inquiries quickly and intelligently? The 'trick' could be using a simple table to briefly outline the general information and a complete book to reference...

Dynamic Generation and Refinement of Concept Hierarchies for Knowledge Discovery in Databases (1994)

Jiawei Han, Yongjian Fu

Concept hierarchies organize data and concepts in hierarchical forms or in certain partial order, which helps expressing knowledge and data relationships in databases in concise, high level terms,...

Knowledge Discovery in Databases: A Rule-Based Attribute-Oriented Approach (1994)

Jiawei Han

An attribute-oriented induction has been developed in the previous study of knowledge discovery in databases. A concept tree ascension technique is applied in concept generalization. In this paper,...

Constraint-Based Query Evaluation in Deductive Databases (1994)

Jiawei Han

Constraints play an important role in the efficient query evaluation in deductive databases. In this paper, constraint-based query evaluation in deductive databases is investigated, with the emphasis...

Join index hierarchies for supporting efficient navigations in object-oriented databases (1994)

Zhaohui Xie, Jiawei Han

A join index hierarchy method is proposed to handle the “goto’s on disk” problem in object-oriented query processing. The method constructs a hierarchy of join indices and transforms a sequence...

Data-driven discovery of quantitative rules in relational databases (1993)

Jiawei Han, Ong Cai, Nick Cercone

A quantitative rule is a rule associated with quantitative information which assesses the representativeness of the rule in the database. In this paper, an efficient induction method is...

Homomorphic Tree Embeddings and Their Applications to Recursive Program Optimization (1993)

Karima Ashraf, Jiawei Han

This paper is concerned with the problems of stage preserving linearization and 1-boundedness for

Discovery of General Knowledge in Large Spatial Databases (1993)

Wei Lu, Jiawei Han, Beng Chin Ooi

Extraction of interesting and general knowledge from large spatial databases is an important task in the development of spatial data- and knowledge-base systems. In this paper, we investigate...

Knowledge Discovery in Databases: An Attribute-Oriented Approach (1992)

Jiawei Han, Yandong Cai, Ong Cai, Nick Cercone

Knowledge discovery in databases, or data mining, is an important issue in the development of data- and knowledge-base systems. An attribute-oriented induction method has been developed for knowledge...

Concept-Based Data Classification in Relational Databases (1991)

Jiawei Yandong, Jiawei Han, Ong Cai, Nick Cercone

Data classification is a process which groups objects with common properties into classes and produces a classification scheme over a set of data objects. Data classification is useful for...

Emerging scientific applications in data mining (0000)

Han, Jiawei

The article focuses on emerging scientific applications in the field of data mining as of August 1, 2002. Recent progress in scientific and engineering applications has accumulated huge volumes of...

An Efficient Two-Step Method for Classification of Spatial Data

Krzysztof Koperski, Jiawei Han, Nebojsa Stefanovic

Spatial data mining, i.e., discovery of interesting, implicit knowledge in spatial databases, is a highly demanding field because very large amounts of spatial data have been collected in various...

Mining Frequent Itemsets with Convertible Constraints

Jian Pei, Jiawei Han

constraint-based mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. In this paper, we study...