Gao Cong

Publication List Details

Period: 2002 - 2009

Number of publications: 23

Categories and Subject Descriptors H.2.8 [Database Management]: Database Applications - Data Mining (2009)

Feng Pan, Gao Cong

Keywords: closed pattern, row enumeration. The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows....

PRATA: A System for XML Publishing, Integration and View Maintenance (2008)

Gao Cong, Wenfei Fan, Xibei Jia, Shuai Ma

We present PRATA, a system that supports the following in a uniform framework: (a) XML publishing, i.e., converting data from databases to an XML document, (b) XML integration, i.e., extracting data...

Detecting Erroneous Sentences using Automatically Mined Sequential Patterns (2008)

Guihua Sun, Xiaohua Liu, Gao Cong, Ming Zhou, Zhongyang Xiong, John Lee, ...

This paper studies the problem of identifying erroneous/correct sentences. The problem has important applications, e.g., providing feedback for writers of English as a Second Language, controlling...

ABSTRACT (2008)

Gao Cong, Xin Xu, Feng Pan

Microarray datasets typically contain a large number of columns but a small number of rows. Association rules have been proven useful in analyzing such datasets. However, most existing association...

Categories and Subject Descriptors H.2.8 [Database Management]: Database Applications - Data Mining (2008)

Feng Pan, Gao Cong

Keywords: closed pattern, row enumeration. The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows....

Schema-Directed XML Publishing, XML Integration, Incremental Maintenance of XML Views (2008)

Gao Cong, Wenfei Fan, Xibei Jia, Shuai Ma

We present PRATA, a system that supports the following in a uniform framework: (a) XML publishing, i.e., converting data from databases to an XML document, (b) XML integration, i.e.,...

Improving Data Quality: Consistency and Accuracy (2008)

Gao Cong, Wenfei Fan, Floris Geerts

Two central criteria for data quality are consistency and accuracy. Inconsistencies and errors in a database often emerge as violations of integrity constraints. Given a dirty database D, one needs...

ABSTRACT (2007)

Gao Cong, Xin Xu, Feng Pan

Microarray datasets typically contain a large number of columns but a small number of rows. Association rules have been proven useful in analyzing such datasets. However, most existing association...

Speed-up iterative frequent itemset mining with constraint changes, ICDE’02 (2007)

Gao Cong, Bing Liu

Mining of frequent itemsets is a fundamental data mining task. Past research has proposed many efficient algorithms for the purpose. Recent work also highlighted the importance of using constraints...

Categories and Subject Descriptors H.2.8 [Database Management]: Database Applications - Data Mining (2007)

Feng Pan, Gao Cong

Keywords: closed pattern, row enumeration. The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows....

Semantic Mining and Analysis of Gene Expression Data (2007)

Xin Xu, Gao Cong, Beng Chin Ooi, Kian-Lee Tan, Anthony K. H. Tung

Association rules can reveal biologically relevant relationships between genes and environments/categories. However, most existing association rule mining algorithms are rendered impractical on gene...

Using partial evaluation in distributed query evaluation (2006)

Peter Buneman, Gao Cong

A basic idea in parallel query processing is that one is prepared to do more computation than strictly necessary at individual sites in order to reduce the elapsed time, the network traffic, or both...

Annotation propagation revisited for key preserving views (2006)

Gao Cong, Wenfei Fan, Floris Geerts

This paper revisits the analysis of annotation propagation from source databases to views defined in terms of conjunctive (SPJ) queries. Given a source database D, an SPJ query Q, the view Q(D) and a...

Semi-supervised Text Classification Using Partitioned EM (2004)

Gao Cong, Wee Sun Lee, Haoran Wu, Bing Liu

Text classification using a small labeled set and a large set of unlabeled data is seen as a promising technique to reduce the labor-intensive and time-consuming effort of labeling training data...

Mining frequent closed patterns in microarray data (2004)

Gao Cong, Kian-lee Tan, Feng Pan

Microarray data typically contains a large number of columns and a small number of rows, which poses a great challenge for existing frequent (closed) pattern mining algorithms that discover patterns...

COBBLER: Combining Column and Row Enumeration for Closed Pattern Discovery (2004)

Feng Pan, Gao Cong, Xin Xu

The problem of mining frequent closed patterns has received considerable attention recently, as it promises much less redundancy than discovering all frequent patterns. Existing...

Discovering Frequent Substructures from Hierarchical Semi-structured Data (2002)

Gao Cong, Lan Yi, Bing Liu, Ke Wang

Frequent substructure discovery from a collection of semi-structured objects can support the storage, browsing, querying, indexing and classification of semi-structured documents. This paper...