closed pattern, row enumeration
The growth of bioinformatics has resulted in datasets with new characteristics. These datasets typically contain a large number of columns and a small number of rows....
PRATA: A System for XML Publishing, Integration and View Maintenance (2008)
Gao Cong, Wenfei Fan, Xibei Jia, Shuai Ma
We present PRATA, a system that supports the following in a uniform framework: (a) XML publishing, i.e., converting data from databases to an XML document, (b) XML integration, i.e., extracting data...
Detecting Erroneous Sentences using Automatically Mined Sequential Patterns (2008)
Guihua Sun, Xiaohua Liu, Gao Cong, Ming Zhou, Zhongyang Xiong, John Lee, ...
This paper studies the problem of identifying erroneous/correct sentences. The problem has important applications, e.g., providing feedback for writers of English as a Second Language, controlling...
Microarray datasets typically contain a large number of columns but a small number of rows. Association rules have proved useful in analyzing such datasets. However, most existing association...
Detecting Erroneous Sentences using Automatically Mined Sequential Patterns (2008)
Tom Mitchel, Lauri Karttunen, Diana Mccarthy, Gertjan Van Noort, Pascale Fung, Vera Demberg, ...
A Discriminative Language Model with
Improving Data Quality: Consistency and Accuracy (2008)
Gao Cong, Wenfei Fan, Floris Geerts
Two central criteria for data quality are consistency and accuracy. Inconsistencies and errors in a database often emerge as violations of integrity constraints. Given a dirty database D, one needs...
Speed-up iterative frequent itemset mining with constraint changes, ICDE’02 (2007)
Mining of frequent itemsets is a fundamental data mining task. Past research has proposed many efficient algorithms for the purpose. Recent work also highlighted the importance of using constraints...
Semantic Mining and Analysis of Gene Expression Data (2007)
Xin Xu, Gao Cong, Beng Chin Ooi, Kian-Lee Tan, Anthony K. H. Tung
Association rules can reveal biologically relevant relationships between genes and environments/categories. However, most existing association rule mining algorithms are rendered impractical on gene...
Using partial evaluation in distributed query evaluation (2006)
A basic idea in parallel query processing is that one is prepared to do more computation than strictly necessary at individual sites in order to reduce the elapsed time, the network traffic, or both...
Annotation propagation revisited for key preserving views (2006)
Gao Cong, Wenfei Fan, Floris Geerts
This paper revisits the analysis of annotation propagation from source databases to views defined in terms of conjunctive (SPJ) queries. Given a source database D, an SPJ query Q, the view Q(D) and a...
Semi-supervised Text Classification Using Partitioned EM (2004)
Gao Cong, Wee Sun Lee, Haoran Wu, Bing Liu
Abstract. Text classification using a small labeled set and a large unlabeled set is seen as a promising technique to reduce the labor-intensive and time-consuming effort of labeling training data...
Mining frequent closed patterns in microarray data (2004)
Gao Cong, Kian-lee Tan, Feng Pan
Microarray data typically contains a large number of columns and a small number of rows, which poses a great challenge for existing frequent (closed) pattern mining algorithms that discover patterns...
COBBLER: Combining Column and Row Enumeration for Closed Pattern Discovery (2004)
The problem of mining frequent closed patterns has received considerable attention recently, as it promises much less redundancy than discovering all frequent patterns. Existing...
Discovering Frequent Substructures from Hierarchical Semi-structured Data (2002)
Gao Cong, Lan Yi, Bing Liu, Ke Wang
Abstract: Frequent substructure discovery from a collection of semi-structured objects can serve for storage, browsing, querying, indexing and classification of semi-structured documents. This paper...