Genomic imbalances in precancerous tissues signal oral cancer risk (2009)
Garnis, Cathie, Chari, Raj, Buys, Timon PH, Zhang, Lewei, Ng, Raymond T, Rosin, Miriam P, ...
Oral cancer develops through a series of histopathological stages: through mild (low grade), moderate, and severe (high grade) dysplasia to carcinoma in situ and then invasive disease. Early...
Model-based clustering of array CGH data (2009)
Shah, Sohrab P., Cheung, K-John, Johnson, Nathalie A., Alain, Guillaume, Gascoyne, Randy D., Horsman, Douglas E., ...
Motivation: Analysis of array comparative genomic hybridization (aCGH) data for recurrent DNA copy number alterations from a cohort of patients can yield distinct sets of molecular signatures or...
To Do or Not To Do: The Dilemma of Disclosing Anonymized Data (2008)
Decision makers of companies often face the dilemma of whether to release data for knowledge discovery, vis a vis the risk of disclosing proprietary or sensitive information. While there are various...
Exploratory Mining and Pruning Optimizations of Constrained Associations Rules (2008)
Raymond T. Ng, Laks V. S. Lakshmanan, Jiawei Han, Alex Pang
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user...
Chari, Raj, Coe, Bradley P, Wedseltoft, Craig, Benetti, Marie, Wilson, Ian M, Vucic, Emily A, ...
Background: High throughput microarray technologies have afforded the investigation of genomes, epigenomes, and transcriptomes at unprecedented resolution. However, software packages to...
MD-SeeGH: a platform for integrative analysis of multi-dimensional genomic data (2008)
Chi, Bryan, DeLeeuw, Ronald J, Coe, Bradley P, Ng, Raymond T, MacAulay, Calum, Lam, Wan L
Background: Recent advances in global genomic profiling methodologies have enabled multi-dimensional characterization of biological systems. Complete analysis of these genomic profiles...
Algorithms for Mining Distance-Based Outliers in Large (2008)
This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as...
Accessing an ever increasing number of emails, possibly on small mobile devices, has become a major problem for many users. Email summarization is a promising way to solve this problem. In this...
Searching for dependencies at multiple abstraction levels (2008)
Toon Calders, Raymond T. Ng, Jef Wijsen
The notion of roll-up dependency (RUD) extends functional dependencies with generalization hierarchies. RUDs can be applied in OLAP and database design. The problem of discovering RUDs in large...
Disruption of the Non-Canonical WNT Pathway in Lung Squamous Cell Carcinoma (2008)
Raj Chari, Andy Lam, Raymond T. Ng, John Yee, John English, ...
Disruptions of beta-catenin and the canonical Wnt pathway are well documented in cancer. However, little is known of the non-canonical branch of the Wnt pathway. In this study, we investigate the...
SIGMA2: A system for the integrative genomic multi-dimensional (2008)
Raj Chari, Bradley P Coe, Craig Wedseltoft, Marie Benetti, Emily A Vucic, ...
BMC Bioinformatics, Software
Performing Boundary Shape Matching in Spatial Data (2007)
Edwin M. Knorr, Raymond T. Ng, David L. Shilvock
This paper describes a new approach to knowledge discovery among spatial objects---namely that of partial boundary shape matching. Our focus is on mining spatial data, whereby many objects called...
Computing Circumscriptive Databases, Part I: Theory and Algorithms (2007)
Anil Nerode, Raymond T. Ng, V.S. Subrahmanian
In this paper, we have adopted the view of model-theoretic circumscription (cf. Etherington [15], Horty [22]) which is quite well-known in the literature [40] and we restrict interest only to Herbrand...
Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander
For many KDD applications finding the outliers, i.e. the rare events, is more interesting and useful than finding the common cases, e.g. detecting criminal activities in E-commerce. Being...
Raymond T. Ng, Jörg Sander, Monica C. Sleumer
In this paper we present a method for clustering SAGE (Serial Analysis of Gene Expression) data to detect similarities and dissimilarities between different types of cancer on the subcellular level....
TIMECENTER Participants (2007)
Jef Wijsen, Raymond T. Ng, Michael H. Böhlen, ...
Any software made available via TIMECENTER is provided "as is" and without any express or implied warranties, including, without limitation, the implied warranty of...
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have...
Incompleteness in Data Mining (2007)
Database technology, as well as the bulk of data mining technology, is founded upon logic, with absolute notions of truth and falsehood, at least with respect to the data set. Patterns are discovered...
Yannis E. Ioannidis, Raymond T. Ng, Kyuseok Shim, Timos K. Sellis
In most database systems, the values of many important run-time parameters of the system, the data, or the query are unknown at query optimization time. Parametric query optimization attempts to...
Constrained clustering, finding clusters that satisfy user-specified constraints, is highly desirable in many applications. In this paper, we introduce the constrained clustering problem and...
Effect of active smoking on the human bronchial epithelium transcriptome (2007)
Chari, Raj, Lonergan, Kim M, Ng, Raymond T, MacAulay, Calum, Lam, Wan L, Lam, Stephen
Background: Lung cancer is the most common cause of cancer-related deaths. Tobacco smoke exposure is the strongest aetiological factor associated with lung cancer. In this study, using serial...
Modeling recurrent DNA copy number alterations in array CGH data (2007)
Shah, Sohrab P., Lam, Wan L., Ng, Raymond T., Murphy, Kevin P.
Motivation: Recurrent DNA copy number alterations (CNA) measured with array comparative genomic hybridization (aCGH) reveal important molecular features of human genetics and disease. Studying aCGH...
Preservation of patterns and input-output privacy (2007)
Shaofeng Bu, Raymond T. Ng, Ganesh Ramesh
Privacy preserving data mining so far has mainly ... breaches. To do so, the data custodian needs to transform its data. To determine the appropriate transformation, there are two critical...
MDQC: a new quality assessment method for microarrays based on quality control reports (2007)
Cohen Freue, Gabriela V., Hollander, Zsuzsanna, Shen, Enqing, Zamar, Ruben H., Balshaw, Robert, Scherer, Andreas, ...
Motivation: The process of producing microarray data involves multiple steps, some of which may suffer from technical problems and seriously damage the quality of the data. Thus, it is essential to...
Effect of active smoking on the human bronchial epithelium transcriptome
Raj Chari, Kim M Lonergan, Raymond T Ng, Calum MacAulay, Wan L Lam, ...
BMC Genomics, Research article
Detecting potential labeling errors in microarrays by data perturbation (2006)
Andrea Malossini, Enrico Blanzieri, Raymond T. Ng
Bioinformatics, Original Paper (Gene Expression), doi:10.1093/bioinformatics/btl346
Expressive Power of an Algebra for Data Mining (2006)
The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic...
Interactive multimedia summaries of evaluative text (2006)
We present an interactive multimedia interface for automatically summarizing large corpora of evaluative text (e.g. online product reviews). We rely on existing techniques for extracting knowledge...
Detecting potential labeling errors in microarrays by data perturbation (2006)
Malossini, Andrea, Blanzieri, Enrico, Ng, Raymond T.
Motivation: Classification is widely used in medical applications. However, the quality of the classifier depends critically on the accurate labeling of the training data. But for many medical...
MDL Summarization with Holes (2005)
Summarization of query results is an important problem for many OLAP applications.
Scalable discovery of hidden emails from large folders (2005)
The popularity of email has triggered researchers to look for ways to help users better organize the enormous amount of information stored in their email folders. One challenge that has not been...
A methodology for analyzing SAGE libraries for cancer profiling (2005)
Jörg Sander, Raymond T. Ng, Monica C. Sleumer, Man Saint Yuen, Steven J. Jones
Serial Analysis of Gene Expression (SAGE) has proven to be an important alternative to microarray techniques for global profiling of mRNA populations. We have developed preprocessing methodologies to...
Extracting knowledge from evaluative text (2005)
Capturing knowledge from free-form evaluative texts about an entity is a challenging task. New techniques of feature extraction, polarity determination and strength evaluation have been proposed....
ItCompress: An iterative semantic compression algorithm (2004)
H. V. Jagadish, Raymond T. Ng, Beng Chin Ooi, Anthony K. H. Tung
Real datasets are often large enough to necessitate data compression. Traditional ‘syntactic’ data compression methods treat the table as a large byte string and operate at the byte level. The...
ItCompress: An iterative semantic compression algorithm (2004)
H. V. Jagadish, Raymond T. Ng, Beng Chin Ooi
Real datasets are often large enough to necessitate data compression. Traditional ‘syntactic’ data compression methods treat the table as a large byte string and operate at the byte level. The...
Yozo Hida, Paul Huang, Rajesh Nishtala, Baruch Awerbuch, David Holmer, Cristina Nita-Rotaru, ...
routing protocol resilient to Byzantine failures. In ACM Workshop on Wireless Security (WiSe), Atlanta,
Efficient dynamic mining of constrained frequent sets (2003)
Data mining is supposed to be an iterative and exploratory process. In this context, we are working on a project with the overall objective of developing a practical computing environment for the...
Hierarchical cluster analysis of SAGE data for cancer profiling (2001)
In this paper we present a method for clustering SAGE (Serial Analysis of Gene Expression) data to detect similarities and dissimilarities between different types of cancer on the subcellular level....
Robust space transformations for distance-based operations (2001)
Edwin M. Knorr, Raymond T. Ng, Ruben H. Zamar
For many KDD operations, such as nearest neighbor search, distance-based clustering, and outlier detection, there is an underlying k-D data space in which each tuple/object is represented as a point...
One-dimensional and multi-dimensional substring selectivity estimation (2000)
Kapitskaia, Olga, Srivastava, Divesh, Jagadish, H.V., Ng, Raymond T.
With the increasing importance of XML, LDAP directories, and text-based information sources on the Internet, there is an ever-greater need to evaluate queries involving (sub)string matching. In many...
The 3W Model and Algebra for Unified Data Mining (2000)
Theodore Johnson, Raymond T. Ng
Real data mining/analysis applications call for a framework which adequately supports knowledge discovery as a multi-step process, where the input of one mining operation can be the output of...
Evolution and Revolutions in LDAP Directory Caches (2000)
Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava
LDAP directories have recently proliferated with the growth of the Internet, and are being used in a wide variety of network-based applications. In this paper, we propose the use of generalized...
LOF: Identifying Density-Based Local Outliers (2000)
Markus Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander
For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in...
Distance-Based Outliers: Algorithms and Applications (2000)
Edwin M. Knorr, Raymond T. Ng, Vladimir Tucakov
This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as...
Temporal Dependencies Generalized for Spatial and Other Dimensions (1999)
Recently, there has been a lot of interest in temporal granularity, and its applications in temporal dependency theory and data mining. Generalization hierarchies used in multi-dimensional...
Multi-Dimensional Substring Selectivity Estimation (1999)
H. V. Jagadish, Olga Kapitskaia, Raymond T. Ng, Divesh Srivastava
With the explosion of the Internet, LDAP directories and XML, there is an ever greater need to evaluate queries involving (sub)string matching. In many cases, matches need to be on multiple...
Finding Intensional Knowledge of Distance-Based Outliers (1999)
Existing studies on outliers focus only on the identification aspect; none provides any intensional knowledge of the outliers---by which we mean a description or an explanation of why an identified...
OPTICS-OF: Identifying local outliers (1999)
Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander
For many KDD applications finding the outliers, i.e. the rare events, is more interesting and useful than finding the common cases, e.g. detecting criminal activities in E-commerce. Being an outlier,...
Substring Selectivity Estimation (1999)
H. V. Jagadish, Raymond T. Ng, Divesh Srivastava
With the explosion of the Internet, LDAP directories and XML, there is an ever greater need to evaluate queries involving (sub)string matching. Effective query optimization in this context requires...
Discovering Roll-Up Dependencies (1999)
We introduce the problem of discovering functional determinacies that result from “rolling up” data to a higher abstraction level. Such a determinacy is called a Roll-Up Dependency (RUD). An...
TIMECENTER Participants (1999)
Jef Wijsen, Raymond T. Ng, Michael H. Böhlen, Heidi Gregersen, Dieter Pfoser, ...
Any software made available via TIMECENTER is provided “as is” and without any express or implied warranties, including, without limitation, the implied warranty of merchantability and fitness...
Exploratory mining and pruning optimizations of constrained associations rules (1998)
Raymond T. Ng, Alex Pang, Jiawei Han
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user...
Discovering Roll-Up Dependencies (1998)
Jef Wijsen, Raymond T. Ng
Roll-up dependencies (RUD) generalize functional dependencies (FDs) for relational databases that support roll-up/drill-down. Consider the following example. Suppose we keep track of the daily price...
Discovering Roll-Up Dependencies (1998)
Jef Wijsen, Raymond T. Ng, Toon Calders
We introduce the problem of discovering functional determinacies that result from "rolling up" data to a higher abstraction level. Such a determinacy is called a Roll-Up Dependency (RUD)....
Dealing with Semantic Heterogeneity by Generalization-Based Data Mining Techniques (1998)
Jiawei Han, Raymond T. Ng, Yongjian Fu, Son K. Dao
Data mining, or knowledge discovery from databases, may play an important role at the construction of cooperative information systems. A major challenge for building cooperative information systems...
Exploratory Mining and Pruning Optimizations of Constrained Associations Rules (1998)
Raymond T. Ng, Jiawei Han, Alex Pang
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user...
Algorithms for Mining Distance-Based Outliers in Large Datasets (1998)
This paper deals with finding outliers (exceptions) in large, multidimensional datasets. The identification of outliers can lead to the discovery of truly unexpected knowledge in areas such as...
Semantics, Consistency and Query Processing of Empirical Deductive Databases (1997)
In recent years, there has been growing interest in reasoning with uncertainty in logic programming and deductive databases. However, most frameworks proposed thus far are either non-probabilistic in...
A Unified Approach for Mining Outliers (1997)
This paper deals with finding outliers (exceptions) in large datasets. The identification of outliers can often lead to the discovery of truly unexpected knowledge in areas such as electronic...
Parametric Query Optimization (1997)
Yannis E. Ioannidis, Raymond T. Ng, Kyuseok Shim, Timos K. Sellis
In most database systems, the values of many important run-time parameters of the system, the data, or the query are unknown at query optimization time. Parametric query...
A Unified Notion of Outliers: Properties and Computation (1997)
As said in signal processing, "One person's noise is another person's signal." For many applications, such as the exploration of satellite or medical images, and the monitoring of...
Finding Boundary Shape Matching Relationships in Spatial Data (1997)
Edwin M. Knorr, Raymond T. Ng, David L. Shilvock
This paper considers a new kind of knowledge discovery among spatial objects---namely that of partial boundary shape matching. Our focus is on mining spatial data, whereby many objects called...
Parametric Query Optimization (1997)
Yannis E. Ioannidis, Raymond T. Ng, Kyuseok Shim, Timos K. Sellis (© Springer-Verlag)
In most database systems, the values of many important run-time parameters of the system, the data, or the query are unknown at query optimization time. Parametric query optimization attempts to...
Finding Aggregate Proximity Relationships and Commonalities in Spatial Data Mining (1996)
In this paper, we study two spatial knowledge discovery problems involving proximity relationships between clusters and features. The first problem is: Given a cluster of points, how can we...
An Analysis of Buffer Sharing and Prefetching Techniques for Multimedia Systems (1996)
In this paper, we study the problem of how to maximize the throughput of a continuous-media system, given a fixed amount of buffer space and disk bandwidth both pre-determined at design-time. Our...
Extraction of Spatial Proximity Patterns by Concept Generalization (1996)
We study the spatial data mining problem of how to extract a special type of proximity relationship---namely that of distinguishing two clusters of points based on the types of their neighbouring...
Schemes for implementing buffer sharing in continuous-media systems (1995)
Dwight J. Makaroff, Raymond T. Ng
Buffer management in continuous-media systems is a frequently studied topic. One of the most interesting recent proposals is the idea of buffer sharing for concurrent streams. As analyzed...
Buffer Sharing Schemes for Continuous-Media Systems (1995)
Dwight J. Makaroff, Raymond T. Ng
Buffer management in continuous-media systems is a frequently studied topic. One of the most interesting recent proposals is the idea of buffer sharing for concurrent streams. As analyzed in [8], by...
Efficient and Effective Clustering Methods for Spatial Data Mining (1994)
Spatial data mining is the discovery of interesting relationships and characteristics that may exist implicitly in spatial databases. In this paper, we explore whether clustering methods have a role...
Incremental Algorithms for Optimizing Model Computation Based on Partial Instantiation (1994)
It has been shown that mixed integer programming methods can effectively support minimal model, stable model and well-founded model semantics for ground deductive databases. Recently, a novel...
Mixed Integer Programming Methods for Computing Nonmonotonic Deductive Databases (1994)
Colin Bell, Anil Nerode, Raymond T. Ng, V. S. Subrahmanian
Though the declarative semantics of both explicit and nonmonotonic negation in logic programs has been studied extensively, relatively little work has been done on computation and implementation of...
Cooperative Query Answering Using Multiple Layered Databases Research (1994)
Jiawei Han, Yongjian Fu, Raymond T. Ng
How can a real-estate agent respond to inquiries quickly and intelligently? The `trick' could be using a simple table to briefly outline the general information and a complete book to reference...
Maximizing buffer and disk utilizations for news on-demand (1994)
In this paper, we study the problem of how to maximize the throughput of a multimedia system, given a fixed amount of buffer space and disk bandwidth both pre-determined at design-time. Our approach...
Parametric Query Optimization (1992)
Yannis E. Ioannidis, Raymond T. Ng, Kyuseok Shim, Timos K. Sellis
In most database systems, the values of many important run-time parameters of the system, the data, or the query are unknown at query optimization time. Parametric query optimization attempts to...
Thesis: Decision Information Systems Based on Data Warehouse (1992)
Xiaodong Zhou; supervisors: Prof. Raymond T. Ng, ...
(Please refer to my research statement for details.) • Email summarization, 2005 – 2007, (ACM WWW’07[1] and EMNLP’07(submitted)[5]) I studied how to summarize email conversations such that...
A Comprehensive Analysis of Common Copy-Number Variations in the Human Genome
Wong, Kendy K., DeLeeuw, Ronald J., Dosanjh, Nirpjit S., Kimm, Lindsey R., Cheng, Ze, Horsman, Douglas E., ...
Segmental copy-number variations (CNVs) in the human genome are associated with developmental disorders and susceptibility to diseases. More importantly, CNVs may represent a major genetic component...
ItCompress: An Iterative Semantic Compression Algorithm
H. V. Jagadish, Raymond T. Ng, Beng Chin Ooi, Anthony K. H. Tung
Real datasets are often large enough to necessitate data compression. Traditional `syntactic' data compression methods treat the table as a large byte string and operate at the byte level. The...