Ben Kao

Filtering Data Streams for Entity-based Continuous Queries (2009)

Reynold Cheng, Ben Kao, Alan Kwan, Sunil Prabhakar, Yicheng Tu

Abstract—The idea of allowing query users to relax their correctness requirements in order to improve performance of a data stream management system (e.g., location-based services and sensor...

Mining Order-Preserving Submatrices from Data with Repeated Measurements (2009)

Chun Kit Chui, Ben Kao

Order-preserving submatrices (OPSMs) have been shown to be useful in capturing concurrent patterns in data when the relative magnitudes of data items are more important than their absolute values. To...

Chapter 19: An Overview of Real-Time Database Systems (2008)

Ben Kao, Hector Garcia-Molina

A real-time database system provides database features such as data independence and concurrency control, while at the same time enforcing real-time constraints that applications may have. In this...

ABSTRACT (2008)

W. K. Wong, David W. Cheung, Ben Kao

Outsourcing association rule mining to an outside service provider brings several important benefits to the data owner. These include (i) relief from the high mining cost, (ii) minimization of...

Reducing UK-means to K-means (DUNE 2007, 2007-10-28) (2008)

S. D. Lee, Ben Kao, Reynold Cheng

• Traditional clustering algorithms (e.g. k-means) only handle certain data. • In the real world, uncertainty often arises in data: – due to random errors in physical measurements; – staleness...

time-delayed associations from discrete event (2008)

K. K. Loo, Ben Kao

A Framework for the Support of Multilingual Computing Environments (2008)

Yip Chi Lap, Ben Kao, David Cheung

The issue of multiple natural language support in operating systems and application programs has appeared and reappeared under many different headings.

Modeling Transcription Factor Motifs (2008)

Ben Kao, Simon Kasif, Deyou Cai, Peter Nelson

Motivation: Generating small models for which there may exist very little training data presents a crucial problem in computational biology, namely the trade-off between model specificity and...

y (2007)

David W. Cheung, Bo Zhou, Ben Kao, Kan Hu, Sau Dan Lee

On-line Analytical Processing (OLAP) has become a very useful tool in decision support systems built on data warehouses. ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP) are two popular...

Emulating Soft Real-Time Scheduling Using Traditional Operating System Schedulers (2007)

Brad Adelberg, Hector Garcia-Molina, Ben Kao

Real-time scheduling algorithms are usually only available in the kernels of real-time operating systems, and not in more general purpose operating systems, like Unix. For some soft real-time...

Requirement-Based Data Cube Schema Design (2007)

David W. Cheung, Bo Zhou, Ben Kao, Hongjun Lu, Tak Wah Lam, Hing Fung Ting

On-line analytical processing (OLAP) requires efficient processing of complex decision support queries over very large databases. It is well accepted that pre-computed data cubes can help reduce the...

Modeling DNA sequences with Bayes networks (2007)

Deyou Cai, Arthur Delcher, Ben Kao, Simon Kasif

Recent advances in biotechnology have triggered the generation of massive amounts of biological data. The size and complexity of biological sequence databases suggest that automated...

[HKU CSIS Tech Report TR-2002-05] A GSP-based Efficient Algorithm for Mining Frequent Sequences (2007)

Minghua Zhang, Ben Kao, Chi-lap Yip, David Cheung

This paper studies the problem of mining frequent sequences in transactional databases. In [3], Agrawal and Srikant proposed the GSP algorithm for extracting frequently occurring sequences. GSP is an...

Algorithm for Incremental Update of Concept Spaces (2007)

Felix Cheung, Ben Kao, David Cheung, C. Y. Ng

Abstract. The vocabulary problem in information retrieval arises because authors and indexers often use different terms for the same concept. A thesaurus defines mappings between different but...

Algorithms for Concept Space Construction (2007)

C. Y. Ng, Joseph Lee, Felix Cheung, Ben Kao, David Cheung

Abstract. The vocabulary problem in information retrieval arises because authors and indexers often use different terms for the same concept. A thesaurus defines mappings between different but...

A framework for the support of multilingual computing environments (2007)

Yip Chi Lap, Ben Kao, David Cheung

The issue of multiple natural language support in operating systems and application programs has appeared and reappeared under many different headings. "Internationalization",...

x (2007)

Ben Kao, K. Y. Lam, Brad Adelberg, Reynold Cheng, Tony Lee

A real-time database system contains base data items which record and model a physical, real world environment. For better decision support, base data items are summarized and correlated to derive...

DROLAP - A Dense-Region Based Approach to On-line Analytical Processing (2007)

David W. Cheung, Bo Zhou, Ben Kao, Kan Hu, Sau Dan Lee

ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP) are two opposing techniques for building On-line Analytical Processing (OLAP) systems. MOLAP has good query performance but suffers when the...

Requirement-Based Data Cube Schema Design (2007)

David W. Cheung, Bo Zhou, Ben Kao, Hongjun Lu, Tak Wah Lam, Hing Fung Ting

On-line analytical processing (OLAP) requires efficient processing of complex decision support queries over very large databases. It is well accepted that pre-computed data cubes can help reduce the...

x (2007)

Ben Kao, K. Y. Lam, Brad Adelberg, Reynold Cheng, Tony Lee

A database system contains base data items which record and model a physical, real world environment. For better decision support, base data items are summarized and correlated to derive views. These...

Uncertain data mining: An example in clustering location data (2006)

Michael Chau, Reynold Cheng, Ben Kao, Jackey Ng

Abstract. Data uncertainty is an inherent property in various applications due to reasons such as outdated sources or imprecise measurement. When data mining techniques are applied to these data,...

On mining micro-array data by order-preserving submatrix (2006)

Lin Cheung, Kevin Y. Yip, David W. Cheung, Ben Kao, Michael K. Ng

We study the problem of pattern-based subspace clustering. Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern...

Indexing multi-dimensional uncertain data with arbitrary probability density functions (2005)

Yufei Tao, Reynold Cheng, Xiaokui Xiao, Wang Kay Ngai, Ben Kao, Sunil Prabhakar

In an “uncertain database”, an object o is associated with a multi-dimensional probability density function (pdf), which describes the likelihood that o appears at each position in the data...

Adaptive stream filters for entity-based queries with non-value tolerance (2005)

Reynold Cheng, Ben Kao, Sunil Prabhakar, Alan Kwan, Yicheng Tu

We study the problem of applying adaptive filters for approximate query processing in a distributed stream environment. We propose filter bound assignment protocols with the objective of reducing...

Online Algorithms for Mining Inter-Stream Associations from Large Sensor Networks (2005)

K. K. Loo, Ivy Tong, Ben Kao, David Cheung

We study the problem of mining frequent value sets from a large sensor network. We discuss how sensor stream data could be represented in a way that facilitates efficient online mining and propose the...

Adaptive Stream Filters for Entity-based Queries with Non-Value Tolerance (2005)

Reynold Cheng, Ben Kao, Sunil Prabhakar, Alan Kwan, Yicheng Tu

We study the problem of applying adaptive filters for approximate query processing in a distributed stream environment. We propose filter bound assignment protocols with the objective of reducing...

Indexing Multi-Dimensional Uncertain Data with Arbitrary Probability Density Functions (2005)

Yufei Tao, Reynold Cheng, Xiaokui Xiao, Wang Kay Ngai, Ben Kao, Sunil Prabhakar

In an "uncertain database", an object o is associated with a multi-dimensional probability density function (pdf), which describes the likelihood that o appears at each position in the data...

Mining periodic patterns with gap requirement from sequences (2005)

Minghua Zhang, Ben Kao, David W. Cheung, Kevin Y. Yip

We study a problem of mining frequently occurring periodic patterns with a gap requirement from sequences. Given a character sequence S of length L and a pattern P of length l, we consider P a...

Mining periodic patterns with gap requirement from sequences (2005)

Minghua Zhang, Ben Kao, David W. Cheung, Kevin Y. Yip

We study a problem of mining frequently occurring periodic patterns with a gap requirement from sequences. Given a character sequence S of length L and a pattern P of length l, we consider P a...

Learning algorithms for large datasets (2003)

Ben Kao

Thesis (Ph.D. in Computer Science), University of Illinois at Chicago, 2003.

Mining emerging substrings (2003)

Sarah Chan, Ben Kao, C. L. Yip, Michael Tang

Abstract. We introduce a new type of KDD patterns called emerging substrings. In a sequence database, an emerging substring (ES) of a data class is a substring which occurs more frequently in that...

Maintenance of Partial-Sum-Based Histograms (2003)

David W. Cheung, Ben Kao

This paper introduces an efficient method for the maintenance of wavelet-based histograms built on partial sums. Wavelet-based histograms can be constructed from either raw data distributions or...

SF-Tree: An Efficient and Flexible Structure for Selectivity Estimation (2003)

Wai-shing Ho, Ben Kao, David W. Cheung, Yip Chi Lap, Eric Lo

Estimating the selectivity of a simple path expression (SPE) is essential for selecting the most efficient evaluation plans for XML queries. To estimate selectivity, we need an efficient and flexible...

Efficient algorithms for incremental update of frequent sequences (2002)

Minghua Zhang, Ben Kao, David Cheung, Chi-lap Yip

Agrawal and Srikant first put forward the problem of mining frequently occurring sequences from a customer database [1]. In their model, a customer database consists of a set of sequences. Each...

Evaluation of concurrency control strategies for mixed soft real-time database systems (2002)

Kam-yiu Lam, Tei-wei Kuo, Ben Kao, Reynold Cheng

Previous research in real-time concurrency control mainly focuses on guaranteeing the schedulability of hard real-time transactions and reducing the miss rate of soft real-time transactions....

Optimization in Data Cube System Design (2000)

Edward Hung, David W. Cheung, Ben Kao

Abstract. The design of an OLAP system for supporting real-time queries is one of the major research issues. One approach is to use data cubes, which are materialized precomputed multidimensional...

An Optimization Problem in Data Cube System Design (2000)

Edward Hung, David W. Cheung, Ben Kao, Y. L. Liang

In an OLAP system, we can use data cubes (precomputed multidimensional views of data) to support real-time queries. To reduce the maintenance cost, which is related to the number of cubes...

Exploiting the Duality of Maximal Frequent Itemsets and Minimal Infrequent Itemsets for I/O Efficient Association Rule Mining (2000)

K. K. Loo, Chi-lap Yip, Ben Kao, David Cheung

Any algorithm for mining association rules must discover the set of all maximal frequent itemsets (maxL) from a database. Given a set of itemsets X, to verify that X is maxL, two conditions must be...

Modeling splice sites with Bayes networks (2000)

Deyou Cai, Arthur Delcher, Ben Kao, Simon Kasif

Motivation: The main goal in this paper is to develop accurate probabilistic models for important functional regions in DNA sequences (e.g. splice junctions that signal the beginning and end of...

Requirement-based data cube schema design (1999)

David W. Cheung, Bo Zhou, Ben Kao, Hongjun Lu, Tak Wah Lam, Hing Fung Ting

On-line analytical processing (OLAP) requires efficient processing of complex decision support queries over very large databases. It is well accepted that pre-computed data cubes can help reduce the...

LGen - a lattice-based candidate set generation algorithm for I/O efficient association rule mining (1999)

Chi-lap Yip, K. K. Loo, Ben Kao, David Cheung, C. K. Cheng

Most algorithms for association rule mining are variants of the basic Apriori algorithm [1]. One characteristic of these Apriori-based algorithms is that candidate itemsets are generated in rounds,...

Requirement-Based Data Cube Schema Design (1999)

David Cheung, Bo Zhou, Ben Kao, Hongjun Lu, Tak Wah Lam, Hing Fung Ting

On-line analytical processing (OLAP) requires efficient processing of complex decision support queries over very large databases. It is well accepted that pre-computed data cubes can help reduce the...

DROLAP - A Dense-Region Based Approach to On-line Analytical Processing (1999)

David W. Cheung, Bo Zhou, Ben Kao, Kan Hu, Sau Dan Lee

ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP) are two opposing techniques for building On-line Analytical Processing (OLAP) systems. MOLAP has good query performance while ROLAP is...

Requirement-Based Data Cube Schema Design (1999)

David W. Cheung, Bo Zhou, Ben Kao, Hongjun Lu, Tak Wah, ...

On-line analytical processing (OLAP) requires efficient processing of complex decision support queries over very large databases. It is well accepted that pre-computed data cubes can help reduce the...

LGen - A Lattice-Based Candidate Set Generation Algorithm for Efficient Association (1999)

Chi-lap Yip, K. K. Loo, Ben Kao, David Cheung, C. K. Cheng

Most algorithms for association rule mining are variants of the basic Apriori algorithm [1]. One characteristic of these Apriori-based algorithms is that candidate itemsets are generated in rounds,...

A Study on Musical Features for Melody Databases (1999)

Yip Chi Lap, Ben Kao

Music has an auditory and temporal nature. The same piece can be interpreted in multiple, and often unrelated ways. Together with the limitations of its representations, the design of content-based...

DROLAP - A Dense-Region Based Approach to On-line Analytical Processing (1999)

David W. Cheung, Bo Zhou, Ben Kao, Kan Hu, Sau Dan Lee

ROLAP (Relational OLAP) and MOLAP (Multidimensional OLAP) are two opposing techniques for building On-line Analytical Processing (OLAP) systems. MOLAP has good query performance but suffers when the...

Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules (1998)

S. D. Lee, David W. Cheung, Ben Kao

By nature, sampling is an appealing technique for data mining, because approximate solutions in most cases may already be of great satisfaction to the need of the users. We attempt to use sampling...

Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules (1998)

S. D. Lee, David W. Cheung, Ben Kao

By nature, sampling is an appealing technique for data mining, because approximate solutions in most cases may already be of great satisfaction to the need of the users. We attempt to use sampling...

Discovering User Access Patterns on the World-Wide Web (1997)

David W. Cheung, Ben Kao, Joseph Lee

The World-Wide Web provides its users almost unlimited access to the documents on the Internet. We suggest using intelligent agents to help users locate documents related to their interests...

Overview of the STanford Real-time Information Processor (STRIP) (1996)

Brad Adelberg, Ben Kao, Hector Garcia-Molina

We believe that the greatest growth potential for soft real-time databases is not as isolated monolithic databases but as components in open systems consisting of many heterogeneous databases. In such...

An Overview of Real-Time Database Systems (1995)

Ben Kao, Hector Garcia-Molina

A real-time database system provides database features such as data independence and concurrency control, while at the same time enforcing real-time constraints that applications may have. In this...

An Overview of Real-Time Database Systems (1995)

Ben Kao, Hector Garcia-Molina

Traditionally, real-time systems manage their data (e.g. chamber temperature, aircraft locations) in application dependent structures. As real-time systems evolve, their applications become more...

Applying Update Streams in a Soft Real-Time Database System (1995)

Brad Adelberg, Hector Garcia-Molina, Ben Kao

Many papers have examined how to efficiently export a materialized view but to our knowledge none have studied how to efficiently import one. To import a view, i.e., to install a stream of updates, a...

Database Support for Efficiently Maintaining Derived Data (1995)

Brad Adelberg, Ben Kao, Hector Garcia-Molina

Derived data is maintained in a database system to correlate and summarize base data which record real world facts. As base data changes, derived data needs to be recomputed. A high performance...

On Building Distributed Soft Real-Time Systems (1995)

Ben Kao, Hector Garcia-Molina, Brad Adelberg

When building a distributed real-time system, one can either build the whole system from scratch, or from pre-existing standard components. Although the former allows better scheduling design, it is...

An Overview of Real-Time Database Systems (1995)

Ben Kao, Hector Garcia-Molina

Traditionally, real-time systems manage their data (e.g. chamber temperature, aircraft locations) in application dependent structures. As real-time systems evolve, their applications...

B1-Ro] [DKV] [Du-La] [Eil] [Ei2] [Gbl] [Gb-Ja] [Ha] [Ih] [Ja] [Ja-Lgl] [Knl] [Kn2] [Kol] [Labl (1994)

Michael Chau, Reynold Cheng, Ben Kao

Data uncertainty is often found in real-world applications due to reasons such as imprecise measurement, outdated sources, or sampling errors. Recently, much research has been published in the area...

Emulating Soft Real-Time Scheduling Using Traditional Operating System Schedulers (1994)

Brad Adelberg, Hector Garcia-Molina, Ben Kao

Real-time scheduling algorithms are usually only available in the kernels of real-time operating systems, and not in more general purpose operating systems, like Unix. For some soft real-time...

Subtask Deadline Assignment for Complex Distributed Soft Real-Time Tasks (1993)

Ben Kao, Hector Garcia-Molina

Complex distributed tasks often involve parallel execution of subtasks at different nodes. To meet the deadline of a global task, all of its parallel subtasks have to be finished on time. Comparing...

Deadline Assignment in a Distributed Soft Real-Time System (1993)

Ben Kao, Hector Garcia-Molina

In a distributed environment, tasks often have processing demands on multiple different sites. A distributed task is usually divided up into several subtasks, each one to be executed at some site in...

Soft Real-Time Communication Over Dual Non-Real-Time Networks (Extended Abstract) (1992)

Ben Kao, Hector Garcia-Molina

November 23, 1992. In this paper we consider systems with redundant communication paths, and show how applications can exploit the redundancy to improve...