Jaewoo Kang

Publication List Details

Period: 1996–2009

Number of publications: 47

Adaptive BLASTing through the Sequence Dataspace: Theories on Protein Sequence Embedding (2009)

Yoojin Hong, Jaewoo Kang, Dongwon Lee, Randen L. Patterson, Damian B. Van Rossum

We theorize that phylogenetic profiles provide a quantitative method that can relate the structural and functional properties of proteins, as well as their evolutionary relationships. A key feature...

Record Linkage as DNA Sequence Alignment Problem (2009)

Yoojin Hong, Tao Yang, Jaewoo Kang, Dongwon Lee

Since modern database applications increasingly need to deal with dirty data due to a variety of reasons (e.g., data entry errors, heterogeneous formats, and ambiguous terms), considerable recent...

Identifying Value Mappings for Data Integration: An Unsupervised Approach (2008)

Jaewoo Kang, Dongwon Lee, Prasenjit Mitra

The Web is a distributed network of information sources where the individual sources are autonomously created and maintained. Consequently, syntactic and semantic heterogeneity of data...

ARCS: an aggregated related column scoring scheme for aligned sequences (2008)

Bin Song, Jeong-hyeon Choi, Guangyu Chen, Jacek Szymanski, Guo-qiang Zhang, ...

Bioinformatics (Original Paper, Sequence analysis), Vol. 22 no. 19, 2006, pages 2326–2332, doi:10.1093/bioinformatics/btl398

Adaptive Framework for Multivariate Stream Data Processing in Data-Centric Sensor Applications (2008)

Sungbo Seo, Jaewoo Kang, Dongwon Lee, Keun Ho Ryu

We introduce an adaptive framework for multivariate sensor stream data reduction. The proposed method takes as input a sliding window of multivariate stream data, classifies the data in...

Data Models for Exploratory Analysis of Heterogeneous Microarray Data (2008)

Jaewoo Kang

Microarrays are one of the latest breakthroughs in experimental molecular biology. It provides a powerful tool by which the expression patterns of thousands of...

Microarray data mining using landmark gene-guided clustering (2008)

Pankaj Chopra, Jaewoo Kang, Jiong Yang, HyungJun Cho, Heenam Stanley Kim, Min-Goo Lee

Background: Clustering is a popular data exploration technique widely used in microarray data analysis. Most conventional clustering algorithms, however, generate only one set of clusters...

Multivariate Stream Data Classification Using Simple Text Classifiers (2008)

Sungbo Seo, Jaewoo Kang, Dongwon Lee, Keun Ho Ryu

We introduce a classification framework for continuous multivariate stream data. The proposed approach works in two steps. In the preprocessing step, it takes as input a sliding window of...

Robust Likelihood-Based Survival Modeling with Microarray Data (2008)

HyungJun Cho, Ami Yu, Sukwoo Kim, Jaewoo Kang, Seung-Mo Hong

Gene expression data can be associated with various clinical outcomes. In particular, these data can be of importance in discovering survival-associated genes for medical applications. As...

HICCUP: Hierarchical Clustering Based Value Imputation using Heterogeneous Gene Expression (2007)

Qiankun Zhao, Prasenjit Mitra, Dongwon Lee, Jaewoo Kang

A novel microarray value imputation method, HICCUP, is presented. HICCUP improves upon existing value imputation methods in several ways. (1) By judiciously integrating heterogeneous microarray...

Are Your Citations Clean? (2007)

Dongwon Lee, Jaewoo Kang, Prasenjit Mitra, C. Lee Giles, Byung-Won On

If they are, only one can refer to a distinct document; if not, many can refer to the same document.

ARCS: an aggregated related column scoring scheme for aligned sequences (2006)

Bin Song, Jeong-hyeon Choi, Guangyu Chen, Jacek Szymanski, Guo-qiang Zhang, Anthony K. H. Tung, ...

Motivation: Biologists frequently align multiple biological sequences to determine consensus sequences and/or search for predominant residues and conserved regions. Particularly, determining...

An effective approach to entity resolution problem using quasi-clique and its application to digital libraries (2006)

Byung-won On, Ergin Elmacioglu, Dongwon Lee, Jaewoo Kang, Jian Pei

We study how to resolve entities that contain a group of related elements in them (e.g., an author entity with a list of citations or an intermediate result by GROUP BY SQL query). Such entities,...

Improving grouped-entity resolution using quasi-cliques (2006)

Byung-won On, Ergin Elmacioglu, Dongwon Lee, Jaewoo Kang, Jian Pei

The entity resolution (ER) problem, which identifies duplicate entities that refer to the same real world entity, is essential in many applications. In this paper, in particular, we focus on...

Effective and scalable solutions for mixed and split citation problems in digital libraries (2005)

Dongwon Lee, Byung-won On, Jaewoo Kang, Sanghyun Park

In this paper, we consider two important problems that commonly occur in bibliographic digital libraries, which seriously degrade their data qualities: Mixed Citation (MC) problem (i.e., citations of...

Establishing Value Mappings using Statistical Models and User Feedback (2005)

Jaewoo Kang, Tae Sik Han

In this paper, we present a “value mapping” algorithm that does not rely on syntactic similarity or semantic interpretation of the values. The algorithm first constructs a statistical model...

Comparative Study of Name Disambiguation Problem using a Scalable Blocking-based Framework (2005)

Byung-won On, Dongwon Lee, Jaewoo Kang, Prasenjit Mitra

In this paper, we consider the problem of ambiguous author names in bibliographic citations, and comparatively study alternative approaches to identify and correct such name variants (e.g.,...

Integrating heterogeneous microarray data sources using correlation signatures (2005)

Jaewoo Kang, Jiong Yang, Wanhong Xu, Pankaj Chopra

Microarrays are one of the latest breakthroughs in experimental molecular biology. Thousands of different research groups generate tens of thousands of microarray gene expression profiles...

A Toolkit for Automated Fine-grained Access Control Policy Enforcement in Oracle 9i (2004)

Luo Gao

(Under the direction of Dr. Ting Yu) Database access control is indispensable to information system security. As enterprises expand their services to the Internet, it has...

Toward the Scalable Integration of Internet Information Sources (2003)

Jaewoo Kang

Thesis (Ph.D.), University of Wisconsin–Madison, 2003.

On Schema Matching with Opaque Column Names and Data Values (2003)

Jaewoo Kang, Jeffrey F. Naughton

Most previous solutions to the schema matching problem rely in some fashion upon identifying "similar" column names in the schemas to be matched, or by recognizing common domains...

Toward the Scalable Integration of Internet Information Sources (2003)

Jaewoo Kang

This dissertation in a broad sense focuses on understanding the fundamental aspects of building a large-scale information integration system that can answer complex queries over a large number of...

The Niagara Internet Query System (2001)

Jeffrey Naughton, David DeWitt, David Maier, Ashraf Aboulnaga, Jianjun Chen, Leonidas Galanis, ...

Recently, there has been a great deal of research into XML query languages to enable the execution of database-style queries over XML files. However, merely being an XML query-processing engine does...

IDB: Toward the Scalable Integration of Queryable Internet Data Sources (2000)

Jaewoo Kang, Mong Li Lee, Jeffrey F. Naughton

As the number of databases accessible on the Web grows, the ability to execute queries spanning multiple heterogeneous queryable sources is becoming increasingly important. To date, research in this...

IDB: Unified Query Interface for Information on the Web. Proposal (1999)

Jaewoo Kang (advisor: Jeffrey Naughton)

XML [4] is likely to become the primary vehicle of the information interchange on the Web. Organizations will publish and export their data in XML to facilitate inter- and intra-organization...

Overview of Strudel - A Web-Site Management System (1998)

Mary Fernandez, Daniela Florescu, Jaewoo Kang, Alon Levy, Dan Suciu

The Strudel system applies concepts from database management systems to the process of building Web sites. Strudel's key idea is separating the management of the site's data, the creation...

Catching the Boat with Strudel: Experiences with a Web-Site Management System (1998)

Mary Fernandez, Daniela Florescu, Jaewoo Kang, Alon Levy, Dan Suciu

Managing Web sites is a tedious data-management task, because today's Web sites often are derived from multiple data sources and have complex structures. To simplify these tasks, Web-site...

Overview of Strudel - A Web-Site Management System. Networking and Information Systems 1(1) (1998)

Mary Fernandez, Daniela Florescu, Alon Levy, Dan Suciu, Jaewoo Kang

The Strudel system applies concepts from database management systems to the process of building Web sites. Strudel’s key idea is separating the management of the site’s data, the...

STRUDEL: A Web-site Management System (1997)

Mary Fernandez, Daniela Florescu, Jaewoo Kang, Alon Levy, Dan Suciu

The growth of the World-Wide Web has created a new kind of data management problem: building and maintaining Web sites. Building a Web site involves several tasks, such as choosing what...

Catching the Boat with Strudel: Experiences with a Web-Site Management System

Mary Fernández, Daniela Florescu, Jaewoo Kang, Alon Levy, Dan Suciu

The Strudel system applies concepts from database management systems to the process of building Web sites. Strudel's key idea is separating the management of the site's data, the creation...

The Niagara Internet Query System

Jeffrey Naughton, David DeWitt, David Maier, Jianjun Chen, Leonidas Galanis, Kristin Tufte, ...

Many projections envision a future in which the Internet is populated with a vast number of Web-accessible XML files: a "World-Wide Database". Recently, there has been a great deal of...

Empirical Bayes analysis of unreplicated microarray data

HyungJun Cho, Jaewoo Kang, Jae Lee

Keywords: microarray data, empirical Bayes, Markov chain Monte Carlo, no replication