Surajit Chaudhuri

Ranking Objects Based on Relationships and Fixed Associations (2009)

Albert Angel, Surajit Chaudhuri, Nick Koudas, Gautam Das

Text corpora are often enhanced by additional metadata which relate real-world entities, with each document in which such entities are discussed. Such relationships are typically obtained through...

Scalable Ad-hoc Entity Extraction from Text Collections (2009)

Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti

Supporting entity extraction from large document collections is important for enabling a variety of important data analysis tasks. In this paper, we introduce the “ad-hoc” entity extraction task...

A Pay-As-You-Go Framework for Query Execution Feedback (2009)

Surajit Chaudhuri

Past work has suggested that query execution feedback can be useful in improving the quality of plans by correcting cardinality estimation errors in the query optimizer. The state-of-the-art approach...

Example-driven Design of Efficient Record Matching Queries (2008)

Surajit Chaudhuri, Bee-chung Chen, Venkatesh Ganti, Raghav Kaushik

Record matching is the task of identifying records that match the same real world entity. This is a problem of great significance for a variety of business intelligence applications. Implementations...

An Overview of Query Optimization in Relational Systems (2008)

Surajit Chaudhuri

There has been extensive work in query optimization since the early ‘70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to...

Theme New and Forgotten Dreams in Database Research (2008)

Surajit Chaudhuri

In last year’s ICDE panel in New Orleans [1], we examined the question of whether database research is able to provide leadership to database industries. There was a consensus that with the...

Primitives for Workload Summarization and Implications for SQL (2008)

Surajit Chaudhuri, Prasanna Ganesan, Vivek Narasayya

Workload information has proved to be a crucial component for database-administration tasks as well as for analysis of query logs to understand user behavior and system usage. These tasks require the...

Effective Use of Block-Level Sampling in Statistics Estimation (2008)

Surajit Chaudhuri

Block-level sampling is far more efficient than true uniform-random sampling over a large database, but prone to significant errors if used to create database statistics. In this paper, we develop...

Overcoming Limitations of Sampling for Aggregation Queries (2008)

Surajit Chaudhuri, Gautam Das, Mayur Datar, Rajeev Motwani

We study the problem of approximately answering aggregation queries using sampling. We observe that uniform sampling performs poorly when the distribution of the aggregated attribute is skewed. To...

Part I: What Is It All About (2008)

Surajit Chaudhuri, Gerhard Weikum, ...

Motivate and enable students and young scientists to pursue research on the auto-tuning aspect of autonomic computing. Complementary to: • SIGMOD 02 and VLDB 02 tutorials (Shasha/Bonnet) on tuning...

Associate Editors (2008)

Eui-hong Han, George Karypis, Vipin Kumar, Bamshad Mobasher, Charu C. Aggarwal, ...

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...

Self-tuning database systems: A decade of progress (2008)

Surajit Chaudhuri

In this paper we discuss advances in self-tuning database systems over the past decade, based on our experience in the AutoAdmin project at Microsoft Research. This paper primarily focuses on the...

Probabilistic information retrieval approach for ranking of database query results (2008)

Surajit Chaudhuri, Gerhard Weikum

We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries,...

Abstract (2008)

Surajit Chaudhuri

A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample...

Associate Editors (2008)

David B. Lomet, Amr El Abbadi, Surajit Chaudhuri, Donald Kossmann, Elke Rundensteiner

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...

and (2008)

Surajit Chaudhuri

The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we...

Fine Grained Authorization Through Predicated Grants (2008)

Surajit Chaudhuri, Tanmoy Dutta, S. Sudarshan

Authorization in SQL is currently at the level of tables or columns. Many applications need a finer level of control. We propose a model for fine-grained authorization based on adding predicates to...

Report on the Second International Workshop on Self-Managing Database Systems (SMDB 2007) (2008)

Anastassia Ailamaki, Surajit Chaudhuri, Sam Lightstone, Guy Lohman, Pat Martin, Ken Salem, ...

Information management systems are growing rapidly in scale and complexity, while skilled database administrators

Bridging the Application and DBMS Profiling Divide for Database Application Developers (2008)

Surajit Chaudhuri

... tools for profiling and tuning application code remain disconnected from the profiling and tuning tools for relational DBMSs. This makes it...

Leveraging Aggregate Constraints For Deduplication (2008)

Surajit Chaudhuri, Anish Das Sarma, Venkatesh Ganti, Raghav Kaushik

We show that aggregate constraints (as opposed to pairwise constraints) that often arise when integrating multiple sources of data, can be leveraged to enhance the quality of deduplication. However,...

Heavy-Tailed Distributions and Multi-Keyword Queries (2008)

Surajit Chaudhuri, Kenneth Church, Arnd Christian König, Liying Sui

Intersecting inverted indexes is a fundamental operation for many applications in information retrieval and databases. Efficient indexing for this operation is known to be a hard problem for...

Abstract (2008)

Nicolas Bruno, Surajit Chaudhuri, Luis Gravano

In many applications, users specify target values for the attributes of a relation, and expect in return the k tuples that best match these values. Traditional RDBMSs do not process these “top-k...

Associate Editors (2007)

Gerhard Weikum, Arnd Christian, Achim Kraiss, Markus Sinnwell, Gary Valentin, Eric Christensen, ...

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and application of database systems and their technology. Letters,...

Data Engineering (2007)

Viswanath Poosala, Venkatesh Ganti, Yannis E. Ioannidis, David B. Lomet, ...

Answering queries approximately has recently been proposed as a way to reduce query response times in on-line decision support systems, when the precise answer is not necessary or early feedback is...

Data Engineering (2007)

David B. Lomet, Amr El Abbadi, Surajit Chaudhuri, Donald Kossmann, ...

A quorum system is a collection of subsets of servers, every two of which intersect. Quorum systems have been suggested as a tool for concurrency control in replicated databases almost twenty years...

Data Engineering (2007)

Serge Abiteboul, Sophie Cluet, Tova Milo, Pini Mogilevsky, Jerome Siméon, ...

A broad spectrum of data is available on the Web in distinct heterogeneous sources, stored under different formats. As the number of systems that utilize this data grows, the importance of data...

Bulletin of the Technical Committee on (2007)

Viswanath Poosala, Venkatesh Ganti, Yannis E. Ioannidis, Dennis Shasha, ...

Answering queries approximately has recently been proposed as a way to reduce query response times in on-line decision support systems, when the precise answer is not necessary or early feedback is...

Data Engineering (2007)

Gerhard Weikum, Arnd Christian, Achim Kraiss, Markus Sinnwell, ...

Although today's computers provide huge amounts of main memory, the ever-increasing load of large data servers, imposed by resource-intensive decision-support queries and accesses to multimedia...

Data Engineering (2007)

David Lomet, Michael A. Olson, Wei Michael Hong, Michael Ubell, ...

As network connectivity has continued its explosive growth and as storage devices have become smaller, faster, and less expensive, the number of online digitized images has increased rapidly....

gravano @ cs. columbia.edu over Multimedia (2007)

Surajit Chaudhuri, Luis Gravano, Amélie Marian

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically...

Hewlett-Packard Laboratories (2007)

Surajit Chaudhuri, Luis Gravano

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A selection on these attributes will typically produce not just a set of...

SOAP (2007)

Surajit Chaudhuri, ...

Tutorial presented based solely on publicly available information. Information is incomplete and could be inaccurate. Our presentation reflects our understanding, which may be erroneous. Does not reflect...

About this Tutorial (2007)

Surajit Chaudhuri, Kyuseok Shim, ...

• Tutorial presented based solely on publicly available information • Information is incomplete and could be inaccurate • Our presentation reflects our understanding which may be erroneous • ...

The Microsoft Database Research Group (2007)

David Lomet, Roger Barga, Surajit Chaudhuri, Paul Larson, Vivek Narasayya

Microsoft’s strategic interest in the database field dates from 1993 and the efforts of David Vaskevitch, who is now the Microsoft Vice President in charge of the database and transaction...

1 (2007)

Surajit Chaudhuri, Kyuseok Shim

Complex queries, with aggregates, views and nested subqueries are important in decision-support applications. Such queries are represented as multi-block queries where a query block may be...

Conference and Journal Notices (2007)

Gerhard Weikum, Arnd Christian, Achim Kraiss, Markus Sinnwell, Surajit Chaudhuri, ...

Letter from the Editor-in-Chief ICDE'2000 As many of you perhaps are aware, the deadline for the ICDE'2000 conference, which is the flagship conference of our technical committee, was June...

Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals (2007)

Gray, Jim, Chaudhuri, Surajit, Bosworth, Adam, Layman, Andrew, Reichart, Don, Venkatrao, Murali, ...

Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or...

Optimized stratified sampling for approximate query processing (2007)

Surajit Chaudhuri, Gautam Das, Vivek Narasayya

The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we...

Stop-and-Restart Style Execution for Long Running Decision Support Queries (2007)

Surajit Chaudhuri, Raghav Kaushik, Abhijit Pol, Ravi Ramamurthy

Long running decision support queries can be resource intensive and often lead to resource contention in data warehousing systems. Today, the only real option available to the DBAs when faced with...

Report on the Second International Workshop on Self-Managing Database Systems (SMDB 2007) (2007)

Ailamaki, Anastassia, Chaudhuri, Surajit, Lightstone, Sam, Lohman, Guy M., Martin, Patrick, Salem, Kenneth, ...

Information management systems are growing rapidly in scale and complexity, while skilled database administrators are becoming rarer and more expensive. Increasingly, the total cost of ownership of...

Probabilistic information retrieval approach for ranking of database query results (2006)

Surajit Chaudhuri, Gerhard Weikum

We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries,...

Generating Queries with Cardinality Constraints for DBMS Testing (2006)

Nicolas Bruno, Surajit Chaudhuri, Dilys Thomas

Good testing coverage of novel database techniques, such as multidimensional histograms or changes in the execution engine, is a complex problem. In this work, we argue that this task...

Robust cardinality and cost estimation for skyline operator (2006)

Surajit Chaudhuri, Nilesh Dalvi, Raghav Kaushik

Incorporating the skyline operator inside the relational engine requires solving the cardinality estimation and the cost estimation problem, hitherto unaddressed. We propose robust techniques to...

AutoAdmin: Self-Tuning Database Systems Technology (2006)

Sanjay Agrawal, Nicolas Bruno, Surajit Chaudhuri, Vivek Narasayya

making database systems significantly more self-tuning. Initially, we focused on automating the physical design

Physical design refinement: The “Merge-Reduce” approach (2006)

Nicolas Bruno, Surajit Chaudhuri

Physical database design tools rely on a DBA-provided workload to pick an “optimal” set of indexes and materialized views. Such tools allow either creating a new such configuration or adding new...

Foundations of Automated Database Tuning (Tutorial) (2006)

Chaudhuri, Surajit, Weikum, Gerhard, Liu, Ling, Reuter, Andreas, Whang, Kyu-Young, Zhang, Jianjun

Our society is more dependent on information systems than ever before. However, managing the information systems infrastructure in a cost-effective manner is a growing challenge. The total cost of...

Probabilistic information retrieval approach for ranking of database query results (2006)

Chaudhuri, Surajit, Das, Gautam, Hristidis, Vagelis, Weikum, Gerhard

We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries,...

Integrating DB and IR technologies: What is the sound of one hand clapping (2005)

Surajit Chaudhuri, Raghu Ramakrishnan, Gerhard Weikum

Databases (DB) and information retrieval (IR) have evolved as separate fields. However, modern applications such as customer support, health care, and digital libraries require capabilities for both...

Robust Identification of Fuzzy Duplicates (2005)

Surajit Chaudhuri, Venkatesh Ganti, Rajeev Motwani

Detecting and eliminating fuzzy duplicates is a critical data cleaning task that is required by many applications. Fuzzy duplicates are multiple seemingly distinct tuples which represent the same...

Storing XML (with XSD) in SQL Databases: Interplay of Logical and Physical Designs (2005)

Surajit Chaudhuri, Zhiyuan Chen, Kyuseok Shim, Yuqing Wu

Much of business XML data has accompanying XSD specifications. In many scenarios, "shredding" such XML data into a relational storage is a popular paradigm. Optimizing evaluation of XPath...

When Can We Trust Progress Estimators For SQL Queries (2005)

Surajit Chaudhuri, Raghav Kaushik, Ravishankar Ramamurthy

The problem of estimating progress for long-running queries has recently been introduced. We analyze the characteristics of the progress estimation problem, from the perspective of providing robust,...

Foundations of Automated Database Tuning (2005)

Chaudhuri, Surajit, Weikum, Gerhard, Widom, Jennifer, Özcan, Fatma, Chirkova, Rada

The Challenge of Total Cost-of-Ownership. Our society is more dependent on information systems than ever before. However, managing the information systems infrastructure in a cost-effective manner is...

Integrating DB and IR Technologies: What is the Sound of One Hand Clapping? (2005)

Chaudhuri, Surajit, Ramakrishnan, Raghu, Weikum, Gerhard, Stonebraker, Michael, Weikum, Gerhard, DeWitt, David

Databases (DB) and information retrieval (IR) have evolved as separate fields. However, modern applications such as customer support, health care, and digital libraries require capabilities for both...

Probabilistic Ranking of Database Query Results (2004)

Chaudhuri, Surajit, Das, Gautam, Hristidis, Vagelis, Weikum, Gerhard

We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our...

Probabilistic Ranking of Database Query Results (2004)

Chaudhuri, Surajit, Das, Gautam, Hristidis, Vagelis, Weikum, Gerhard, Nascimento, Mario A., Özsu, M. Tamer, ...

We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our...

Selectivity estimation for string predicates: Overcoming the underestimation problem (2004)

Surajit Chaudhuri, Venkatesh Ganti, Luis Gravano

Queries with (equality or LIKE) selection predicates over string attributes are widely used in relational databases. However, state-of-the-art techniques for estimating selectivities of string...

Storing XML (with XSD) in SQL Databases: Interplay of Logical and Physical Designs (2004)

Surajit Chaudhuri, Zhiyuan Chen, Kyuseok Shim, Yuqing Wu

In this paper, we examine the interplay of logical and physical design, and experimentally demonstrate that: (1) solving the logical mapping and the physical design problem independently leads to a...

Optimizing top-k selection queries over multimedia repositories (2004)

Surajit Chaudhuri, Luis Gravano, Amélie Marian

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a...

Automatic categorization of query results (2004)

Kaushik Chakrabarti, Surajit Chaudhuri, Seung-won Hwang

Exploratory ad-hoc queries could return too many answers – a phenomenon commonly referred to as “information overload”. In this paper, we propose to automatically categorize the results of SQL...

SQLCM: A Continuous Monitoring Framework for Relational Database Engines (2004)

Surajit Chaudhuri, Arnd Christian König, Vivek Narasayya

The ability to monitor a database server is crucial for effective database administration. Today’s commercial database systems support two basic mechanisms for monitoring: (a) obtaining a snapshot...

Database tuning advisor for Microsoft SQL Server 2005 (2004)

Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollar, Arun Marathe, Vivek Narasayya, Manoj Syamala

The Database Tuning Advisor (DTA) that is part of Microsoft SQL Server 2005 is an automated physical database design tool that significantly advances the state-of-the-art in several ways. First, DTA...

On relational support for XML publishing: Beyond sorting and tagging (2003)

Surajit Chaudhuri

In this paper, we study whether the need for efficient XML publishing brings any new requirements for relational query engines, or if sorting query results in the relational engine and tagging them...

Automated ranking of database query results (2003)

Surajit Chaudhuri, Gautam Das

We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data....

On Relational Support for XML Publishing: Beyond Sorting and Tagging (2003)

Surajit Chaudhuri, Raghav Kaushik, Jeffrey F. Naughton

In this paper, we study whether the need for efficient XML publishing brings any new requirements for relational query engines, or if sorting query results in the relational engine and tagging them...

Automated ranking of database query results (2003)

Sanjay Agrawal, Surajit Chaudhuri

Ranking and returning the most relevant results of a query is a popular paradigm in Information Retrieval. We discuss challenges and investigate several approaches to enable ranking in databases,...

Robust and efficient fuzzy match for online data cleaning (2003)

Surajit Chaudhuri, Kris Ganjam, Venkatesh Ganti, Rajeev Motwani

To ensure high data quality, data warehouses must validate and cleanse incoming data tuples from external sources. In many situations, clean tuples must match acceptable tuples in reference tables....

Automating Layout of Relational Databases (2003)

Sanjay Agrawal, Surajit Chaudhuri, Abhinandan Das, Vivek Narasayya

The choice of database layout, i.e., how database objects such as tables and indexes are assigned to disk drives can significantly impact the I/O performance of the system. Today, DBAs typically rely...

Optimizing Top-K Selection Queries over Multimedia Repositories (2002)

Chaudhuri, Surajit, Gravano, Luis, Marian, Amelie

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of...

Efficient evaluation of queries with mining predicates (2002)

Surajit Chaudhuri

Modern relational database systems are beginning to support ad hoc queries on mining models. In this paper, we explore novel techniques for optimizing queries that apply mining models to relational...

Abstract (2002)

Surajit Chaudhuri, Luis Gravano, Amélie Marian

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A query on these attributes will typically request not just a set of...

DBXplorer: enabling keyword search over relational databases (2002)

Sanjay Agrawal, Surajit Chaudhuri, Gautam Das

While relational database systems offer powerful structured query languages such as SQL, there is no support for keyword search over databases. The simplicity of keyword search as a querying...

Eliminating fuzzy duplicates in data warehouses (2002)

Rohit Ananthakrishna, Surajit Chaudhuri, Venkatesh Ganti

The duplicate elimination problem of detecting multiple tuples, which describe the same real world entity, is an important data cleaning problem. Previous domain independent solutions to this problem...

Overcoming Limitations of Sampling for Aggregation Queries (2001)

Surajit Chaudhuri, Gautam Das, Rajeev Motwani

We study the problem of approximately answering aggregation queries using sampling. We observe that uniform sampling performs poorly when the distribution of the aggregated attribute is skewed. To...

A robust, optimization-based approach for approximate answering of aggregate queries (2001)

Surajit Chaudhuri, Gautam Das, Vivek Narasayya

The ability to approximately answer aggregation queries accurately and efficiently is of great benefit for decision support and data mining tools. In contrast to previous sampling-based studies, we...

Overcoming Limitations of Sampling for Aggregation Queries (2001)

Surajit Chaudhuri, Gautam Das, Mayur Datar, Rajeev Motwani, Vivek Narasayya

We study the problem of approximately answering aggregation queries using sampling. We observe that uniform sampling performs poorly when the distribution of the aggregated attribute is skewed. To...

STHoles: a multidimensional workload-aware histogram (2001)

Nicolas Bruno, Surajit Chaudhuri, Luis Gravano

Attributes of a relation are not typically independent. Multidimensional histograms can be an effective tool for accurate multiattribute query selectivity estimation. In this paper, we introduce...

SOAP (2001)

Surajit Chaudhuri, Kyuseok Shim, ...

• Tutorial presented based solely on publicly available information • Information is incomplete and could be inaccurate • Our presentation reflects our understanding which may be erroneous • ...

Integrating data mining with SQL databases: OLE DB for data mining (2001)

Amir Netz, Surajit Chaudhuri, Usama Fayyad, Jeff Bernhardt

The integration of data mining with traditional database systems is key to making it convenient, easy to deploy in real applications, and to growing its user base. In this paper we describe the new...

Performance of Multiattribute Top-K Queries on Relational Systems (2000)

Bruno, Nicolas, Chaudhuri, Surajit, Gravano, Luis

In many applications, users specify target values for the attributes of a relation, and expect in return the k tuples that best match these values. Traditional RDBMSs do not process these “top-k...

Performance of Multiattribute Top-k queries on Relational Systems (2000)

Nicolas Bruno, Surajit Chaudhuri, Luis Gravano

In many applications, users specify target values for the attributes of a relation, and expect in return the k tuples that best match these values. Traditional RDBMSs do not process these...

Rethinking database system architecture: Towards a self-tuning RISC-style database system (2000)

Surajit Chaudhuri, Gerhard Weikum

Database technology is one of the cornerstones for the new millennium’s IT landscape. However, database systems as a unit of code packaging and deployment are at a crossroad: commercial systems...

Rethinking Database System Architecture: Towards a Self-tuning RISC-style Database System (2000)

Surajit Chaudhuri, Gerhard Weikum

Database technology is one of the cornerstones for the new millennium's IT landscape. However, database systems as a unit of code packaging and deployment are at a crossroad: commercial systems...

Integration of Data Mining and Relational Databases (2000)

Amir Netz, Surajit Chaudhuri, Jeff Bernhardt, Usama Fayyad

In this paper, we review the past work and discuss the future of integration of data mining and relational database systems. We also discuss support for integration in Microsoft SQL Server 2000.

Automated Selection of Materialized Views and Indexes for SQL Databases (2000)

Sanjay Agrawal, Surajit Chaudhuri, Vivek Narasayya

Automatically selecting an appropriate set of materialized views and indexes for SQL databases is a non-trivial task. A judicious choice must be cost-driven and influenced by the workload experienced...

Evaluating Top-k Selection Queries (1999)

Surajit Chaudhuri

In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the...

editors (1999)

William Dumouchel, Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis Ioannidis, H. V. Jagadish, ...

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...

Accurate Query Optimization by Sub-plan Memoization (1999)

Ashraf Aboulnaga, Surajit Chaudhuri

Query optimizers use approximate techniques such as histograms or sampling for result size and distinct value estimation, even though these techniques may incur high estimation errors, leading the...

On sampling and relational operators (1999)

Surajit Chaudhuri, Rajeev Motwani

A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. We highlight the primary difficulties, summarize the results of...

Evaluating Top-k Selection Queries (1999)

Surajit Chaudhuri, Luis Gravano

In many applications, users specify target values for certain attributes, without requiring exact matches to these values in return. Instead, the result to such queries is typically a rank of the...

Self-tuning Histograms: Building Histograms Without Looking at Data (1999)

Ashraf Aboulnaga, Surajit Chaudhuri

In this paper, we introduce self-tuning histograms. Although similar in structure to traditional histograms, these histograms infer data distributions not by examining the data or a sample thereof,...

Data Engineering (1999)

Christos Faloutsos, Peter J. Haas, Joseph M. Hellerstein, Yannis Ioannidis, H. V. Jagadish, ...

In this paper we describe and evaluate several popular techniques for data reduction. Historically, the primary need for data reduction has been internal to a database system, in a cost-based query...

Managing Objects in a Relational Framework (1998)

Wiederhold, Gio, Barsalou, Thierry, Chaudhuri, Surajit

The papers collected in this report present ongoing research within the KBMS and PENGUIN projects at Stanford's Computer Science Department and Section on Medical Informatics. They have been issued...

A Mediator Architecture for Abstract Data Access (1998)

Gio Wiederhold, Tore Risch, Peter Rathmann, Linda DeMichiel, Surajit Chaudhuri

This report contains some concept papers describing the general architecture that we envisage to be appropriate for further information systems, as well as a number of papers with specific research...

editors (1998)

Michael J. Carey, Laura M. Haas, James Kleewein, Berthold Reinwald, Steve Olson, Richard Pledereder, ...

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and...

AutoAdmin ‘what-if’ index analysis utility (1998)

Surajit Chaudhuri, Vivek Narasayya

As databases get widely deployed, it becomes increasingly important to reduce the overhead of database administration. An important aspect of data...

An overview of query optimization in relational systems (1998)

Surajit Chaudhuri

There has been extensive work in query optimization since the early ’70s. It is hard to capture the breadth and depth of this large body of work in a short article. Therefore, I have decided to...

On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases (1998)

Goetz Graefe, Usama Fayyad, Surajit Chaudhuri

For a wide variety of classification algorithms, scalability to large databases can be achieved by observing that most algorithms are driven by a set of sufficient statistics that are significantly...

On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases (1998)

Goetz Graefe, Usama Fayyad, Surajit Chaudhuri

For a wide variety of classification algorithms, scalability to large databases can be achieved by observing that most algorithms are driven by a set of sufficient statistics that are significantly...

Random Sampling for Histogram Construction: How much is enough? (1998)

Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya

Random sampling is a standard technique for constructing (approximate) histograms for query optimization. However, any real implementation in commercial products requires solving the hard problem of...

An Overview of Query Optimization in Relational Systems (1998)

Surajit Chaudhuri

this article is not to be comprehensive, but rather to explain the foundations and present samplings of significant work in this area. I would like to apologize to the many contributors in this area...

Data Mining and Database Systems: Where is the Intersection? (1998)

Surajit Chaudhuri

this paper). This raises the question as to what role, if any, database systems research may contribute to area of data mining. In this article, I will try to present my biased view on this issue and...

AutoAdmin ‘what-if’ index analysis utility (1998)

Surajit Chaudhuri

As databases get widely deployed, it becomes increasingly important to reduce the overhead of database administration. An important aspect of data administration that critically influences...

An efficient cost-driven index selection tool for Microsoft SQL Server (1997)

Surajit Chaudhuri, Vivek Narasayya

In this paper we describe novel techniques that make it possible to build an industrial-strength tool for automating the choice of indexes in the physical design of a SQL database. The tool takes as...

An Overview of Data Warehousing and OLAP Technology (1997)

Surajit Chaudhuri, Umeshwar Dayal

Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and...

An Efficient, Cost-Driven Index Selection Tool for Microsoft SQL Server (1997)

Surajit Chaudhuri, Vivek Narasayya

In this paper we describe novel techniques that make it possible to build an industrial-strength tool for automating the choice of indexes in the physical design of a SQL database. The tool takes as...

Optimization of Queries with User-defined Predicates (1997)

Surajit Chaudhuri, Kyuseok Shim

Relational databases provide the ability to store user-defined functions and predicates which can be invoked in SQL queries. When evaluation of a user-defined predicate is relatively expensive, the...

An overview of data warehousing and OLAP technology (1997)

Surajit Chaudhuri, Umeshwar Dayal

Data warehousing and on-line analytical processing (OLAP) are essential elements of decision support, which has increasingly become a focus of the database industry. Many commercial products and...

An efficient cost-driven index selection tool for Microsoft SQL Server (1997)

Surajit Chaudhuri, Vivek Narasayya

In this paper we describe novel techniques that make it possible to build an industrial-strength tool for automating the choice of indexes in the physical design of a SQL database. The tool takes as...

Optimizing queries over multimedia repositories (1996)

Surajit Chaudhuri, Luis Gravano

Multimedia repositories and applications that retrieve multimedia information are becoming increasingly popular. In this paper, we study the problem of selecting objects from multimedia repositories,...

Optimization of Queries with User-defined Predicates (1996)

Surajit Chaudhuri, Kyuseok Shim

Relational databases provide the ability to store user-defined functions and predicates which can be invoked in SQL queries. When evaluation of a user-defined predicate is relatively expensive, the...

Optimization of Queries with User-defined Predicates (1996)

Surajit Chaudhuri, Kyuseok Shim

Relational databases provide the ability to store user-defined functions and predicates which can be invoked in SQL queries. When evaluation of a user-defined predicate is relatively expensive, the...

Optimizing Queries over Multimedia Repositories (1996)

Surajit Chaudhuri, Luis Gravano

Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A selection on these attributes will typically produce not just a set of...

Optimization of Queries with User-Defined Predicates (1996)

Surajit Chaudhuri

Relational databases provide the ability to store user-defined functions and predicates which can be invoked in SQL queries. When evaluation of a user-defined predicate is relatively expensive, the...

On Scheduling Atomic and Composite Multimedia Objects (1995)

Cyrus Shahabi, Shahram Ghandeharizadeh, Surajit Chaudhuri

In multi-user multimedia information systems (e.g., video-on-demand, news-editing) the policy employed to activate queued requests has a significant impact on the average startup latency observed by...

Optimizing Queries with Materialized Views (1995)

Surajit Chaudhuri, Ravi Krishnamurthy, Spyros Potamianos, Kyuseok Shim

While much work has addressed the problem of maintaining materialized views, the important question of optimizing queries in the presence of materialized views has not been resolved. In this paper,...

An Overview of Cost-based Optimization of Queries with Aggregates (1995)

Surajit Chaudhuri, Kyuseok Shim

In this paper, we will show that there is a rich set of execution alternatives that can significantly enhance the quality of the plans produced. We also discuss how one can choose among the...

Optimizing Queries With Materialized Views (1995)

Surajit Chaudhuri, Ravi Krishnamurthy, Spyros Potamianos, Kyuseok Shim

While much work has addressed the problem of maintaining materialized views, the important problem of optimizing queries in the presence of materialized views has not been resolved. In this paper, we...

Avoiding Retrieval Contention for Composite Multimedia Objects (1995)

Surajit Chaudhuri, Shahram Ghandeharizadeh, Cyrus Shahabi

An important requirement for multimedia presentations is the ability to compose new multimedia objects from the existing ones using temporal relationships. When compositions of continuous media...

On Scheduling Atomic and Composite Multimedia Objects (1995)

Cyrus Shahabi, Shahram Ghandeharizadeh, Surajit Chaudhuri

In multi-user multimedia information systems (e.g., movie-on-demand, digital-editing), scheduling the retrievals of continuous media objects becomes a challenging task. This is because of both intra...

Join Queries with External Text Sources: Execution and Optimization Techniques (1995)

Surajit Chaudhuri, Umeshwar Dayal, Tak W. Yan

Text is a pervasive information type, and many applications require querying over text sources in addition to structured data. This paper studies the problem of query processing in a system that...

Including Group-By in Query Optimization (1994)

Surajit Chaudhuri, Kyuseok Shim

In existing relational database systems, processing of group-by and computation of aggregate functions are always postponed until all joins are performed. In this paper, we present transformations...

Query Optimization in the Presence of Foreign Functions (1993)

Surajit Chaudhuri, Kyuseok Shim

The declarativeness of relational query languages is very attractive for developing applications. However, many applications also need to invoke external functions or to access data that is not...

An online editor (1967)

Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon Levy, David Maier, ...

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and application of database systems and their technology. Letters,...

AutoAdmin "What-if" Index Analysis Utility

Surajit Chaudhuri, Vivek Narasayya

As databases get widely deployed, it becomes increasingly important to reduce the overhead of database administration. An important aspect of data administration that critically influences...