Sudipto Guha

Learning to Create Data-Integrating Queries (2009)

Partha Pratim, Talukdar Marie, Jacob Muhammad, Salman Mehmood, O Pereira, Sudipto Guha

The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...

Learning to Create Data-Integrating Queries (2009)

Partha Pratim, Talukdar Marie, Jacob Muhammad, Salman Mehmood, O Pereira, Sudipto Guha

The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...

How to Probe for an Extreme Value ∗ (2009)

Ashish Goel, Sudipto Guha, Kamesh Munagala

In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of the system...

A Substrate for In-Network Sensor Data Integration (2009)

Svilen R. Mihaylov, Marie Jacob, Zachary G. Ives, Sudipto Guha

With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication...

STREAM ORDER AND ORDER STATISTICS: QUANTILE ESTIMATION IN RANDOM-ORDER STREAMS ∗ (2009)

Sudipto Guha, Andrew Mcgregor

Abstract. When trying to process a data-stream in small space, how important is the order in which the data arrives? Are there problems that are unsolvable when the ordering is worst-case, that can...

Learning to Create Data-Integrating Queries (2009)

Partha Pratim, Talukdar Marie, Jacob Muhammad, Salman Mehmood, O Pereira, Sudipto Guha

The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...

Graph Sparsification in the Semi-Streaming Model (2009)

Ahn, KookJin, Guha, Sudipto

Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming...

Revisiting the Direct Sum Theorem and Space Lower Bounds in Random Order Streams (2009)

Guha, Sudipto, Huang, Zhiyi

Estimating frequency moments and $L_p$ distances are well studied problems in the adversarial data stream model and tight space bounds are known for these two problems. There has been growing...

A Constant Factor Approximation for the Single Sink Edge Installation Problem (2009)

Guha, Sudipto, Meyerson, Adam, Munagala, Kamesh

We present the first constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of different costs and capacities per unit length...

Multi-armed Bandits with Limited Exploration ∗ (2009)

Sudipto Guha, Kamesh Munagala

A central problem to decision making under uncertainty is the trade-off between exploration and exploitation: between learning from and adapting to a stochastic system and exploiting the current...

Sparsification Algorithm for Cut Problems on Semi-streaming Model (2009)

Ahn, Kook Jin, Guha, Sudipto

The emergence of social networks and other interaction networks have brought to fore the questions of processing massive graphs. The (semi) streaming model, where we assume that the space is (near)...

Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams (2009)

Guha, Sudipto, Mcgregor, Andrew

When trying to process a data stream in small space, how important is the order in which the data arrive? Are there problems that are unsolvable when the ordering is worst case, but that can be...

Tight results for clustering and summarizing data streams (2009)

Guha, Sudipto

In this paper we investigate algorithms and lower bounds for summarization problems over a single pass data stream. In particular we focus on histogram construction and K-center clustering. We...

ABSTRACT Wavelet Synopsis for Data Streams: Minimizing Non-Euclidean Error (2008)

Sudipto Guha

We consider the wavelet synopsis construction problem for data streams where given n numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much smaller than n. The B...

On the space–time of optimal, approximate and streaming algorithms for synopsis construction problems (2008)

Sudipto Guha

Synopses construction algorithms have been found to be of interest in query optimization, approximate query answering and mining, and over the last few years several good synopsis construction...

Adaptive Uncertainty Resolution in Bayesian Combinatorial Optimization Problems (2008)

Guha, Sudipto, Munagala, Kamesh

In several applications such as databases, planning, and sensor networks, parameters such as selectivity, load, or sensed values are known only with some associated uncertainty. The performance of...

ABSTRACT Approximate XML Joins (2008)

Sudipto Guha, Divesh Srivastava

XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which...

Approximation Algorithms for Partial-information based Stochastic Control with Markovian Rewards (2008)

Sudipto Guha

We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according to an underlying...

A Substrate for In-Network Sensor Data Integration (2008)

Mihaylov, Svilen R, Jacob, Marie, Ives, Zachary G, Guha, Sudipto

With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication...

A Substrate for In-Network Sensor Data Integration (2008)

Mihaylov, Svilen, Jacob, Marie, Ives, Zachary G, Guha, Sudipto

With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication...

Sequential Design of Experiments via Linear Programming (2008)

Guha, Sudipto, Munagala, Kamesh

The celebrated multi-armed bandit problem in decision theory models the basic trade-off between exploration, or learning about the state of a system, and exploitation, or utilizing the system. In...

Information Acquisition and Exploitation in Multichannel Wireless Networks (2008)

Guha, Sudipto, Munagala, Kamesh, Sarkar, Saswati

A wireless system with multiple channels is considered, where each channel has several transmission states. A user learns about the instantaneous state of an available channel by transmitting a...

Abstract (2008)

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance metric based (e.g., euclidean) similarity measure in...

General Terms: Algorithms,Performance,Theory (2008)

Sudipto Guha

We consider a wireless system with multiple channels where each channel is either on or off, and probing the state of any channel incurs a cost. We present a polynomial time algorithm that determines...

A Theoretical Case Study of Three Algorithms on the GPU: Depth Ordering, k-Selection and Matrix Multiplication (2008)

Sudipto Guha, Shankar Krishnan, Suresh Venkatasubramanian

The last few years has seen an explosion of effort in designing algorithms that harness the power of the GPU for general purpose computations. The list of problems for which efficient GPU algorithms...

Approximation and streaming algorithms for histogram construction problems (2008)

Sudipto Guha, Nick Koudas, Kyuseok Shim

Histograms are typically used to approximate data distributions. Histograms and related synopsis structures have been successful in a wide variety of popular database applications including...

Abstract CURE: An Efficient Clustering Algorithm for Large Databases (2008)

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical...

ABSTRACT Approximate XML Joins (2008)

Sudipto Guha, Divesh Srivastava

XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which...

How to Probe for an Extreme Value ∗ (2008)

Ashish Goel, Sudipto Guha, Kamesh Munagala

In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of the system...

Reasoning About Approximate Match Query Results (2008)

Sudipto Guha

Join techniques deploying approximate match predicates are fundamental data cleaning operations. A variety of predicates have been utilized to quantify approximate match in such operations and some...

ABSTRACT Wavelet Synopsis for Data Streams: Minimizing Non-Euclidean Error (2008)

Sudipto Guha

We consider the wavelet synopsis construction problem for data streams where given n numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much smaller than n. The B...

Nonlinear Approximation and Image Representation using (2008)

Sudipto Guha, Boulos Harb

We address the problem of finding sparse wavelet representations of high-dimensional vectors. We present a lower-bounding technique and use it to develop an algorithm for computing...

Abstract (2008)

Sudipto Guha, Shankar Krishnan

Shadow mapping is a technique for doing real-time shadowing. Recent work has shown that shadow mapping hardware can be used as a second depth test in addition to the z-test. In this paper, we explore...

(Invited Paper) (2008)

Sudipto Guha, Kamesh Munagala, Saswati Sarkar

optimal transmission and probing strategies

Abstract CURE: An Efficient Clustering Algorithm for Large Databases (2008)

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical...

Approximation Algorithms for Wavelet Transform Coding of Data Streams (2008)

Guha, Sudipto, Harb, Boulos

This paper addresses the problem of finding a B-term wavelet representation of a given discrete function ∈ Rn whose distance from is minimized. The problem is well understood when we seek to...

Learning to Create Data-Integrating Queries (2008)

Partha Pratim Talukdar, Marie Jacob, Muhammad Salman Mehmood, Koby Crammer, Zachary G. Ives, Fernando Pereira, ...

The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...

Tight Lower Bounds for Multi-Pass Stream Computation via Pass Elimination (2008)

Sudipto Guha, Andrew Mcgregor

There is a natural relationship between lower bounds in the multi-pass stream model and lower bounds in multiround communication. However, this connection is less understood than the connection...

Approximation Algorithms for Restless Bandit Problems (2007)

Guha, Sudipto, Munagala, Kamesh, Shi, Peng

The restless bandit problem is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit...

x (2007)

Amotz Bar-noy, Sudipto Guha, Joseph (se Naor, Baruch Schieber

We consider the following fundamental scheduling problem. The input to the problem consists of n jobs and k machines. Each of the jobs is associated with a release time, a deadline, a weight, and a...

A Constant-Factor Approximation Algorithm for the (2007)

Median Problem Extended, Moses Charikar, Sudipto Guha, Eva Tardos, David B. Shmoys

) Moses Charikar Sudipto Guha y Eva Tardos z David B. Shmoys x Abstract We present the rst constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the...

Efficient Recovery from Power Outage (Extended Summary) (2007)

Sudipto Guha, Anna Moss, Joseph (Seffi) Naor, Baruch Schieber

We study problems that are motivated by the real-life problem of ecient recovery from a wide scale electric power outage caused by a major disaster such as a hurricane or an equipment failure. In...

Approximating the Throughput of Multiple Machines in Real-Time Scheduling (2007)

Amotz Bar-noy, Sudipto Guha, Joseph (Seffi) Naor, Baruch Schieber

We consider the following fundamental scheduling problem. The input to the problem consists of n jobs and k machines. Each of the jobs is associated with a release time, a deadline, a weight, and a...

Rounding via Trees: Deterministic Approximation Algorithms for Group Steiner Trees and k-median (2007)

Moses Charikar, Chandra Chekuri, Ashish Goel, Sudipto Guha

Most optimization problems on an undirected graph reduce in complexity when restricted to instances on a tree. A recent result [3] for probabilistically approximating graph metrics by trees such that...

X (2007)

Sudipto Guha, Kamesh Munagala

In generalized k-clustering, we are given n points in a metric space with distance function d. The goal is to partition the points into k clusters, C 1; C 2; : : : ; C k so that for a given fl, the...

3 (2007)

Randeep Bhatia, Sudipto Guha, Samir Khuller, Yoram J. Sussmann

Facility location problems have always been studied with the assumption that the edge lengths in the network are static and do not change over time. The underlying network could be used to model a...

z (2007)

Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Yannis Kotidis, S. Muthukrishnan, Martin J. Strauss

A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A 3. " We give a sketching algorithm, that constructs a small sketch from the stream of...

y (2007)

Amotz Bar-noy, Sudipto Guha, Yoav Katz, Joseph (seffi Naor, Baruch Schieber, Hadas Shachnai

z--We consider the following scheduling with batching problem that has many applications, e.g., in multimedia-on-demand and manufacturing of integrated circuits. The input to the problem consists of...

Lower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streaming (2007)

Guha, Sudipto, McGregor, Andrew

We present lower bounds on the space required to estimate the quantiles of a stream of numerical values. Quantile estimation is perhaps the most studied problem in the data stream model and it is...

A Note on Linear Time Algorithms for Maximum Error Histograms (2007)

Guha, Sudipto, Shim, Kyuseok

Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, e.g., V-Optimal, use error measures which...

Lower bounds for quantile estimation in random-order and multi-pass streaming (2007)

Sudipto Guha, Andrew Mcgregor

Abstract. We present lower bounds on the space required to estimate the quantiles of a stream of numerical values. Quantile estimation is perhaps the most studied problem in the data stream model and...

Information acquisition and exploitation in multichannel wireless systems (2007)

Sudipto Guha, Kamesh Munagala, Saswati Sarkar

A wireless system with multiple channels is considered, where each channel has several transmission states. A user learns about the instantaneous state of an available channel by transmitting a...

Information acquisition and exploitation in multichannel wireless systems (2007)

Sudipto Guha, Kamesh Munagala, Saswati Sarkar

A wireless system with multiple channels is considered, where each channel has several transmission states. A user learns about the instantaneous state of an available channel by transmitting a...

Sketching information divergences (2007)

Sudipto Guha, Piotr Indyk, Andrew Mcgregor

When comparing discrete probability distributions, natural measures of similarity are not ℓp distances but rather are information divergences such as Kullback-Leibler and Hellinger. This paper...

Approximation algorithms for budgeted learning problems (2007)

Sudipto Guha, Kamesh Munagala

In this talk we investigate general techniques for designing simple approximately optimal policies for the problems similar to the optimal design of experiments under a budget constraint. As a...

Approximation algorithms for budgeted learning problems (2007)

Sudipto Guha, Kamesh Munagala

We present the first approximation algorithms for a large class of budgeted learning problems. One classic example of the above is the budgeted multi-armed bandit problem. In this problem each arm of...

Model-driven optimization using adaptive probes (2007)

Sudipto Guha, Kamesh Munagala

In several applications such as databases, planning, and sensor networks, parameters such as selectivity, load, or sensed values are known only with some associated uncertainty. The performance of...

Sketching information divergences (2007)

Sudipto Guha, Piotr Indyk, Andrew Mcgregor

Abstract. When comparing discrete probability distributions, natural measures of similarity are not ℓp distances but rather are informationdivergences such as Kullback-Leibler and Hellinger. This...

Analyzing a Greedy Approximation of an MDL Summarization Abstract (2007)

Peter Fontana, Faculty Advisor, Dr. Sudipto Guha

Many OLAP (On-line Analytical Processing) applications have produced data cubes that summarize and aggregate details of data queries. These data cubes are multi-dimensional matrices where each cell...

Sketching information divergences (2007)

Sudipto Guha, Piotr Indyk, Andrew Mcgregor

When comparing discrete probability distributions, natural measures of similarity are not ℓp distances but rather are information-divergences such as Kullback-Leibler and Hellinger. This paper...

Nonlinear Approximation and Image Representation using Wavelets (2007)

Guha, Sudipto, Harb, Boulos

We address the problem of finding sparse wavelet representations of high-dimensional vectors. We present a lower-bounding technique and use it to develop an algorithm for computing...

Approximation algorithms for wavelet transform coding of data streams (2006)

Guha, Sudipto, Harb, Boulos

This paper addresses the problem of finding a B-term wavelet representation of a given discrete function $f \in \real^n$ whose distance from f is minimized. The problem is well understood when we...

The Steiner k-Cut Problem (2006)

Chekuri, Chandra, Guha, Sudipto, Naor, Joseph

We consider the Steiner k-cut problem which generalizes both the k-cut problem and the multiway cut problem. The Steiner k-cut problem is defined as follows. Given an edge-weighted undirected graph G...

Jointly optimal transmission and probing strategies for multichannel wireless systems (2006)

Guha, Sudipto, Munagala, Kamesh, Sarkar, Saswati

We consider a wireless system with multiple channels when each channel has several different transmission states. Different states are associated with different probabilities of successful...

Streaming and sublinear approximation of entropy and information distances (2006)

Sudipto Guha, Andrew Mcgregor, Suresh Venkatasubramanian

In most algorithmic applications which compare two distributions, information theoretic distances are more natural than standard ℓp norms. In this paper we design streaming and sublinear time...

Stream-Order and Order-Statistics (2006)

Sudipto Guha, Andrew McGregor

When processing a data-stream in small space, how important is the order in which the data arrives? Are there problems that are unsolvable when the ordering is worst-case, that can typically be...

Performance guarantees through partial information based control in multichannel wireless networks (2006)

Sudipto Guha, Kamesh Munagala, Saswati Sarkar

We consider a wireless system with multiple channels when each channel has several transmission states. A user learns about the instantaneous state of an available channel by transmitting a control...

Approximation algorithms for wavelet transform coding of data streams (2006)

Sudipto Guha, Boulos Harb, Student Member

Abstract — This paper addresses the problem of finding a Bterm wavelet representation of a given discrete function f ∈ R n whose distance from f is minimized. The problem is well understood when...

Streaming and sublinear approximation of entropy and information distances (2006)

Sudipto Guha, Andrew Mcgregor, Suresh Venkatasubramanian

In most algorithmic applications which compare two distributions, information theoretic distances are more natural than standard ℓp norms. In this paper we design streaming and sublinear time...

Streaming and Sublinear Approximation of Entropy and Information Distances (2005)

Guha, Sudipto, McGregor, Andrew, Venkatasubramanian, Suresh

In many problems in data mining and machine learning, data items that need to be clustered or classified are not points in a high-dimensional space, but are distributions (points on a high...

Compression of Partially Ordered Strings (2005)

Alur, Rajeev, Chaudhuri, Swarat, Etessami, Kousha, Guha, Sudipto, Yannakakis, Mihalis

We introduce the problem of compressing partially ordered strings: given string σ ∈ Σ* and a binary independence relation I over Σ how can we compactly represent an input if the decompressor is...

How far will you walk to find your shortcut: Space Efficient Synopsis Construction Algorithms (2005)

Guha, Sudipto

In this paper we consider the wavelet synopsis construction problem without the restriction that we only choose a subset of coefficients of the original data. We provide the first near optimal...

Space efficiency in Synopsis construction algorithms (2005)

Sudipto Guha

Histograms and Wavelet synopses have been found to be useful in query optimization, approximate query answering and mining. Over the last few years several good synopsis algorithms have been...

Data Visualization and Mining using the GPU (2005)

Sudipto Guha, Shankar Krishnan, Suresh Venkatasubramanian

An exciting development in the computing industry has been the emergence of graphics processing units (the GPU) as a fast general purpose co-processor. Initially designed for gaming applications,...

Machine Minimization for Scheduling Jobs with Interval Constraints (2004)

Chuzhoy, Julia, Guha, Sudipto, Khanna, Sanjeev, Naor, Joseph

The problem of scheduling jobs with interval constraints is a well-studied classical scheduling problem. The input to the problem is a collection of n jobs where each job has a set of intervals on...

Asymmetric k-Center is log* n-Hard to Approximate (2004)

Chuzhoy, Julia, Guha, Sudipto, Halperin, Eran, Khanna, Sanjeev, Kortsarz, Guy, Krauthgamer, Robert, ...

In the Asymmetric k-Center problem, the input is an integer k and a complete digraph over n points together with a distance function obeying the directed triangle inequality. The goal is to choose a...

REHIST: Relative error histogram construction algorithms (2004)

Sudipto Guha, Kyuseok Shim, Jungchul Woo

Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, such as V-Optimal, optimize absolute error...

XWAVE: Optimal and approximate extended wavelets for streaming data (2004)

Sudipto Guha, Chulyun Kim, Kyuseok Shim

Wavelet synopses have been found to be of interest in query optimization and approximate query answering. Recently, extended wavelets were proposed by Deligiannakis and Roussopoulos for data sets...

Merging the results of approximate match operations (2004)

Sudipto Guha

Data Cleaning is an important process that has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relational tuple or tuples...

REHIST: Relative error histogram construction algorithms (2004)

Sudipto Guha, Kyuseok Shim, Jungchul Woo

Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, such as V-Optimal, optimize absolute error...

XWAVE: Optimal and approximate extended wavelets for streaming data (2004)

Sudipto Guha, Chulyun Kim, Kyuseok Shim

Wavelet synopses have been found to be of interest in query optimization and approximate query answering. Recently, extended wavelets were proposed by Deligiannakis and Roussopoulos for data sets...

REHIST: Relative error histogram construction algorithms (2004)

Sudipto Guha, Kyuseok Shim, Jungchul Woo

Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, such as V-Optimal, optimize absolute error...

Merging the Results of Approximate Match Operations (2004)

Sudipto Guha, Nick Koudas, Amit Marathe, Divesh Srivastava

Data Cleaning is an important process that has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relational tuple or tuples...

Merging the Results of Approximate Match Operations Sudipto Guha (2004)

Of Pennsylvania Sudipto, Sudipto Guha

Data Cleaning is an important process that has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relational tuple or tuples...

XWAVE: Optimal and approximate extended wavelets for streaming data (2004)

Sudipto Guha, Chulyun Kim, Kyuseok Shim

Wavelet synopses have been found to be of interest in query optimization and approximate query answering. Recently, extended wavelets were proposed by Deligiannakis and Roussopoulos for data sets...

Capacitated vertex covering (2003)

Sudipto Guha, Refael Hassin, B Samir Khuller, Einat Or B

In this paper we study the capacitated vertex cover problem, a generalization of the well-known vertex cover problem. Given a graph G = (V, E) with weights on the vertices, the goal is to cover all...

Clustering data streams: Theory and practice (2003)

Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani

Abstract—The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such...

Efficient approximation of optimization queries under parametric aggregation constraints (2003)

Sudipto Guha, Divesh Srivastava

We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that...

Compression of partially ordered strings (2003)

Rajeev Alur, Swarat Chaudhuri, Kousha Etessami, Sudipto Guha, Mihalis Yannakakis

Abstract. We introduce the problem of compressing partially ordered strings: given string σ ∈ Σ ∗ and a binary independence relation I over Σ, how can we compactly represent an input if the...

Efficient approximation of optimization queries under parametric aggregation constraints (2003)

Sudipto Guha, Dimitrios Gunopulos, Nick Koudas, Divesh Srivastava, Michail Vlachos

We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that...

Application of the Two-Sided Depth Test to CSG Rendering (2003)

Sudipto Guha, Shankar Krishnan, Kamesh Munagala, Suresh Venkatasubramanian

Shadow mapping is a technique for doing real-time shadowing. Recent work has shown that shadow mapping hardware can be used as a second depth test in addition to the z-test. In this paper, we explore...

Efficient Approximation of Optimization Queries Under Parametric (2003)

Aggregation Constraints Sudipto, Sudipto Guha, Divesh Srivastava

We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that...

Asymmetric k-Center is log* n-hard to Approximate (2003)

Julia Chuzhoy, Sudipto Guha, Sanjeev Khanna, Joseph (Seffi) Naor

We show that the asymmetric k-center problem is 34 n)-hard to approximate unless NP DTIME(n ). Since an O(log n)-approximation algorithm is known for this problem, this essentially resolves the...

Efficient Approximation of Optimization Queries Under Parametric (2003)

Aggregation Constraints Sudipto, Sudipto Guha, Divesh Srivastava

We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that...

Asymmetric k-center is log ∗ n-hard to approximate (2003)

Julia Chuzhoy, Sudipto Guha, Eran Halperin, Sanjeev Khanna

In the Asymmetric k-Center problem, the input is an integer k and a complete digraph over n points together with a distance function obeying the directed triangle inequality. The goal is to choose a...

Index-based approximate XML joins (2003)

Sudipto Guha, Nick Koudas

XML data integration tools are facing a variety of challenges for their efficient and effective operation. Among these is the requirement to handle a variety of inconsistencies or mistakes present in...

Compression of Partially Ordered Strings (2003)

Rajeev Alur Swarat, Rajeev Alur, Swarat Chaudhuri, Kousha Etessami, Sudipto Guha

We introduce the problem of compressing partially ordered strings: given string 2 and a binary independence relation I over , how can we compactly represent an input if the decompressor is allowed to...

Compression of Partially Ordered Strings (2003)

Rajeev Alur, Swarat Chaudhuri, Kousha Etessami, Sudipto Guha, Mihalis Yannakakis

We introduce the problem of compressing partially ordered strings: given string 2 and a binary independence relation I over , how can we compactly represent an input if the decompressor is allowed to...

Inferring mixtures of markov chains (2002)

Sudipto Guha, Sampath Kannan

Abstract. We define the problem of inferring a “mixture of Markov chains ” based on observing a stream of interleaved outputs from these chains. We show a sharp characterization of the inference...

Approximating a Data Stream for Querying and Estimation: Algorithms and Performance Evaluation (2002)

Sudipto Guha

Obtaining fast and good quality approximations to data distributions is a problem of central interest to database management. A variety of popular database applications including, approximate...

Improved algorithms for the data placement problem (2002)

Sudipto Guha, Kamesh Munagala

We study the data placement problem [1, 3], where the goal is to place data objects in xed capacity caches in a network to optimize latency

Streaming-Data Algorithms for High-Quality Clustering (2002)

Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani

As data gathering grows easier, and as researchers discover new ways to interpret data, streamingdata algorithms have become essential in many fields. Data stream computation precludes algorithms...

Streaming-Data Algorithms for High-Quality Clustering (2002)

Liadan O'Callaghan, Nina Mishra, Sudipto Guha, Adam Meyerson, Rajeev Motwani

Streamingd ata analysis has recently attracted attention in numerous applicationsincludti telephonerecord s, webd ocuments and clickstreams. For such analysis, single-pass algorithms that consume a...

Fast, Small-Space Algorithms for Approximate Histogram Maintenance (Extended Abstract) (2002)

Anna Gilbert, Yannis Kotidis, Sudipto Guha, S. Muthukrishnan, Piotr Indyk, Martin J. Strauss

Anna C. Gilbert AT&T Labs---Research agilbert@research.att.com Yannis Kotidis AT&T Labs---Research kotidis@research.att.com Sudipto Guha CIS,University of Pennsylvania sudipto@cis.upenn.edu...

Fast, Small-Space Algorithms for Approximate Histogram (2002)

Maintenance Extend Ed, Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Yannis Kotidis, S. Muthukrishnan, ...

Anna C. Gilbert AT&T Labs---Research agilbert@research.att.com Yannis Kotidis AT&T Labs---Research kotidis@research.att.com Sudipto Guha CIS,University of Pennsylvania sudipto@cis.upenn.edu...

Approximate XML Joins (2002)

Sudipto Guha, Divesh Srivastava

XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which...

Fast Algorithms for Hierarchical Range Histogram Construction (2002)

Sudipto Guha, Nick Koudas, Divesh Srivastava

Data Warehousing and OLAP applications typically view data as having multiple logical dimensions (e.g., product, location) with natural hierarchies defined on each dimension. OLAP queries usually...

Capacitated Vertex Covering with Applications (2002)

Sudipto Guha Refael, Sudipto Guha, Refael Hassin, Samir Khuller

In this paper we study the capacitated vertex cover problem, a generalization of the well known vertex problem. Given a graph G = (V; E) with weights on the vertices, the goal is to cover all the...

Generalized clustering (2002)

Sudipto Guha, Kamesh Munagala

In generalized k-clustering, we are given n points in a metric space with distance function d. The goal is to partition the points into k clusters, C 1;C 2;:::;Ck so that for a given, the following...

A constant factor approximation for the single sink edge installation problems (2001)

Sudipto Guha, Adam Meyerson, Kamesh Munagala

We present the first constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of different costs and capacities per unit length...

A constant factor approximation for the single sink edge installation problems (2001)

Sudipto Guha, Adam Meyerson, Kamesh Munagala

We present the rst constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of dierent costs and capacities per unit length to...

Data-streams and histograms (2001)

Sudipto Guha, Nick Koudas, Kyuseok Shim

Histograms have been used widely to capture data distribution, to represent the data by a small number of step functions. Dynamic programming algorithms which provide optimal construction of these...

Improved algorithms for fault tolerant facility location (2001)

Sudipto Guha, Adam Meyerson, Kamesh Munagala

We consider a generalization of the classical facility location problem, where we require the solution to be fault-tolerant. Every demand point j is served by r j facilities instead of just one. The...

Improved algorithms for fault tolerant facility location (2001)

Sudipto Guha, Adam Meyerson, Kamesh Munagala

We consider a generalization of the classical facility location problem, where we require the solution to be fault-tolerant. Every demand point j is served by r j facilities instead of just one. The...

A constant factor approximation for the single sink edge installation problems (2001)

Sudipto Guha

ABSTRACT We present the first constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of different costs and capacities per...

Streaming-Data Algorithms for High-Quality Clustering (2001)

Liadan O'Callaghan, Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani

As data gathering grows easier, and as researchers discover new ways to interpret data, streamingdata algorithms have become essential in many fields. Data stream computation precludes algorithms...

Hierarchical reliable multicast: Performance analysis and placement of proxies (2000)

Sudipto Guha, Athina Markopoulou, Fouad Tobagi

Abstract — This paper studies the use of multicast together with proxy nodes for reliably disseminating data from a single source to a large number of receivers. In order to achieve reliability,...

Hierarchical placement and network design problems (2000)

Sudipto Guha, Adam Meyersoný, Kamesh Munagalaþ

In this paper, we give the first constant-approximations for a number of layered network design problems. We begin by modeling hierarchical caching, where caches are placed in layers and each layer...

Hierarchical placement and network design problems (2000)

Sudipto Guha, Adam Meyerson, Kamesh Munagala

In this paper, we give the rst constant-approximations for a number of layered network design problems. We begin by modeling hierarchical caching, where caches are placed in layers and each layer...

Clustering Data Streams (2000)

Sudipto Guha, Nina Mishra, Rajeev Motwani

We study clustering under the data stream model of computation where: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a...

Hierarchical placement and network design problems (2000)

Sudipto Guha, Adam Meyerson, Kamesh Munagala

In this paper, we give the first constant-approximations for a number of layered network design problems. We begin by modeling hierarchical caching, where caches are placed in layers and each layer...

Improved Approximations of Crossings in Graph Drawings and VLSI Layout Areas (2000)

Guy Even, Sudipto Guha, Baruch Schieber

We give improved approximations for two classical embedding problems: (i) minimizing the number of crossings in a drawing of a bounded degree graph on the plane; and (ii) minimizing the VLSI layout...

Nested Graph Dissection and Approximation Algorithms (2000)

Sudipto Guha

This paper considers approximation algorithms for graph completion problems using the nested dissection paradigm. Given a super-additive function of interest (the smallest planar or chordal extension...

A constant-factor approximation algorithm for the k-median problem (1999)

Moses Charikar, Sudipto Guha, Éva Tardos, David B. Shmoys

We present the first constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the most well-studied clustering problems, i.e., those problems in which...

Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1999)

Sudipto Guha, Samir Khuller Y

A greedy approximation algorithm based on \spider decompositions " was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio...

ROCK:ARobust Clustering Algorithm for Categorical Attributes (1999)

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

Abstract | Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance metric based (e.g., euclidean) similarity...

Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1999)

Sudipto Guha, Samir Khuller

A greedy approximation algorithm based on "spider decompositions " was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case...

ROCK: A robust clustering algorithm for categorical attributes (1999)

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

We study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate...

Improved Combinatorial Algorithms for the Facility Location and k-Median Problems (1999)

Moses Charikar, Sudipto Guha

We present improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems. Two central ideas in most of our results are cost scaling and greedy...

A constant-factor approximation algorithm for the k-median problem (Extended Abstract) (1999)

Moses Charikar, Sudipto Guha, Éva Tardos, Eva Tardos, David B. Shmoys

) Moses Charikar Sudipto Guha y Eva Tardos z David B. Shmoys x Abstract We present the rst constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the...

Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1999)

Sudipto Guha, Samir Khuller

A greedy approximation algorithm based on "spider decompositions" was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio...

ROCK: A Robust Clustering Algorithm for Categorical Attributes (1999)

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

We study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate...

Improved combinatorial algorithms for the facility location and k-median problems (1999)

Moses Charikar, Sudipto Guha

We present improved combinatorial approximation algorithms for the uncapacitated facility location problem. Two central ideas in most of our results are cost scaling and greedy improvement. We...

Approximation Algorithms for Directed Steiner Problems (1998)

Moses Charikar, Chandra Chekuri, To-yat Cheung, Zuo Dai, Ashish Goel, Sudipto Guha, ...

We give the first non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in...

Message Multicasting In Heterogeneous Networks (1998)

Amotz Bar-Noy, Sudipto Guha, Joseph (Seffi) Naor, Baruch Schieber

In heterogeneous networks, sending messages may incur different delays on different links, and each node may have a different switching time between messages. The well studied Telephone model is...

Approximation Algorithms for Directed Steiner Problems (1998)

Moses Charikar, Chandra Chekuri, To-yat Cheung, Zuo Dai, Ashish Goel, Sudipto Guha, ...

We give the rst non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in...

Rounding via Trees: Deterministic Approximation Algorithms for Group Steiner Trees and k-median (1998)

Moses Charikar, Chandra Chekuri, Ashish Goel, Sudipto Guha

Most optimization problems on an undirected graph reduce in complexity when restricted to instances on a tree. A recent result [3] for probabilistically approximating graph metrics by trees such that...

Approximation Algorithms for Directed Steiner Problems (1998)

Moses Charikar Stanford, Moses Charikar, Chandra Chekuri, Zuo Dai, Ashish Goel, Sudipto Guha, ...

We give the first non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in...

Approximation Algorithms for Directed Steiner Problems (1998)

Moses Charikar, Chandra Chekuri, To-yat Cheung, Zuo Dai, Ashish Goel, Sudipto Guha, ...

We give the first non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in...

Greedy strikes back: Improved Facility Location Algorithms (1998)

Sudipto Guha, Samir Khuller

A fundamental facility location problem is to choose the location of facilities, such as industrial plants and warehouses, to minimize the cost of satisfying the demand for some commodity. There are...

Multicasting in Heterogeneous Networks (1998)

Amotz Bar-noy, Sudipto Guha, Joseph (Seffi) Naor, Baruch Schieber

In heterogeneous networks sending messages may incur different delays on different links, and each node may have a different switching time between messages. The well studied Telephone model is...

Greedy strikes back: Improved Facility Location Algorithms (1998)

Sudipto Guha, Samir Khuller

A fundamental facility location problem is to choose the location of facilities, such as industrial plants and warehouses, to minimize the cost of satisfying the demand for some commodity. There are...

Facility Location with Dynamic Distance Functions (Extended Abstract) (1998)

Randeep Bhatia, Sudipto Guha, Samir Khuller, Yoram J. Sussmann

) Randeep Bhatia Sudipto Guha y Samir Khuller z Yoram J. Sussmann x Dept. of Computer Science University of Maryland College Park, MD 20742 Abstract Facility location problems have always been...

Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1997)

Guha, Sudipto, Khuller, Samir

A greedy approximation algorithm based on ``spider decompositions'' was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio of $2 \ln...

Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1997)

Guha, Sudipto, Khuller, Samir

A greedy approximation algorithm based on ``spider decompositions'' was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio of $2 \ln...

Facility Location with Dynamic Distance Functions (1997)

Bhatia, Randeep, Guha, Sudipto, Khuller, Samir, Sussmann, Yoram J.

Facility location problems have always been studied with the assumption that the edge lengths in the network are {\em static} and do not change over time. The underlying network could be used to...

Facility Location with Dynamic Distance Functions (1997)

Bhatia, Randeep, Guha, Sudipto, Khuller, Samir, Sussmann, Yoram J.

Facility location problems have always been studied with the assumption that the edge lengths in the network are {\em static} and do not change over time. The underlying network could be used to...

Clustering algorithm for categorical attributes (1997)

Sudipto Guha, Rajeev Rastogi, Kyuseok Shim

Clustering, in data mining, is useful to discover distribution patterns of the underlying data. Clustering algorithms usually employ a distance based (e.g., euclidean) similarity measure in order to...

Approximation Algorithms for Connected Dominating Sets (1996)

Guha, Sudipto, Khuller, Samir

The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in...

Approximation Algorithms for Connected Dominating Sets (1996)

Guha, Sudipto, Khuller, Samir

The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in...

Approximation Algorithms for Connected Dominating Sets (1996)

Sudipto Guha, Samir Khuller

The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in...

Approximation Algorithms for Connected Dominating Sets (1996)

Sudipto Guha, Samir Khuller

The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to be either in the dominating set, or adjacent to some vertex in...

Facility Location with Dynamic Distance Functions

Randeep Bhatia, Sudipto Guha, Samir Khuller, Y. J. Sussmann, Ding-zhu Du

. Facility location problems have always been studied with the assumption that the edge lengths in the network are static and do not change over time. The underlying network could be used to model a...

Facility Location with Dynamic Distance Functions

Randeep Bhatia, Sudipto Guha, Samir Khuller, Yoram J. Sussmann

Facility location problems have always been studied with the assumption that the edge lengths in the network are static and do not change over time. The underlying network could be used to model a...