Learning to Create Data-Integrating Queries (2009)
Partha Pratim, Talukdar Marie, Jacob Muhammad, Salman Mehmood, O Pereira, Sudipto Guha
The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...
Learning to Create Data-Integrating Queries (2009)
Partha Pratim, Talukdar Marie, Jacob Muhammad, Salman Mehmood, O Pereira, Sudipto Guha
The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...
How to Probe for an Extreme Value ∗ (2009)
Ashish Goel, Sudipto Guha, Kamesh Munagala
In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of the system...
A Substrate for In-Network Sensor Data Integration (2009)
Svilen R. Mihaylov, Marie Jacob, Zachary G. Ives, Sudipto Guha
With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication...
STREAM ORDER AND ORDER STATISTICS: QUANTILE ESTIMATION IN RANDOM-ORDER STREAMS ∗ (2009)
Abstract. When trying to process a data-stream in small space, how important is the order in which the data arrives? Are there problems that are unsolvable when the ordering is worst-case, that can...
Learning to Create Data-Integrating Queries (2009)
Partha Pratim, Talukdar Marie, Jacob Muhammad, Salman Mehmood, O Pereira, Sudipto Guha
The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...
Graph Sparsification in the Semi-Streaming Model (2009)
Analyzing massive data sets has been one of the key motivations for studying streaming algorithms. In recent years, there has been significant progress in analysing distributions in a streaming...
Revisiting the Direct Sum Theorem and Space Lower Bounds in Random Order Streams (2009)
Estimating frequency moments and $L_p$ distances are well studied problems in the adversarial data stream model and tight space bounds are known for these two problems. There has been growing...
A Constant Factor Approximation for the Single Sink Edge Installation Problem (2009)
Guha, Sudipto, Meyerson, Adam, Munagala, Kamesh
We present the first constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of different costs and capacities per unit length...
Multi-armed Bandits with Limited Exploration ∗ (2009)
A central problem to decision making under uncertainty is the trade-off between exploration and exploitation: between learning from and adapting to a stochastic system and exploiting the current...
Sparsification Algorithm for Cut Problems on Semi-streaming Model (2009)
The emergence of social networks and other interaction networks have brought to fore the questions of processing massive graphs. The (semi) streaming model, where we assume that the space is (near)...
Stream Order and Order Statistics: Quantile Estimation in Random-Order Streams (2009)
Guha, Sudipto, Mcgregor, Andrew
When trying to process a data stream in small space, how important is the order in which the data arrive? Are there problems that are unsolvable when the ordering is worst case, but that can be...
Tight results for clustering and summarizing data streams (2009)
In this paper we investigate algorithms and lower bounds for summarization problems over a single pass data stream. In particular we focus on histogram construction and K-center clustering. We...
ABSTRACT Wavelet Synopsis for Data Streams: Minimizing Non-Euclidean Error (2008)
We consider the wavelet synopsis construction problem for data streams where given n numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much smaller than n. The B...
Synopses construction algorithms have been found to be of interest in query optimization, approximate query answering and mining, and over the last few years several good synopsis construction...
Adaptive Uncertainty Resolution in Bayesian Combinatorial Optimization Problems (2008)
Guha, Sudipto, Munagala, Kamesh
In several applications such as databases, planning, and sensor networks, parameters such as selectivity, load, or sensed values are known only with some associated uncertainty. The performance of...
ABSTRACT Approximate XML Joins (2008)
Sudipto Guha, Divesh Srivastava
XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which...
We consider a variant of the classic multi-armed bandit problem (MAB), which we call FEEDBACK MAB, where the reward obtained by playing each of n independent arms varies according to an underlying...
A Substrate for In-Network Sensor Data Integration (2008)
Mihaylov, Svilen R, Jacob, Marie, Ives, Zachary G, Guha, Sudipto
With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication...
A Substrate for In-Network Sensor Data Integration (2008)
Mihaylov, Svilen, Jacob, Marie, Ives, Zachary G, Guha, Sudipto
With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication...
Sudipto Guha, Adam Meyersoný, Kamesh Munagalaþ
ÓÒ�ÒÙÒ��ÖÐÝ�Ò�Ñ�ØÖ��ÒÓÖ��ÖØÓÓÒÒ�Ø�×�ØÓ���Ñ�Ò �...
Sequential Design of Experiments via Linear Programming (2008)
Guha, Sudipto, Munagala, Kamesh
The celebrated multi-armed bandit problem in decision theory models the basic trade-off between exploration, or learning about the state of a system, and exploitation, or utilizing the system. In...
Information Acquisition and Exploitation in Multichannel Wireless Networks (2008)
Guha, Sudipto, Munagala, Kamesh, Sarkar, Saswati
A wireless system with multiple channels is considered, where each channel has several transmission states. A user learns about the instantaneous state of an available channel by transmitting a...
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance metric based (e.g., euclidean) similarity measure in...
General Terms: Algorithms,Performance,Theory (2008)
We consider a wireless system with multiple channels where each channel is either on or off, and probing the state of any channel incurs a cost. We present a polynomial time algorithm that determines...
Sudipto Guha, Shankar Krishnan, Suresh Venkatasubramanian
The last few years has seen an explosion of effort in designing algorithms that harness the power of the GPU for general purpose computations. The list of problems for which efficient GPU algorithms...
Approximation and streaming algorithms for histogram construction problems (2008)
Sudipto Guha, Nick Koudas, Kyuseok Shim
Histograms are typically used to approximate data distributions. Histograms and related synopsis structures have been successful in a wide variety of popular database applications including...
Abstract CURE: An Efficient Clustering Algorithm for Large Databases (2008)
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical...
ABSTRACT Approximate XML Joins (2008)
Sudipto Guha, Divesh Srivastava
XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which...
How to Probe for an Extreme Value ∗ (2008)
Ashish Goel, Sudipto Guha, Kamesh Munagala
In several systems applications, parameters such as load are known only with some associated uncertainty, which is specified, or modeled, as a distribution over values. The performance of the system...
Reasoning About Approximate Match Query Results (2008)
Join techniques deploying approximate match predicates are fundamental data cleaning operations. A variety of predicates have been utilized to quantify approximate match in such operations and some...
ABSTRACT Wavelet Synopsis for Data Streams: Minimizing Non-Euclidean Error (2008)
We consider the wavelet synopsis construction problem for data streams where given n numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much smaller than n. The B...
Nonlinear Approximation and Image Representation using (2008)
We address the problem of finding sparse wavelet representations of high-dimensional vectors. We present a lower-bounding technique and use it to develop an algorithm for computing...
Sudipto Guha, Shankar Krishnan
Shadow mapping is a technique for doing real-time shadowing. Recent work has shown that shadow mapping hardware can be used as a second depth test in addition to the z-test. In this paper, we explore...
Sudipto Guha, Kamesh Munagala, Saswati Sarkar
optimal transmission and probing strategies
Abstract CURE: An Efficient Clustering Algorithm for Large Databases (2008)
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical...
Approximation Algorithms for Wavelet Transform Coding of Data Streams (2008)
This paper addresses the problem of finding a B-term wavelet representation of a given discrete function ∈ Rn whose distance from is minimized. The problem is well understood when we seek to...
Learning to Create Data-Integrating Queries (2008)
Partha Pratim Talukdar, Marie Jacob, Muhammad Salman Mehmood, Koby Crammer, Zachary G. Ives, Fernando Pereira, ...
The number of potentially-related data resources available for querying — databases, data warehouses, virtual integrated schemas — continues to grow rapidly. Perhaps no area has seen this problem...
Tight Lower Bounds for Multi-Pass Stream Computation via Pass Elimination (2008)
There is a natural relationship between lower bounds in the multi-pass stream model and lower bounds in multiround communication. However, this connection is less understood than the connection...
Approximation Algorithms for Restless Bandit Problems (2007)
Guha, Sudipto, Munagala, Kamesh, Shi, Peng
The restless bandit problem is one of the most well-studied generalizations of the celebrated stochastic multi-armed bandit problem in decision theory. In its ultimate generality, the restless bandit...
Amotz Bar-noy, Sudipto Guha, Joseph (se Naor, Baruch Schieber
We consider the following fundamental scheduling problem. The input to the problem consists of n jobs and k machines. Each of the jobs is associated with a release time, a deadline, a weight, and a...
Moses Charikar, Sudipto Guha, Eva Tardos, David B. Shmoys
A constant-factor approximation algorithm
A Constant-Factor Approximation Algorithm for the (2007)
Median Problem Extended, Moses Charikar, Sudipto Guha, Eva Tardos, David B. Shmoys
) Moses Charikar Sudipto Guha y Eva Tardos z David B. Shmoys x Abstract We present the rst constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the...
Efficient Recovery from Power Outage (Extended Summary) (2007)
Sudipto Guha, Anna Moss, Joseph (Seffi) Naor, Baruch Schieber
We study problems that are motivated by the real-life problem of ecient recovery from a wide scale electric power outage caused by a major disaster such as a hurricane or an equipment failure. In...
Approximating the Throughput of Multiple Machines in Real-Time Scheduling (2007)
Amotz Bar-noy, Sudipto Guha, Joseph (Seffi) Naor, Baruch Schieber
We consider the following fundamental scheduling problem. The input to the problem consists of n jobs and k machines. Each of the jobs is associated with a release time, a deadline, a weight, and a...
Moses Charikar, Chandra Chekuri, Ashish Goel, Sudipto Guha
Most optimization problems on an undirected graph reduce in complexity when restricted to instances on a tree. A recent result [3] for probabilistically approximating graph metrics by trees such that...
In generalized k-clustering, we are given n points in a metric space with distance function d. The goal is to partition the points into k clusters, C 1; C 2; : : : ; C k so that for a given fl, the...
Randeep Bhatia, Sudipto Guha, Samir Khuller, Yoram J. Sussmann
Facility location problems have always been studied with the assumption that the edge lengths in the network are static and do not change over time. The underlying network could be used to model a...
Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Yannis Kotidis, S. Muthukrishnan, Martin J. Strauss
A vector A of length N is defined implicitly, via a stream of updates of the form "add 5 to A 3. " We give a sketching algorithm, that constructs a small sketch from the stream of...
Amotz Bar-noy, Sudipto Guha, Yoav Katz, Joseph (seffi Naor, Baruch Schieber, Hadas Shachnai
z--We consider the following scheduling with batching problem that has many applications, e.g., in multimedia-on-demand and manufacturing of integrated circuits. The input to the problem consists of...
Lower Bounds for Quantile Estimation in Random-Order and Multi-Pass Streaming (2007)
Guha, Sudipto, McGregor, Andrew
We present lower bounds on the space required to estimate the quantiles of a stream of numerical values. Quantile estimation is perhaps the most studied problem in the data stream model and it is...
A Note on Linear Time Algorithms for Maximum Error Histograms (2007)
Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, e.g., V-Optimal, use error measures which...
Graduate Group Chairperson (2007)
Boulos Harb, Sampath Kannan, Sudipto Guha, Rajeev Alur, Boulos Harb
iii
Lower bounds for quantile estimation in random-order and multi-pass streaming (2007)
Abstract. We present lower bounds on the space required to estimate the quantiles of a stream of numerical values. Quantile estimation is perhaps the most studied problem in the data stream model and...
Information acquisition and exploitation in multichannel wireless systems (2007)
Sudipto Guha, Kamesh Munagala, Saswati Sarkar
A wireless system with multiple channels is considered, where each channel has several transmission states. A user learns about the instantaneous state of an available channel by transmitting a...
Information acquisition and exploitation in multichannel wireless systems (2007)
Sudipto Guha, Kamesh Munagala, Saswati Sarkar
A wireless system with multiple channels is considered, where each channel has several transmission states. A user learns about the instantaneous state of an available channel by transmitting a...
Sketching information divergences (2007)
Sudipto Guha, Piotr Indyk, Andrew Mcgregor
When comparing discrete probability distributions, natural measures of similarity are not ℓp distances but rather are information divergences such as Kullback-Leibler and Hellinger. This paper...
Approximation algorithms for budgeted learning problems (2007)
In this talk we investigate general techniques for designing simple approximately optimal policies for the problems similar to the optimal design of experiments under a budget constraint. As a...
Approximation algorithms for budgeted learning problems (2007)
We present the first approximation algorithms for a large class of budgeted learning problems. One classic example of the above is the budgeted multi-armed bandit problem. In this problem each arm of...
Model-driven optimization using adaptive probes (2007)
In several applications such as databases, planning, and sensor networks, parameters such as selectivity, load, or sensed values are known only with some associated uncertainty. The performance of...
Sketching information divergences (2007)
Sudipto Guha, Piotr Indyk, Andrew Mcgregor
Abstract. When comparing discrete probability distributions, natural measures of similarity are not ℓp distances but rather are informationdivergences such as Kullback-Leibler and Hellinger. This...
Analyzing a Greedy Approximation of an MDL Summarization Abstract (2007)
Peter Fontana, Faculty Advisor, Dr. Sudipto Guha
Many OLAP (On-line Analytical Processing) applications have produced data cubes that summarize and aggregate details of data queries. These data cubes are multi-dimensional matrices where each cell...
Sketching information divergences (2007)
Sudipto Guha, Piotr Indyk, Andrew Mcgregor
When comparing discrete probability distributions, natural measures of similarity are not ℓp distances but rather are information-divergences such as Kullback-Leibler and Hellinger. This paper...
Nonlinear Approximation and Image Representation using Wavelets (2007)
We address the problem of finding sparse wavelet representations of high-dimensional vectors. We present a lower-bounding technique and use it to develop an algorithm for computing...
Approximation algorithms for wavelet transform coding of data streams (2006)
This paper addresses the problem of finding a B-term wavelet representation of a given discrete function $f \in \real^n$ whose distance from f is minimized. The problem is well understood when we...
The Steiner k-Cut Problem (2006)
Chekuri, Chandra, Guha, Sudipto, Naor, Joseph
We consider the Steiner k-cut problem which generalizes both the k-cut problem and the multiway cut problem. The Steiner k-cut problem is defined as follows. Given an edge-weighted undirected graph G...
Jointly optimal transmission and probing strategies for multichannel wireless systems (2006)
Guha, Sudipto, Munagala, Kamesh, Sarkar, Saswati
We consider a wireless system with multiple channels when each channel has several different transmission states. Different states are associated with different probabilities of successful...
Streaming and sublinear approximation of entropy and information distances (2006)
Sudipto Guha, Andrew Mcgregor, Suresh Venkatasubramanian
In most algorithmic applications which compare two distributions, information theoretic distances are more natural than standard ℓp norms. In this paper we design streaming and sublinear time...
Stream-Order and Order-Statistics (2006)
When processing a data-stream in small space, how important is the order in which the data arrives? Are there problems that are unsolvable when the ordering is worst-case, that can typically be...
Sudipto Guha, Kamesh Munagala, Saswati Sarkar
We consider a wireless system with multiple channels when each channel has several transmission states. A user learns about the instantaneous state of an available channel by transmitting a control...
Approximation algorithms for wavelet transform coding of data streams (2006)
Sudipto Guha, Boulos Harb, Student Member
Abstract — This paper addresses the problem of finding a Bterm wavelet representation of a given discrete function f ∈ R n whose distance from f is minimized. The problem is well understood when...
Streaming and sublinear approximation of entropy and information distances (2006)
Sudipto Guha, Andrew Mcgregor, Suresh Venkatasubramanian
In most algorithmic applications which compare two distributions, information theoretic distances are more natural than standard ℓp norms. In this paper we design streaming and sublinear time...
Streaming and Sublinear Approximation of Entropy and Information Distances (2005)
Guha, Sudipto, McGregor, Andrew, Venkatasubramanian, Suresh
In many problems in data mining and machine learning, data items that need to be clustered or classified are not points in a high-dimensional space, but are distributions (points on a high...
Compression of Partially Ordered Strings (2005)
Alur, Rajeev, Chaudhuri, Swarat, Etessami, Kousha, Guha, Sudipto, Yannakakis, Mihalis
We introduce the problem of compressing partially ordered strings: given string σ ∈ Σ* and a binary independence relation I over Σ how can we compactly represent an input if the decompressor is...
How far will you walk to find your shortcut: Space Efficient Synopsis Construction Algorithms (2005)
In this paper we consider the wavelet synopsis construction problem without the restriction that we only choose a subset of coefficients of the original data. We provide the first near optimal...
Space efficiency in Synopsis construction algorithms (2005)
Histograms and Wavelet synopses have been found to be useful in query optimization, approximate query answering and mining. Over the last few years several good synopsis algorithms have been...
Data Visualization and Mining using the GPU (2005)
Sudipto Guha, Shankar Krishnan, Suresh Venkatasubramanian
An exciting development in the computing industry has been the emergence of graphics processing units (the GPU) as a fast general purpose co-processor. Initially designed for gaming applications,...
Machine Minimization for Scheduling Jobs with Interval Constraints (2004)
Chuzhoy, Julia, Guha, Sudipto, Khanna, Sanjeev, Naor, Joseph
The problem of scheduling jobs with interval constraints is a well-studied classical scheduling problem. The input to the problem is a collection of n jobs where each job has a set of intervals on...
Asymmetric k-Center is log* n-Hard to Approximate (2004)
Chuzhoy, Julia, Guha, Sudipto, Halperin, Eran, Khanna, Sanjeev, Kortsarz, Guy, Krauthgamer, Robert, ...
In the Asymmetric k-Center problem, the input is an integer k and a complete digraph over n points together with a distance function obeying the directed triangle inequality. The goal is to choose a...
REHIST: Relative error histogram construction algorithms (2004)
Sudipto Guha, Kyuseok Shim, Jungchul Woo
Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, such as V-Optimal, optimize absolute error...
XWAVE: Optimal and approximate extended wavelets for streaming data (2004)
Sudipto Guha, Chulyun Kim, Kyuseok Shim
Wavelet synopses have been found to be of interest in query optimization and approximate query answering. Recently, extended wavelets were proposed by Deligiannakis and Roussopoulos for data sets...
Merging the results of approximate match operations (2004)
Data Cleaning is an important process that has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relational tuple or tuples...
REHIST: Relative error histogram construction algorithms (2004)
Sudipto Guha, Kyuseok Shim, Jungchul Woo
Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, such as V-Optimal, optimize absolute error...
XWAVE: Optimal and approximate extended wavelets for streaming data (2004)
Sudipto Guha, Chulyun Kim, Kyuseok Shim
Wavelet synopses have been found to be of interest in query optimization and approximate query answering. Recently, extended wavelets were proposed by Deligiannakis and Roussopoulos for data sets...
REHIST: Relative error histogram construction algorithms (2004)
Sudipto Guha, Kyuseok Shim, Jungchul Woo
Histograms and Wavelet synopses provide useful tools in query optimization and approximate query answering. Traditional histogram construction algorithms, such as V-Optimal, optimize absolute error...
Merging the Results of Approximate Match Operations (2004)
Sudipto Guha, Nick Koudas, Amit Marathe, Divesh Srivastava
Data Cleaning is an important process that has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relational tuple or tuples...
Merging the Results of Approximate Match Operations Sudipto Guha (2004)
Of Pennsylvania Sudipto, Sudipto Guha
Data Cleaning is an important process that has been at the center of research interest in recent years. An important end goal of effective data cleaning is to identify the relational tuple or tuples...
XWAVE: Optimal and approximate extended wavelets for streaming data (2004)
Sudipto Guha, Chulyun Kim, Kyuseok Shim
Wavelet synopses have been found to be of interest in query optimization and approximate query answering. Recently, extended wavelets were proposed by Deligiannakis and Roussopoulos for data sets...
Capacitated vertex covering (2003)
Sudipto Guha, Refael Hassin, B Samir Khuller, Einat Or B
In this paper we study the capacitated vertex cover problem, a generalization of the well-known vertex cover problem. Given a graph G = (V, E) with weights on the vertices, the goal is to cover all...
Clustering data streams: Theory and practice (2003)
Sudipto Guha, Adam Meyerson, Nina Mishra, Rajeev Motwani
Abstract—The data stream model has recently attracted attention for its applicability to numerous types of data, including telephone records, Web documents, and clickstreams. For analysis of such...
Efficient approximation of optimization queries under parametric aggregation constraints (2003)
Sudipto Guha, Divesh Srivastava
We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that...
Compression of partially ordered strings (2003)
Rajeev Alur, Swarat Chaudhuri, Kousha Etessami, Sudipto Guha, Mihalis Yannakakis
Abstract. We introduce the problem of compressing partially ordered strings: given string σ ∈ Σ ∗ and a binary independence relation I over Σ, how can we compactly represent an input if the...
Efficient approximation of optimization queries under parametric aggregation constraints (2003)
Sudipto Guha, Dimitrios Gunopulos, Nick Koudas, Divesh Srivastava, Michail Vlachos
We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that...
Application of the Two-Sided Depth Test to CSG Rendering (2003)
Sudipto Guha, Shankar Krishnan, Kamesh Munagala, Suresh Venkatasubramanian
Shadow mapping is a technique for doing real-time shadowing. Recent work has shown that shadow mapping hardware can be used as a second depth test in addition to the z-test. In this paper, we explore...
Efficient Approximation of Optimization Queries Under Parametric (2003)
Aggregation Constraints Sudipto, Sudipto Guha, Divesh Srivastava
We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that...
Asymmetric k-Center is log* n-hard to Approximate (2003)
Julia Chuzhoy, Sudipto Guha, Sanjeev Khanna, Joseph (Seffi) Naor
We show that the asymmetric k-center problem is 34 n)-hard to approximate unless NP DTIME(n ). Since an O(log n)-approximation algorithm is known for this problem, this essentially resolves the...
Efficient Approximation of Optimization Queries Under Parametric (2003)
Aggregation Constraints Sudipto, Sudipto Guha, Divesh Srivastava
We introduce and study a new class of queries that we refer to as OPAC (optimization under parametric aggregation constraints) queries. Such queries aim to identify sets of database tuples that...
Asymmetric k-center is log ∗ n-hard to approximate (2003)
Julia Chuzhoy, Sudipto Guha, Eran Halperin, Sanjeev Khanna
In the Asymmetric k-Center problem, the input is an integer k and a complete digraph over n points together with a distance function obeying the directed triangle inequality. The goal is to choose a...
Index-based approximate XML joins (2003)
XML data integration tools are facing a variety of challenges for their efficient and effective operation. Among these is the requirement to handle a variety of inconsistencies or mistakes present in...
Compression of Partially Ordered Strings (2003)
Rajeev Alur Swarat, Rajeev Alur, Swarat Chaudhuri, Kousha Etessami, Sudipto Guha
We introduce the problem of compressing partially ordered strings: given string 2 and a binary independence relation I over , how can we compactly represent an input if the decompressor is allowed to...
Compression of Partially Ordered Strings (2003)
Rajeev Alur, Swarat Chaudhuri, Kousha Etessami, Sudipto Guha, Mihalis Yannakakis
We introduce the problem of compressing partially ordered strings: given string 2 and a binary independence relation I over , how can we compactly represent an input if the decompressor is allowed to...
Inferring mixtures of markov chains (2002)
Abstract. We define the problem of inferring a “mixture of Markov chains ” based on observing a stream of interleaved outputs from these chains. We show a sharp characterization of the inference...
Obtaining fast and good quality approximations to data distributions is a problem of central interest to database management. A variety of popular database applications including, approximate...
Improved algorithms for the data placement problem (2002)
We study the data placement problem [1, 3], where the goal is to place data objects in xed capacity caches in a network to optimize latency
Streaming-Data Algorithms for High-Quality Clustering (2002)
Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani
As data gathering grows easier, and as researchers discover new ways to interpret data, streamingdata algorithms have become essential in many fields. Data stream computation precludes algorithms...
Streaming-Data Algorithms for High-Quality Clustering (2002)
Liadan O'Callaghan, Nina Mishra, Sudipto Guha, Adam Meyerson, Rajeev Motwani
Streamingd ata analysis has recently attracted attention in numerous applicationsincludti telephonerecord s, webd ocuments and clickstreams. For such analysis, single-pass algorithms that consume a...
Fast, Small-Space Algorithms for Approximate Histogram Maintenance (Extended Abstract) (2002)
Anna Gilbert, Yannis Kotidis, Sudipto Guha, S. Muthukrishnan, Piotr Indyk, Martin J. Strauss
Anna C. Gilbert AT&T Labs---Research agilbert@research.att.com Yannis Kotidis AT&T Labs---Research kotidis@research.att.com Sudipto Guha CIS,University of Pennsylvania sudipto@cis.upenn.edu...
Fast, Small-Space Algorithms for Approximate Histogram (2002)
Maintenance Extend Ed, Anna C. Gilbert, Sudipto Guha, Piotr Indyk, Yannis Kotidis, S. Muthukrishnan, ...
Anna C. Gilbert AT&T Labs---Research agilbert@research.att.com Yannis Kotidis AT&T Labs---Research kotidis@research.att.com Sudipto Guha CIS,University of Pennsylvania sudipto@cis.upenn.edu...
Sudipto Guha, Divesh Srivastava
XML is widely recognized as the data interchange standard for tomorrow, because of its ability to represent data from a wide variety of sources. Hence, XML is likely to be the format through which...
Fast Algorithms for Hierarchical Range Histogram Construction (2002)
Sudipto Guha, Nick Koudas, Divesh Srivastava
Data Warehousing and OLAP applications typically view data as having multiple logical dimensions (e.g., product, location) with natural hierarchies defined on each dimension. OLAP queries usually...
Capacitated Vertex Covering with Applications (2002)
Sudipto Guha Refael, Sudipto Guha, Refael Hassin, Samir Khuller
In this paper we study the capacitated vertex cover problem, a generalization of the well known vertex problem. Given a graph G = (V; E) with weights on the vertices, the goal is to cover all the...
In generalized k-clustering, we are given n points in a metric space with distance function d. The goal is to partition the points into k clusters, C 1;C 2;:::;Ck so that for a given, the following...
A constant factor approximation for the single sink edge installation problems (2001)
Sudipto Guha, Adam Meyerson, Kamesh Munagala
We present the first constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of different costs and capacities per unit length...
A constant factor approximation for the single sink edge installation problems (2001)
Sudipto Guha, Adam Meyerson, Kamesh Munagala
We present the rst constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of dierent costs and capacities per unit length to...
Data-streams and histograms (2001)
Sudipto Guha, Nick Koudas, Kyuseok Shim
Histograms have been used widely to capture data distribution, to represent the data by a small number of step functions. Dynamic programming algorithms which provide optimal construction of these...
Improved algorithms for fault tolerant facility location (2001)
Sudipto Guha, Adam Meyerson, Kamesh Munagala
We consider a generalization of the classical facility location problem, where we require the solution to be fault-tolerant. Every demand point j is served by r j facilities instead of just one. The...
Improved algorithms for fault tolerant facility location (2001)
Sudipto Guha, Adam Meyerson, Kamesh Munagala
We consider a generalization of the classical facility location problem, where we require the solution to be fault-tolerant. Every demand point j is served by r j facilities instead of just one. The...
A constant factor approximation for the single sink edge installation problems (2001)
ABSTRACT We present the first constant approximation to the single sink buy-at-bulk network design problem, where we have to design a network by buying pipes of different costs and capacities per...
Streaming-Data Algorithms for High-Quality Clustering (2001)
Liadan O'Callaghan, Nina Mishra, Adam Meyerson, Sudipto Guha, Rajeev Motwani
As data gathering grows easier, and as researchers discover new ways to interpret data, streamingdata algorithms have become essential in many fields. Data stream computation precludes algorithms...
Approximation algorithms for facility location problems / (2000)
Submitted to the Department of Computer Science.
Hierarchical reliable multicast: Performance analysis and placement of proxies (2000)
Sudipto Guha, Athina Markopoulou, Fouad Tobagi
Abstract — This paper studies the use of multicast together with proxy nodes for reliably disseminating data from a single source to a large number of receivers. In order to achieve reliability,...
Hierarchical placement and network design problems (2000)
Sudipto Guha, Adam Meyersoný, Kamesh Munagalaþ
In this paper, we give the first constant-approximations for a number of layered network design problems. We begin by modeling hierarchical caching, where caches are placed in layers and each layer...
Hierarchical placement and network design problems (2000)
Sudipto Guha, Adam Meyerson, Kamesh Munagala
In this paper, we give the rst constant-approximations for a number of layered network design problems. We begin by modeling hierarchical caching, where caches are placed in layers and each layer...
Clustering Data Streams (2000)
Sudipto Guha, Nina Mishra, Rajeev Motwani
We study clustering under the data stream model of computation where: given a sequence of points, the objective is to maintain a consistently good clustering of the sequence observed so far, using a...
Hierarchical placement and network design problems (2000)
Sudipto Guha, Adam Meyerson, Kamesh Munagala
In this paper, we give the first constant-approximations for a number of layered network design problems. We begin by modeling hierarchical caching, where caches are placed in layers and each layer...
Improved Approximations of Crossings in Graph Drawings and VLSI Layout Areas (2000)
Guy Even, Sudipto Guha, Baruch Schieber
We give improved approximations for two classical embedding problems: (i) minimizing the number of crossings in a drawing of a bounded degree graph on the plane; and (ii) minimizing the VLSI layout...
Nested Graph Dissection and Approximation Algorithms (2000)
This paper considers approximation algorithms for graph completion problems using the nested dissection paradigm. Given a super-additive function of interest (the smallest planar or chordal extension...
A constant-factor approximation algorithm for the k-median problem (1999)
Moses Charikar, Sudipto Guha, Éva Tardos, David B. Shmoys
We present the first constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the most well-studied clustering problems, i.e., those problems in which...
Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1999)
A greedy approximation algorithm based on \spider decompositions " was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio...
ROCK:ARobust Clustering Algorithm for Categorical Attributes (1999)
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
Abstract | Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance metric based (e.g., euclidean) similarity...
Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1999)
A greedy approximation algorithm based on "spider decompositions " was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case...
ROCK: A robust clustering algorithm for categorical attributes (1999)
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
We study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate...
Improved Combinatorial Algorithms for the Facility Location and k-Median Problems (1999)
We present improved combinatorial approximation algorithms for the uncapacitated facility location and k-median problems. Two central ideas in most of our results are cost scaling and greedy...
A constant-factor approximation algorithm for the k-median problem (Extended Abstract) (1999)
Moses Charikar, Sudipto Guha, Éva Tardos, Eva Tardos, David B. Shmoys
) Moses Charikar Sudipto Guha y Eva Tardos z David B. Shmoys x Abstract We present the rst constant-factor approximation algorithm for the metric k-median problem. The k-median problem is one of the...
Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1999)
A greedy approximation algorithm based on "spider decompositions" was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio...
ROCK: A Robust Clustering Algorithm for Categorical Attributes (1999)
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
We study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate...
Improved combinatorial algorithms for the facility location and k-median problems (1999)
We present improved combinatorial approximation algorithms for the uncapacitated facility location problem. Two central ideas in most of our results are cost scaling and greedy improvement. We...
Approximation Algorithms for Directed Steiner Problems (1998)
Moses Charikar, Chandra Chekuri, To-yat Cheung, Zuo Dai, Ashish Goel, Sudipto Guha, ...
We give the first non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in...
Message Multicasting In Heterogeneous Networks (1998)
Amotz Bar-Noy, Sudipto Guha, Joseph (Seffi) Naor, Baruch Schieber
In heterogeneous networks, sending messages may incur different delays on different links, and each node may have a different switching time between messages. The well studied Telephone model is...
Approximation Algorithms for Directed Steiner Problems (1998)
Moses Charikar, Chandra Chekuri, To-yat Cheung, Zuo Dai, Ashish Goel, Sudipto Guha, ...
We give the rst non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in...
Moses Charikar, Chandra Chekuri, Ashish Goel, Sudipto Guha
Most optimization problems on an undirected graph reduce in complexity when restricted to instances on a tree. A recent result [3] for probabilistically approximating graph metrics by trees such that...
Approximation Algorithms for Directed Steiner Problems (1998)
Moses Charikar Stanford, Moses Charikar, Chandra Chekuri, Zuo Dai, Ashish Goel, Sudipto Guha, ...
We give the first non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in...
Approximation Algorithms for Directed Steiner Problems (1998)
Moses Charikar, Chandra Chekuri, To-yat Cheung, Zuo Dai, Ashish Goel, Sudipto Guha, ...
We give the first non-trivial approximation algorithms for the Steiner tree problem and the generalized Steiner network problem on general directed graphs. These problems have several applications in...
Greedy strikes back: Improved Facility Location Algorithms (1998)
A fundamental facility location problem is to choose the location of facilities, such as industrial plants and warehouses, to minimize the cost of satisfying the demand for some commodity. There are...
Multicasting in Heterogeneous Networks (1998)
Amotz Bar-noy, Sudipto Guha, Joseph (Seffi) Naor, Baruch Schieber
In heterogeneous networks sending messages may incur different delays on different links, and each node may have a different switching time between messages. The well studied Telephone model is...
Greedy strikes back: Improved Facility Location Algorithms (1998)
A fundamental facility location problem is to choose the location of facilities, such as industrial plants and warehouses, to minimize the cost of satisfying the demand for some commodity. There are...
Facility Location with Dynamic Distance Functions (Extended Abstract) (1998)
Randeep Bhatia, Sudipto Guha, Samir Khuller, Yoram J. Sussmann
) Randeep Bhatia Sudipto Guha y Samir Khuller z Yoram J. Sussmann x Dept. of Computer Science University of Maryland College Park, MD 20742 Abstract Facility location problems have always been...
Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1997)
A greedy approximation algorithm based on ``spider decompositions'' was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio of $2 \ln...
Improved Methods for Approximating Node Weighted Steiner Trees and Connected Dominating Sets (1997)
A greedy approximation algorithm based on ``spider decompositions'' was developed by Klein and Ravi for node weighted Steiner trees. This algorithm provides a worst case approximation ratio of $2 \ln...
Facility Location with Dynamic Distance Functions (1997)
Bhatia, Randeep, Guha, Sudipto, Khuller, Samir, Sussmann, Yoram J.
Facility location problems have always been studied with the assumption that the edge lengths in the network are {\em static} and do not change over time. The underlying network could be used to...
Facility Location with Dynamic Distance Functions (1997)
Bhatia, Randeep, Guha, Sudipto, Khuller, Samir, Sussmann, Yoram J.
Facility location problems have always been studied with the assumption that the edge lengths in the network are {\em static} and do not change over time. The underlying network could be used to...
Clustering algorithm for categorical attributes (1997)
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
Clustering, in data mining, is useful to discover distribution patterns of the underlying data. Clustering algorithms usually employ a distance based (e.g., euclidean) similarity measure in order to...
Approximation Algorithms for Connected Dominating Sets (1996)
The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in...
Approximation Algorithms for Connected Dominating Sets (1996)
The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in...
Approximation Algorithms for Connected Dominating Sets (1996)
The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to either be in the dominating set, or adjacent to some node in...
Approximation Algorithms for Connected Dominating Sets (1996)
The dominating set problem in graphs asks for a minimum size subset of vertices with the following property: each vertex is required to be either in the dominating set, or adjacent to some vertex in...
Facility Location with Dynamic Distance Functions
Randeep Bhatia, Sudipto Guha, Samir Khuller, Y. J. Sussmann, Ding-zhu Du
. Facility location problems have always been studied with the assumption that the edge lengths in the network are static and do not change over time. The underlying network could be used to model a...
Facility Location with Dynamic Distance Functions
Randeep Bhatia, Sudipto Guha, Samir Khuller, Yoram J. Sussmann
Facility location problems have always been studied with the assumption that the edge lengths in the network are static and do not change over time. The underlying network could be used to model a...