Insecure Context Switching: Inoculating regular expressions for survivability (2009)
Will Drewry, Tavis Ormandy, Google Inc
For most computer end-users, web browsers and Internet services act as the providers and protectors of their personal information, from bank accounts to personal correspondence. These systems are...
Lightweight, High-Resolution Monitoring for Troubleshooting Production Systems (2009)
Sapan Bhatia, Abhishek Kumar, Google Inc, Marc E. Fiuczynski, Larry Peterson
Production systems are commonly plagued by intermittent problems that are difficult to diagnose. This paper describes a new diagnostic tool, called Chopstix, that continuously collects profiles of...
Names and Similarities on the Web: Fact Extraction in the Fast Lane (2009)
Marius Paşca, Google Inc, Dekang Lin, Jeffrey Bigham, Andrei Lifchits, Alpa Jain
In a new approach to large-scale extraction of facts from unstructured text, distributional similarities become an integral part of both the iterative acquisition of high-coverage contextual...
ESoftCheck: Removal of Non-vital Checks for Fault Tolerance (2009)
Jing Yu, Google Inc, María Jesús Garzarán, Marc Snir
As semiconductor technology scales into the deep submicron regime the occurrence of transient or soft errors will increase. This will require new approaches to error detection. Software...
Enhance MapReduce to work better across datacenters. (2009)
PhD candidate, Office phone, New York NY, Summer Software Engineer Internship, Google Inc
B.S. (first class honors) in Computer Science, 2001-2005
Jeannie Albrecht, Google Inc, David A. Patterson
We describe the design and implementation of SWORD, a scalable resource discovery service for wide-area distributed systems. In contrast to previous systems, SWORD allows users to describe desired...
Towards Practical Biometric Key Generation with Randomized Biometric Templates (2009)
Lucas Ballard, Google Inc, Seny Kamara, Michael K. Reiter, Fabian Monrose
Although biometrics have garnered significant interest as a source of entropy for cryptographic key generation, recent studies indicate that many biometric modalities may not actually offer enough...
Gagan Aggarwal, Krishnaram Kenthapadi, Rina Panigrahy, An Zhu, Google Inc
Anastasia Ailamaki, Charles Garrod, Christopher Olston, Bruce Maggs, Amit Manjhi, Google Inc, ...
The backend database system is often the performance bottleneck when running web applications. A common approach to scale the database component is query result caching, but it faces the challenge of...
Learning to Parse Video into Stable Spatiotemporal Volumes (2009)
We are interested in learning how to exploit continuity, motion and context to account for stable, recoverable, spatiotemporal phenomena embedded in video. While most humans can make sense of still...
Detail Preserving Shape Deformation in Image Editing (2009)
Hui Fang
Figure 1: The deformation of a source image (a), described by tracing and moving feature curves (b, top), can unrealistically stretch texture details, e.g. the ear fur (b, bottom). We preserve...
Data Management for Internet-Scale Single-Sign-On (2008)
Google offers a variety of Internet services that require user authentication. These services rely on a single-sign-on service, called Google Accounts, that has been in active deployment since 2002....
Session Viewer: Visual Exploratory Analysis of Web Session Logs (2008)
Heidi Lam, Daniel Russell, Google Inc
Large-scale session log analysis typically includes statistical methods and detailed log examinations. While both methods have merits, statistical methods can miss previously unknown subpopulations...
Compact Dictionaries for Variable-Length Keys and Data, with Applications (2008)
Daniel K. Blandford, Google Inc, Guy E. Blelloch
We consider the problem of maintaining a dynamic dictionary T of keys and associated data for which both the keys and data are bit strings that can vary in length from zero up to the length w of a...
Building and Refining Rhetorical-Semantic Relation Models (2008)
Sasha Blair-Goldensohn, Google Inc
We report results of experiments which build and refine models of rhetorical-semantic relations such as Cause and Contrast. We adopt the approach of Marcu and Echihabi (2002), using a small set of...
Building MEMS-Based Storage Systems for Streaming Media (2008)
Raju Rangaswami, Zoran Dimitrijević, Edward Chang, Google Inc, Klaus Schauser
The performance of streaming media servers has been limited by the dual requirements of high disk throughput (to service more clients simultaneously) and low memory use (to decrease system cost). To...
TOLB: A Traffic-Oblivious Load-Balancing Protocol for Next-Generation Sensornets (2008)
Mohamed Aly, Ha Gopalan, Google Inc
The multiple expected sources of traffic skewness in Next-Generation SensorNets (NGSN) will trigger the need for load-balanced point-to-point routing protocols. Driven by this fact, we...
For the Degree of Doctor of Science, by (2008)
Thesis No., Presented to the Faculty of Computer and Communication Sciences, Google Inc
of Swiss nationality, accepted on the proposal of the jury: Prof. Karl ABERER (EPFL), thesis director; Dr. Avigdor GAL (Technion), examiner
Replication Degree Customization for High Availability (2008)
Ming Zhong, Google Inc, Kai Shen, Joel Seiferas
Object replication is a common approach to enhance the availability of distributed data-intensive services and storage systems. Many such systems are known to have highly skewed object request...
On Suspending and Resuming Dataflows (2008)
Badrish Chandramouli, Christopher N. Bond, Google Inc
Consider a long-running, resource-intensive query Q running
Università di Roma “La Sapienza” (2008)
J. Könemann, M. Pál, Google Inc
In the multicommodity rent-or-buy (MROB) network design problem we are given a network together with a set of k terminal pairs (s1, t1),..., (sk, tk). The goal is to provision the network so that a...
Semantic Vector Products: Some Initial Investigations (2008)
Semantic vector models have proven their worth in a number of natural language applications whose goals can be accomplished by modelling individual semantic concepts and measuring similarities...
Practical Large-Scale Latency Estimation (2008)
Michal Szymaniak, David Presotto, Guillaume Pierre, Maarten van Steen, Google Inc
We present the implementation of a large-scale latency estimation system based on GNP and incorporated into the Google content delivery network. Our implementation does not rely...
Improving Word Alignment with Bridge Languages (2008)
Shankar Kumar, Franz Och, Wolfgang Macherey, Google Inc
We describe an approach to improve Statistical Machine Translation (SMT) performance using multi-lingual, parallel, sentence-aligned corpora in several bridge languages. Our approach consists of a...
0-262-63304-3, $40.00, £25.95 Reviewed by (2008)
With goals as intuitive and desirable as they are challenging, the field of automated question answering has generated growing interest in the past few years. The increased momentum is apparent in...
Education and Employment (2008)
Martin Pál, Google Inc, Éva Tardos
Search advertising auction dynamics, auction design and bidding strategies, game theory, approximation
Education Selected Achievements Experience (2008)
⋄ Industry position offers from Google, Yahoo!, Microsoft, Oracle, A9 (Amazon) and Goldman Sachs
Analysis of a Very Large Web Search Engine Query Log (2008)
Craig Silverstein
In this paper we present an analysis of an AltaVista Search Engine query log consisting of approximately 1 billion entries for search requests over a period of six weeks. This represents almost 285...
Stochastic Models for Budget Optimization in Search-Based Advertising (2008)
Internet search companies sell advertisement slots based on users' search queries via an auction. Advertisers have to solve a complex optimization problem of how to place bids on the keywords of...
How does a search engine company decide what ads to display with each query so as to maximize its revenue? This turns out to be a generalization of the online bipartite matching problem. We introduce...
Martin Casado, Pei Cao, Aditya Akella, Niels Provos, Google Inc
Distributed Denial-of-Service flooding attacks against public web servers are increasingly common. Websites without the ability to over-provision or rely on a CDN are often overwhelmed by such...
Anupam Gupta, Martin Pál, Google Inc, Tim Roughgarden
We present constant-factor approximation algorithms for several widely-studied NP-hard optimization problems in network design, including the multicommodity rent-or-buy, virtual private network...
WWW 2007 / Track: Data Mining, Session: Similarity Search (2008)
Roberto J. Bayardo, Google Inc
Given a large collection of sparse vector data in a high dimensional space, we investigate the problem of finding all pairs of vectors whose similarity score (as determined by a function such as...
Inferring Complex Agent Motions from Partial Trajectory Observations (2008)
Finnegan Southey
Tracking the movements of a target based on limited observations plays a role in many interesting applications. Existing probabilistic tracking techniques have shown considerable success but the...
Retrieval—User profiles and alert services (2008)
Major search engines currently use the history of a user’s actions (e.g., queries, clicks) to personalize search results. In this paper, we present a new personalized service, query-specific web...
What You Seek is What You Get: Extraction of Class Attributes from Query Logs (2008)
Within the larger area of automatic acquisition of knowledge from the Web, we introduce a method for extracting relevant attributes, or quantifiable properties, for various classes of objects. The...
Gurmeet Singh Manku, Google Inc
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search....
Leonidas Kontothanassis, Robert Stets, Google Inc, Galen Hunt, VMware Inc, Sandhya Dwarkadas, ...
Cashmere is a software distributed shared memory (S-DSM) system designed for clusters of server-class machines. It is distinguished from most other S-DSM projects by (1) the effective use of fast...
Google News Personalization: Scalable Online Collaborative Filtering (2008)
Several approaches to collaborative filtering have been studied but seldom have studies been reported for large (several million users and items) and dynamic (the underlying item set is continually...
Chemistry and Chemical Biology (2008)
Aynur Dayanik, Google Inc, Rose Oughtred
In this paper, we consider the problem of finding the MEDLINE articles that describe functions of particular genes. We describe our experiments using the mg system and the partitioning of a graph of...
General Terms Algorithms, Experimentation (2008)
Challenging the implicit reliance on document collections, this paper discusses the pros and cons of using query logs rather than document collections, as self-contained sources of data in textual...
DRAFT All Your iFRAMEs Point to Us (2008)
Niels Provos, Panayiotis Mavrommatis, Moheeb Abu Rajab, Fabian Monrose, Google Inc, ...
As the web continues to play an ever increasing role in information exchange, so too is it becoming the prevailing platform for infecting vulnerable hosts. In this paper, we provide a detailed study...
Robust submodular observation selection (2008)
Andreas Krause, Carlos Guestrin, H. Brendan McMahan, Google Inc, Anupam Gupta
In many applications, one has to actively select among a set of expensive observations before making an informed decision. For example, in environmental monitoring, we want to select locations to...
Of State, Attorneys General, Of The, United States, Google Inc, Idology Inc
IAC ikeepsafe
All Your iFrames Point To Us (2008)
Niels Provos, Panayiotis Mavrommatis, Google Inc, Moheeb Abu Rajab, Fabian Monrose
As the web continues to play an ever increasing role in information exchange, so too is it becoming the prevailing platform for infecting vulnerable hosts. In this paper, we provide a detailed study...
Ashish Goel, Monika R. Henzinger, Google Inc, Serge Plotkin, Eva Tardos
In this paper we consider the online ftp problem. The goal is to service a sequence of file transfer requests given bandwidth constraints of the underlying communication network. The main result of...
Operating System I/O Speculation: How two invocations are faster than one (2007)
Keir Fraser, Fay Chang, Google Inc
We present an in-kernel disk prefetcher which uses speculative execution to determine what data an application is likely to require in the near future. By placing our design within the operating...
The anatomy of clickbot.a (2007)
Neil Daswani, Security Teams, Google Inc
This paper provides a detailed case study of the architecture of the Clickbot.A botnet that attempted a low-noise click fraud attack against syndicated search engines. The botnet of over 100,000...
Web-scale Data Integration: You Can Only Afford to Pay As You Go (2007)
Jayant Madhavan, Shawn R. Jeffery, Shirley Cohen, Xin (Luna) Dong, David Ko, Cong Yu, ...
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like...
SOREN: NEED TO CHANGE FUNCTION APPLICATION SYNTAX TO OPERAND-FIRST EVERYWHERE (2007)
Soren B. Lassen, Google Inc, Paul Blain Levy
This note uses the normal form bisimulation theory for recursively typed call-by-push-value (CBPV) [1] to prove a “syntactic minimal invariance” result.
A Usability Study of Doppelganger, A Tool for Better Browser Privacy (2007)
Chris K. Karlof, Umesh Shankar, Google Inc
Selecting observations against adversarial objectives (2007)
Andreas Krause, H. Brendan McMahan, Google Inc, Carlos Guestrin, Anupam Gupta
In many applications, one has to actively select among a set of expensive observations before making an informed decision. Often, we want to select observations which perform well when evaluated with...
Large language models in machine translation (2007)
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, Jeffrey Dean, Google Inc
This paper reports on the benefits of large-scale statistical language modeling in machine translation. A distributed infrastructure is proposed which we use to train on up to 2 trillion tokens,...
The ghost in the browser: Analysis of web-based malware (2007)
Niels Provos, Dean McNamee, Panayiotis Mavrommatis, Ke Wang, Nagendra Modadugu, Google Inc
As more users are connected to the Internet and conduct their daily activities electronically, computer users have become the target of an underground economy that infects hosts with malware or...
This paper presents an empirical study on how different selections of input translation systems affect translation quality in system combination. We give empirical evidence that the systems to be...
Query Suspend And Resume (2007)
Badrish Chandramouli, Christopher N. Bond, Google Inc
Suppose a long-running analytical query is executing on a database server and has been allocated a large amount of physical memory. A high-priority task comes in and we need to run it immediately...
Selecting observations against adversarial objectives (2007)
Andreas Krause, Carlos Guestrin, H. Brendan McMahan, Google Inc, Anupam Gupta
In many applications, one has to actively select among a set of expensive observations before making an informed decision. Often, we want to select observations which perform well when evaluated with...
Detail preserving shape deformation in image editing (2007)
Figure 1: The deformation of a source image (a), described by tracing and moving feature curves (b, top), can unrealistically stretch texture details, e.g. the ear fur (b, bottom). We preserve...
A PLIABLE HYBRID ARCHITECTURE FOR CODE ISOLATION (2007)
B. Ganev, Karsten Schwan (Adviser), Mustaque Ahamad, Greg Eisenhauer, Santosh Pande, Kiran Panesar, ...
Thorsten Joachims, Laura Granka, Google Inc, Bing Pan, Helene Hembrooke, Filip Radlinski, ...
This paper examines the reliability of implicit feedback generated from clickthrough data and query reformulations in WWW search. Analyzing the users' decision process using eyetracking and...
Indexing shared content in information retrieval systems (2006)
Andrei Z. Broder, Nadav Eiron, Marcus Fontoura, Michael Herscovici, Ronny Lempel, John McPherson, ...
Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separately, causing...
Principles of dataspace systems (2006)
Alon Halevy, Google Inc, Michael Franklin
The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient,...
Niels Provos, Joe McClain, Google Inc, Ke Wang
Worms are becoming more virulent at the same time as operating system improvements try to contain them. Recent research demonstrates several effective methods to detect and prevent randomly scanning...
Modular Software Upgrades for Distributed Systems (2006)
Sameer Ajmani, Barbara Liskov, Liuba Shrira, Google Inc
Upgrading the software of long-lived, highly-available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable and halting...
Modular Software Upgrades for Distributed Systems (2006)
Sameer Ajmani, Barbara Liskov, Liuba Shrira, Google Inc
Upgrading the software of long-lived, highly-available distributed systems is difficult. It is not possible to upgrade all the nodes in a system at once, since some nodes may be unavailable...
Gagan Aggarwal, Google Inc, Ashish Goel, Rajeev Motwani
We present a truthful auction for pricing advertising slots on a web-page assuming that advertisements for different merchants must be ranked in decreasing order of their (weighted) bids. This...
Data integration: The teenage years (2006)
Data integration is a pervasive challenge faced in applications that need to query across multiple autonomous and heterogeneous data sources. Data integration is crucial in large enterprises that own...
Structured Data Meets the Web: A Few Observations (2006)
Jayant Madhavan, Alon Halevy, Shirley Cohen, Xin (Luna) Dong, Shawn R. Jeffery, David Ko, ...
The World Wide Web is witnessing an increase in the amount of structured content – vast heterogeneous collections of structured data are on the rise due to the Deep Web, annotation schemes like...
THE DESIGN AND ANALYSIS OF LARGE DISPLAY (2006)
M. Huang, D. Mynatt, D. Abowd, Dr. W. Keith Edwards, ...
I cannot pick just one. For Joe, Khai, and my family, in gratitude for the love and support they have given me ACKNOWLEDGEMENTS 'for·tu·nate: 1. bringing some good thing not foreseen as...
Large scale performance measurement of content-based automated image-orientation detection (2005)
With the proliferation of digital cameras and self-publishing of photos, automatic detection of image orientation will become an important part of photo management systems. In this...
e-NeXSh: Achieving an effectively non-executable stack and heap via system-call policing (2005)
We present e-NeXSh, a novel security approach that utilises kernel and LIBC support for efficiently defending systems against process-subversion attacks. Such attacks exploit vulnerabilities in...
Database-Aware Semantically-Smart Storage (2005)
Muthian Sivathanu, Lakshmi N. Bairavasundaram, Google Inc
Interpreting the data: Parallel analysis with Sawzall (2005)
Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan, Google Inc
Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network logs, and web document repositories. These large...
Interpreting the data: Parallel analysis with Sawzall (2005)
Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan, Google Inc
(Draft submitted to Scientific Programming Journal) Very large data sets often have a flat but regular structure and span multiple disks and machines. Examples include telephone call records, network...
Large margin methods for structured and interdependent output variables (2005)
Ioannis Tsochantaridis, Google Inc, Thorsten Joachims, Thomas Hofmann, Yasemin Altun, Yoram Singer
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly...
Towards integrated PSEs for wireless communications: experiences with (2004)
Roger R. Skidmore, Alex Verstak, Naren Ramakrishnan, Layne T. Watson, Jian He, Srinidhi Varadarajan, ...
This paper describes the computational methodologies of two problem solving environments (PSEs) for wireless network design and analysis, one academic (S4W) and one commercial (SitePlanner®)....
Approximate reasoning for real-time probabilistic processes (2004)
Vineet Gupta, Google Inc, Radha Jagadeesan, Prakash Panangaden
We develop a pseudo-metric analogue of bisimulation for generalized semi-Markov processes. The kernel of this pseudo-metric corresponds to bisimulation; thus we have extended bisimulation for...
Natural Language Processing in Information Retrieval (2004)
Many Natural Language Processing (NLP) techniques have been used in Information Retrieval. The results are not encouraging. Simple methods (stopwording, Porter-style stemming, etc.) usually yield...
The Happy Searcher: Challenges in Web Information Retrieval (2004)
Mehran Sahami, Vibhu Mittal, Shumeet Baluja, Henry Rowley, Google Inc
Search has arguably become the dominant paradigm for finding information on the World Wide Web. In order to build a successful search engine, there are a number of challenges that arise where...
Efficient face orientation discrimination (2004)
Shumeet Baluja, Mehran Sahami, Henry A. Rowley, Google Inc
This paper presents efficient methods to address the problem of discriminating between five facial orientations. We present the most efficient methods for this task to date, which can accurately...
Searching the Web by Voice (2003)
Alexander Franz, Google Inc
Spoken queries are a natural medium for searching the Web in settings where typing on a keyboard is not practical. This paper describes a speech interface to the Google search engine. We present...
Information Incorporation in Online In-Game Sports Betting Markets (2003)
Sandip Debnath, David M. Pennock, C. Lee Giles, Steve Lawrence, Google Inc
We analyze data from 52 online in-game sports betting markets (where betting is allowed continuously throughout a game), including 34 markets based on soccer (European football) games from the 2002...
Yoelle Maarek, Aya Soffer, Bay-Wei Chang, Google Inc
The dramatic increase in the use and availability of mobile devices such as cellular phones and Personal Digital Assistants (PDAs) in the last few years has resulted in the ability to access...
Modern information retrieval: a brief overview (2001)
For thousands of years people have realized the importance of archiving and finding information. With the advent of computers, it became possible to store large amounts of information; and finding...
Online throughput-competitive algorithm for multicast routing and admission control (1998)
Ashish Goel, Monika R. Henzinger, Google Inc, Serge Plotkin
We present the first polylog-competitive online algorithm for the general multicast admission control and routing problem in the throughput model. The ratio of the number of requests accepted by the...
Zoran Dimitrijević, Google Inc, Edward Y. Chang (Advisor)
Design and implementation of large-scale storage systems, quality of service, parallel and cluster-based computing, multimedia systems, and large-scale search engines.