Mehran Sahami

EDITORIAL ADVISORY BOARD: (2008)

Floriana Esposito, Chandrika Kamath, Huan Liu, Sach Mukherjee, Marco Ramoni, ...

newest imprint “Idea Group Reference” (IGR), publishing premier reference publications on all aspects of emerging topics in information science, technology and management. The Encyclopedia of...

A Bayesian Approach to Filtering Junk E-Mail (2008)

Mehran Sahami, Susan Dumais, David Heckerman, Eric Horvitz

In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By...

Automatic Classification of Text Databases Through Query Probing, CUCS-004-00 (2008)

Panagiotis G. Ipeirotis, Luis Gravano, Mehran Sahami

Many text databases on the web are “hidden” behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only...

Combinatorial Markov Random Fields (2008)

Ron Bekkerman, Mehran Sahami, Erik Learned-miller

Abstract. A combinatorial random variable is a discrete random variable defined over a combinatorial set (e.g., a power set of a given set). In this paper we introduce combinatorial Markov random...

Algorithms, Experimentation (2008)

Mehran Sahami

Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in...

Learning Classification Rules Using Lattices (Extended Abstract) (2008)

Mehran Sahami

Abstract. This paper presents a novel induction algorithm, Rulearner, which induces classification rules using a Galois lattice as an explicit map through the search space of rules. The Rulearner...

An Autonomous Mobile Robot Architecture Using Belief Networks and Neural Networks (2007)

Mehran Sahami, John Lilly, Bryan Rollins

This paper introduces a novel mobile robot architecture based on a Situated Belief Network, a belief network that is dynamically updated as a consequence of its current environment. We initially show...

Databases (2007)

Luis Gravano, Panagiotis G. Ipeirotis, Mehran Sahami

The contents of many valuable web-accessible databases are only available through search interfaces and are hence invisible to traditional web “crawlers.” Recently, commercial web sites have...

Diego Jorquera Toward Optimal Feature Selection (2007)

Daphne Koller, Mehran Sahami, Diego Jorquera

Our context: supervised learning (or, more specifically, classification). Our motivation: reducing the dimensionality of our feature space (that is, reducing the number of features used).

Semi-supervised clustering using combinatorial MRFs (2006)

Ron Bekkerman, Mehran Sahami

A combinatorial random variable is a discrete random variable defined over a combinatorial set (e.g., a power set of a given set). In this paper we introduce combinatorial Markov random fields...

A web-based kernel function for matching short text snippets (2005)

Mehran Sahami, Tim Heilman

Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if any, terms in...

The Happy Searcher: Challenges in Web Information Retrieval (2004)

Mehran Sahami, Vibhu Mittal, Shumeet Baluja, Henry Rowley (Google Inc)

Search has arguably become the dominant paradigm for finding information on the World Wide Web. In order to build a successful search engine, there are a number of challenges that arise where...

Efficient face orientation discrimination (2004)

Shumeet Baluja, Mehran Sahami, Henry A. Rowley (Google Inc)

This paper presents efficient methods to address the problem of discriminating between five facial orientations. We present the most efficient methods for this task to date, which can accurately...

QProber: A system for automatic classification of hidden-web databases (2003)

Panagiotis G. Ipeirotis, Luis Gravano, Mehran Sahami

The contents of many valuable web-accessible databases are only available through search interfaces and are hence invisible to traditional web “crawlers.” Recently, commercial web sites have...

QProber: A System for Automatic Classification of Hidden-Web Resources (2001)

Ipeirotis, Panagiotis G., Gravano, Luis, Sahami, Mehran

The contents of many valuable web-accessible databases are only available through search interfaces and are hence invisible to traditional web “crawlers.” Recently, commercial web sites have...

Automatic Classification of Text Databases through Query Probing (2000)

Ipeirotis, Panagiotis, Gravano, Luis, Sahami, Mehran

Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only...

Automatic Classification of Text Databases Through Query Probing (2000)

Ipeirotis, Panagiotis, Gravano, Luis, Sahami, Mehran

Many text databases on the web are 'hidden' behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only...

Automatic Classification of Text Databases through Query Probing (2000)

Panagiotis G. Ipeirotis, Luis Gravano, Mehran Sahami

Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only...

Automatic Classification of Text Databases through Query Probing (2000)

Panagiotis G. Ipeirotis, Luis Gravano, Mehran Sahami

Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such...

Automatic Classification of Text Databases through Query Probing (2000)

Panagiotis G. Ipeirotis, Luis Gravano, Mehran Sahami

Many text databases on the web are hidden behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only...

Using Probabilistic Relational Models for Collaborative Filtering (1999)

Lise Getoor, Mehran Sahami

Recent projects in collaborative filtering and information filtering address the task of inferring user preference relationships for products or information. The data on which these inferences are...

A Probabilistic Approach to Full-Text Document Clustering (1998)

Moises Goldszmidt, Mehran Sahami

To address the issue of text document clustering, a suitable function is needed for measuring the distance between documents. In this paper we explore a function for scoring document similarity based...

A Bayesian Approach to Filtering Junk E-Mail (1998)

Mehran Sahami, Susan Dumais, David Heckerman, Eric Horvitz

In addressing the growing problem of junk E-mail on the Internet, we examine methods for the automated construction of filters to eliminate such unwanted messages from a user's mail stream. By...

SONIA: A Service for Organizing Networked Information Autonomously (1998)

Mehran Sahami, Salim Yusufali

The recent explosion of on-line information in Digital Libraries and on the World Wide Web has given rise to a number of query-based search engines and manually constructed topical hierarchies....

Using Machine Learning To Improve Information Access (1998)

Mehran Sahami

this document retrieval stage will become even less of a time issue. (Chapter 9, "SONIA: A Complete System"; figure with Clusterer and Descriptor Extractor components.) ...

SONIA: A Service for Organizing Networked Information Autonomously (1998)

Mehran Sahami, Salim Yusufali

The recent explosion of on-line information in Digital Libraries and on the World Wide Web has given rise to a number of query-based search engines and manually constructed topical hierarchies....

Inductive Learning Algorithms and Representations for Text Categorization (1998)

Susan Dumais, John Platt, Mehran Sahami, David Heckerman

Text categorization – the assignment of natural language texts to one or more predefined categories based on their content – is an important component in many information organization and...

Sonia: A service for organizing networked information autonomously (1998)

Mehran Sahami, Salim Yusufali

The recent explosion of on-line information in Digital Libraries and on the World Wide Web has given rise to a number of query-based search engines and manually constructed topical hierarchies....

Hierarchically classifying documents using very few words (1997)

Daphne Koller, Mehran Sahami

The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. Existing classification schemes which...

Hierarchically Classifying Documents Using Very Few Words (1997)

Daphne Koller, Mehran Sahami

The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. Existing classification schemes which...

Lazy Acquisition of Place Knowledge (1997)

Pat Langley, Karl Pfleger, Mehran Sahami

In this paper we define the task of place learning and describe one approach to this problem. Our framework represents distinct places as evidence grids, a probabilistic description of occupancy....

Hierarchically Classifying Documents Using Very Few Words (1997)

Daphne Koller, Mehran Sahami

The proliferation of topic hierarchies for text documents has resulted in a need for tools that automatically classify new documents within such hierarchies. One can use existing classifiers by...

Toward Optimal Feature Selection (1996)

Daphne Koller, Mehran Sahami

In this paper, we examine a method for feature subset selection based on Information Theory. Initially, a framework for defining the theoretically optimal, but computationally intractable, method for...

Error-Based and Entropy-Based Discretization of Continuous Features (1996)

Ron Kohavi, Mehran Sahami

We present a comparison of error-based and entropy-based methods for discretization of continuous features. Our study includes both an extensive empirical comparison as well as an analysis of...

Applying the Multiple Cause Mixture Model to Text Categorization (1996)

Mehran Sahami, Marti Hearst, Eric Saund

This paper introduces the use of the Multiple Cause Mixture Model to automatic text category assignment. Although much research has been done on text categorization, this algorithm is novel in that...

Learning Limited Dependence Bayesian Classifiers (1996)

Mehran Sahami

We present a framework for characterizing Bayesian classification methods. This framework can be thought of as a spectrum of allowable dependence in a given probabilistic model with the Naive Bayes...

Supervised and Unsupervised Discretization of Continuous Features (1995)

James Dougherty, Ron Kohavi, Mehran Sahami

Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the...

Generating Neural Networks through the Induction of Threshold Logic Unit Trees (Extended Abstract) (1995)

Mehran Sahami

Mehran Sahami, Computer Science Department, Stanford University, Stanford, CA 94305, USA. Email: sahami@CS.Stanford.EDU. Abstract. We investigate the generation of neural networks through the...

Learning Classification Rules Using Lattices (Extended Abstract) (1995)

Mehran Sahami

Mehran Sahami, Computer Science Department, Stanford University, Stanford, CA 94305, USA. Email: sahami@CS.Stanford.EDU. Abstract. This paper presents a novel induction algorithm, Rulearner, which...

Learning Classification Rules Using Lattices (1995)

Mehran Sahami

This paper presents a novel induction algorithm, Rulearner, which induces classification rules using a Galois lattice as an explicit map through the search space of rules. The construction of...

Generating Neural Networks Through the Induction of Threshold Logic Unit Trees (1995)

Mehran Sahami

This paper investigates the generation of neural networks through the induction of binary trees of threshold logic units (TLUs). Initially, we describe the framework for our tree construction...

Generating Neural Networks Through the Induction of Threshold Logic Unit Trees (Extended Abstract) (1995)

Mehran Sahami

Abstract. We investigate the generation of neural networks through the induction of binary trees of threshold logic units (TLUs). Initially, we describe the framework for our tree construction...

Supervised and unsupervised discretization of continuous features (1995)

James Dougherty, Ron Kohavi, Mehran Sahami

Many supervised machine learning algorithms require a discrete feature space. In this paper, we review previous work on continuous feature discretization, identify defining characteristics of the...

Learning Non-Linearly Separable Boolean Functions With Linear Threshold Unit Trees and Madaline-Style Networks

Mehran Sahami

This paper investigates an algorithm for the construction of decision trees composed of linear threshold units and also presents a novel algorithm for the learning of nonlinearly separable boolean...