Padhraic Smyth

Graphical Models for Statistical Inference and Data Assimilation (2009)

Andrew W. Robertson, Padhraic Smyth

In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as...

MODELING COUNT DATA FROM MULTIPLE SENSORS: A BUILDING OCCUPANCY MODEL (2009)

Jon Hutchins, Alexander Ihler, Padhraic Smyth

Knowledge of the number of people in a building at a given time is crucial for applications such as emergency response. Sensors can be used to gather noisy measurements which, when combined, can be...

Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning. In: 7th International Semantic Web Conference (2009)

Chaitanya Chemudugunta, America Holloway, Padhraic Smyth

Abstract. Human-defined concepts are fundamental building-blocks in constructing knowledge bases such as ontologies. Statistical learning techniques provide an alternative automated approach to...

Combining Concept Hierarchies and Statistical Topic Models. In: Conference on Information and Knowledge Management (2009)

Padhraic Smyth, Mark Steyvers

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover...

DATA RECOVERY EXTENDS TRADITIONAL (2009)

C Fl, Boris Mirkin, John Lafferty, David Madigan, Fionn Murtagh, Padhraic Smyth

Often considered more as an art than a science, the field of clustering has been dominated by learning through examples and by choosing techniques almost through trial-and-error. Even the most...

Asynchronous Distributed Learning of Topic Models (2009)

Arthur Asuncion, Padhraic Smyth, Max Welling

Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known...

List of Supported Students: (2009)

Padhraic Smyth, Igor Cadez, Dasha Chudova, Xianping Ge

probabilistic learning, sequence and time series analysis, clustering, density estimation,

James Bennett (2009)

Padhraic Smyth

The KDD Cup is the oldest of the many data mining competitions that are now popular [1]. It is an integral part of the annual ACM

Infinite Mixtures of Trees (2008)

Sergey Kirshner, Padhraic Smyth

Finite mixtures of tree-structured distributions have been shown to be efficient and effective in modeling multivariate distributions. Using Dirichlet processes, we extend this approach to allow...

Discrete Recurrent Neural Networks for Grammatical Inference (2008)

Zheng Zeng, Rodney M. Goodman, Padhraic Smyth

Recurrent neural networks have recently been shown to have the ability to learn regular and context-free grammars from examples. We show that while conventional analog recurrent networks try to...

Segmental Hidden Markov Models with Random Effects for Waveform Modeling (2008)

Seyoung Kim, Padhraic Smyth, Sam Roweis

This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. A segmental hidden Markov model (HMM) is used to characterize waveform shape and...

Text Modeling using Unsupervised Topic Models and Concept Hierarchies (2008)

Chemudugunta, Chaitanya, Smyth, Padhraic, Steyvers, Mark

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover...

Analysis and Visualization of Network Data using (2008)

Scott White, Danyel Fisher, Padhraic Smyth, Yan-biao Boey

The JUNG (Java Universal Network/Graph) Framework is a free, open-source software library that provides a common and extendible language for the manipulation, analysis, and visualization of data that...

Abstract (2008)

Seyoung Kim, Padhraic Smyth

Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be...

i e n t i s t s d i s p o s a l p r (2008)

Padhraic Smyth, Rodney M. Goodman

Across a variety of scientific, engineering, and business applications it has become commonplace to collect and store large volumes of data. For example, NASA has warehouses of data collected from...

Probabilistic Modeling vs. Function (2008)

Padhraic Smyth

• Original slides created in mid-July for ACM – Some new slides have been added • “new” logo in upper left – A few slides have been updated • “updated” logo in upper left •...

Graphical Models for Statistical Inference and Data Assimilation (2008)

Andrew W. Robertson, Padhraic Smyth

In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as...

Chapter 1 DATA MINING AT THE INTERFACE OF COMPUTER SCIENCE AND STATISTICS (2008)

Padhraic Smyth

Abstract This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining....

Downloaded from (2008)

Padhraic Smyth

Modern data mining has evolved largely as a result of efforts by computer scientists to address the needs of ‘data owners’ in extracting useful information from massive observational data sets....

Published in: Journal of Intelligence Community Research and Development (2006). Scalable Parallel Topic Models (2008)

David Newman, Padhraic Smyth, Mark Steyvers

(U) The topic model is a popular probabilistic model for text and document modeling. It can be used for topic indexing, document classification, corpus summarization and information retrieval. In the...

Situational Awareness Technologies for Disaster Response (2008)

Naveen Ashish, Dmitri Kalashnikov, Sharad Mehrotra, Ron Eguchi, Rajesh Hegde, Padhraic Smyth

Responding to natural or man-made disasters, in a timely and effective manner, can reduce deaths and injuries, contain or prevent secondary disasters, and reduce the resulting economic losses and...

Abstract (2008)

Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers

Techniques such as probabilistic topic models and latent-semantic indexing have been shown to be broadly useful at automatically extracting the topical or semantic content of documents, or more...

Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models (2008)

Alexander T. Ihler, Padhraic Smyth

Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas such as human-computer interaction, video...

Abstract (2008)

Igor V. Cadez, Padhraic Smyth

Massive transaction data sets are recorded in a routine manner in telecommunications, retail commerce, and Web site management. In this paper we address the problem of inferring predictive individual...

Gene Expression Clustering with Functional Mixture Models (2008)

Darya Chudova, Eric Mjolsness, Christopher Hart, Padhraic Smyth

We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course...

Cluster Analysis of Western North Pacific Tropical Cyclone Tracks (2008)

Suzana Camargo, Andrew W. Robertson, Scott J. Gaffney, Padhraic Smyth

Typhoons have a large socio-economic impact in many countries in Asia. Depending on the trajectory of the typhoon or tropical storm, landfall will occur or not. While it is well known...

Distributed Inference for Latent Dirichlet Allocation (2008)

David Newman, Arthur Asuncion, Padhraic Smyth, Max Welling

Very large data sets, such as collections of images, text, and related data, are becoming increasingly common, with examples ranging from digitized collections of books by companies...

Probabilistic analysis of a large-scale urban traffic data set (2008)

Jon Hutchins, Alexander Ihler, Padhraic Smyth

Real-world sensor time series are often significantly noisier and more difficult to work with than the relatively clean data sets that tend to be used as the basis for experiments in many research...

2 (2007)

William R. Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth

cognitive modeling and machine learning methods applied to developmental and degenerative conditions of the human brain. 2 is a postgraduate researcher in the Department of Information and Computer...

Clustering and Mode Classification of Engineering Time Series Data (2007)

Padhraic Smyth, Eamonn Keogh

The problem of efficiently and accurately locating patterns of interest in massive time series data sets is an important and non-trivial problem in a wide variety of applications, including diagnosis...

Both authors are affiliated with Machine Learning Systems Group (2007)

Usama Fayyad, Padhraic Smyth

With hardware advances in scientific instruments and data gathering techniques comes the inevitable flood of data that can render traditional approaches to science data analysis severely inadequate....

Inference in Directed Acyclic Graphs with Applications to Hidden Markov Model Structures (2007)

Padhraic Smyth, David Heckerman, Paul Stolorz

Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence (AI),...

2 (2007)

Xianping Ge, Padhraic Smyth

Fig. 1. The ADG model for a K = 6, N = 12, rate 1/2 turbo code. Abstract: This paper analyzes the distribution of loop lengths in graphical models for turbo decoding. The properties of such loops...

Local Context Matching for Page Replacement (2007)

Xianping Ge, Scott Gaffney, Dimitry Pavlov, Padhraic Smyth

In this paper we investigate the application of adaptive sequence prediction techniques to the problem of page replacement in computer systems. We demonstrate the ability of a method based on local...

Towards Intelligent Trainable Tools for the Automated Analysis, Cataloging, and Searching of Digital Image Libraries: A Machine Learning Approach (2007)

Usama Fayyad, Padhraic Smyth

In areas as diverse as earth remote sensing, astronomy, and medical imaging, image acquisition technology has undergone tremendous improvements in recent years. The vast amounts of...

Section 10. Learning For An Intelligent Instrument (2007)

Wray Buntine, Padhraic Smyth

metal concentration from a spectrum) from samples. • How do you present the results, and error bars, to the engineers? SP2: 10.5 ESTIMATING THE RESPONSE FUNCTION Left: Sample isolated peaks...

Hidden Markov Models For Classification Of Sequential Patterns (2007)

Wray Buntine, Padhraic Smyth

6.88> p(! t j j\Phi t ). SP2: 9.2 HIDDEN MARKOV MODEL BASICS [Figure: states State 1, State 2, State 3; rows labeled Observable and Hidden; time axis t-1, t, t+1, t+2; variables x(t-1), x(t), x(t+1), x(t+2)] • Explicit Assumptions (first order)[4]: -- 1....

The Author-Topic Model for Authors and Documents (2007)

Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is...

Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms (2007)

Xianping Ge, Sridevi Parise, Padhraic Smyth

This paper investigates the problem of finding a K-state first-order Markov chain that approximates an M-state first-order Markov chain, where K is typically much smaller than M. A variety of greedy...

Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms (2007)

Xianping Ge, Sridevi Parise, Padhraic Smyth

This paper investigates the problem of finding a K-state first-order Markov chain that approximates an M-state first-order Markov chain, where K is typically much smaller than M. A variety of greedy...

Local Context Matching for Page Replacement (2007)

Xianping Ge, Dimitry Pavlov, Padhraic Smyth

In this paper we investigate the application of adaptive sequence prediction techniques to the problem of page replacement in computer systems. We demonstrate the ability of a method based on local...

Model Complexity, Goodness of Fit and Diminishing Returns (2007)

Igor V. Cadez, Padhraic Smyth

We investigate a general characteristic of the trade-off in learning problems between goodness-of-fit and model complexity. Specifically we characterize a general class of learning problems where the...

Hidden Markov Models for Endpoint Detection in Plasma Etch Processes (2007)

Xianping Ge, Padhraic Smyth

We investigate two statistical detection problems in plasma etch endpoint detection: change-point detection and pattern matching. Our approach is based on a segmental semi-Markov model framework. In...

2 (2007)

William R. Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth

has joint appointments with the Departments of Neurology and Information and

2 (2007)

William Rodman Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth

Abstract. We used Machine Learning (ML) methods to learn the best decision rules to distinguish normal brain aging from the earliest stages of dementia using subsamples of 198 normal and 244...

Statistical Inference and Data Mining DATA MINING AIMS TO DISCOVER SOMETHING NEW FROM THE FACTS RECORDED (2007)

David Madigan, Daryl Pregibon, Padhraic Smyth, Terry Widener

Statistics may have little to offer the search architectures in a data mining search, but a great deal to offer in evaluating hypotheses in the search, in evaluating the results of the search, and in...

Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data (2007)

Dmitry Pavlov, Heikki Mannila, Padhraic Smyth

We investigate the problem of generating fast approximate answers to queries for large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and...

Chapter 1 DATA MINING AT THE INTERFACE OF COMPUTER SCIENCE AND STATISTICS (2007)

Padhraic Smyth

Abstract This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining....

DISCRETE RECURRENT NEURAL NETWORKS AS PUSHDOWN AUTOMATA (2007)

Zheng Zeng, Rodney M. Goodman, Padhraic Smyth

In this paper we describe a new discrete recurrent neural network model with discrete external stacks for learning context-free grammars (or pushdown automata). Conventional analog recurrent networks...

Learning Generalization Query Models for Transaction Data (2007)

Dmitry Pavlov, Padhraic Smyth

Interactive querying of massive data sets is an increasingly important application. Existing techniques in the database literature have focused on producing fast approximations to exact data counts,...

Cycle Length Distributions in Graphical Models for Iterative Decoding (2007)

Xianping Ge, David Eppstein, Padhraic Smyth, Igor V. Cadez, ...

The probability of no cycle of length k or less at a node is . Because the degree of the nodes is 3, there are 2 label sequences of length k. Label Sequences. At a...

Predictive Profiles for Transaction Data using Finite Mixture Models (2007)

Igor V. Cadez, Padhraic Smyth, Edward Ip, Heikki Mannila

Massive transaction data sets are routinely recorded in a variety of applications including telecommunications, retail commerce, and Web site management. In this paper we address the problem of...

BAYESIAN STATISTICS 7, pp. 000--000 (2007)

Bernardo Bayarri Berger, J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, ...

this article is a natural model for point processes whose events combine irregular bursts of activity with predictable (e.g. daily and hourly) patterns

Challenges and Opportunities in Modeling of Spatio-Temporal Data Position Paper for the NSF Workshop on Spatio-Temporal Data Models for Biogeophysical Fields (2007)

Padhraic Smyth

www.datalab.uci.edu Many of the current techniques used in data mining are limited by a relatively static and fixed-dimensional view of the world. For example, classification and regression trees are...

The application of information theory to problems in decision tree design and rule-based expert systems (2007)

Smyth, Padhraic

This thesis examines the problems of designing decision trees and expert systems from an information-theoretic viewpoint. A well-known greedy algorithm using mutual information for tree design is...

Distributed inference for latent Dirichlet allocation (2007)

David Newman, Arthur Asuncion, Padhraic Smyth, Max Welling

We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of ¢...

Learning to detect events with Markov-modulated Poisson processes (2007)

Alexander Ihler, Jon Hutchins, Padhraic Smyth

Time-series of count data occur in many different contexts, including internet navigation logs, freeway traffic monitoring, and security logs associated with buildings. In this paper we describe a...

US Patent collection (2007)

Padhraic Smyth, Mark Steyvers, Uci Dave Newman, Chemudugunta Uci, Tom Griffiths, Uc Berkeley, ...

Background on text analysis The statistical topic model Examples using real data sets

Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models (2006)

Scott Gaffney, Andrew W. Robertson, Padhraic Smyth, Suzana J. Camargo, Michael Ghil

A probabilistic clustering technique is developed for classification of wintertime extratropical cyclone (ETC) tracks over the North Atlantic. A regression mixture model is used to describe the...

Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models (2006)

Scott J. Gaffney, Andrew W. Robertson, Padhraic Smyth, Suzana J. Camargo, Michael Ghil

A probabilistic clustering technique is developed for classification of wintertime extratropical cyclone (ETC) tracks over the North Atlantic. A regression mixture model is used to describe the...

Subseasonal-to-interdecadal variability of the Australian monsoon over North Queensland, Quarterly (2006)

Andrew W. Robertson, Sergey Kirshner, Padhraic Smyth, Stephen P. Charles, Bryson C. Bates

Daily rainfall occurrence and amount at 11 stations over North Queensland are examined during summer 1958–1998, using a Hidden Markov Model (HMM). Daily rainfall variability is described in terms...

Segmental Hidden Markov Models with Random Effects for Waveform Modeling (2006)

Seyoung Kim, Padhraic Smyth

This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. A segmental hidden Markov model (HMM) is used to characterize waveform shape and...

Analyzing entities and topics in news articles using statistical topic models (2006)

David Newman, Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers

Abstract. Statistical language models can learn relationships between topics discussed in a document collection and persons, organizations and places mentioned in each document. We present a novel...

A nonparametric Bayesian approach to detecting spatial activation patterns in fMRI data (2006)

Seyoung Kim, Padhraic Smyth, Hal Stern

Abstract. Traditional techniques for statistical fMRI analysis are often based on thresholding of individual voxel values or averaging voxel values over a region of interest. In this paper we present...

Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models (2006)

Scott J. Gaffney, Andrew W. Robertson, Padhraic Smyth, Suzana J. Camargo, Michael Ghil

A probabilistic clustering technique is developed for classification of wintertime extratropical cyclone (ETC) tracks over the North Atlantic. We use a regression mixture model to describe the...

Joint probabilistic curve clustering and alignment (2005)

Scott Gaffney, Padhraic Smyth

Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous...

Parametric Response Surface Models for Analysis of Multi-Site fMRI data Mixtures of general linear (2005)

Seyoung Kim, Padhraic Smyth, Hal Stern, Jessica Turner, FIRST BIRN

Abstract. Analyses of fMRI brain data are often based on statistical tests applied to each voxel or use summary statistics within a region of interest (such as mean or peak activation). These...

Learning Author Topic Models from Text Corpora (2005)

Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth

We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage...

Graphical Models for Statistical Inference and Data Assimilation (2005)

Alexander T. Ihler, Sergey Kirshner, Michael Ghil, Andrew W. Robertson, Padhraic Smyth

In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as...

Hidden Markov Models and Neural Networks for Fault Detection (2004)

Smyth, Padhraic

Continuous online monitoring of complex dynamic systems is common in applications as diverse as industrial plant operations, telecommunications network, and biomedical health monitoring. For...

Deep Space Network Antenna Monitoring Using Adaptive Time Series Methods and Hidden Markov Models (2004)

Smyth, Padhraic, Mellstrom, Jeff

The Deep Space Network (DSN), designed and operated by the Jet Propulsion Laboratory for the National Aeronautics and Space Administration (NASA), provides end-to-end telecommunication capabilities...

Probabilistic Anomaly Detection in Dynamic Systems (2004)

Smyth, Padhraic

This paper describes probabilistic methods for novelty detection when using pattern recognition methods for fault monitoring of dynamic systems. The problem of novelty detection is particularly acute...

Synthesis of Optimal Nonlinear Feedback Laws for Dynamic Systems Using Neural Networks (2004)

Lee, Allan Y., Smyth, Padhraic

Open-loop solutions of dynamical optimization problems can be numerically computed using existing software packages. The computed time histories of the state and control variables, for multiple sets of...

Markov Monitoring with Unknown States (2004)

Smyth, Padhraic

Pattern recognition methods and hidden Markov models can be effective tools for online health monitoring of communications systems. Previous work has assumed that the states in the system model are...

KDD-93: Progress and Challenges in Knowledge Discovery in Databases (2004)

Piatetsky-Shapiro, Gregory, Matheus, Christopher, Smyth, Padhraic, Uthurusamy, Ramasamy

Interest in Knowledge Discovery in Databases (KDD) continues to increase, driven by the rapid growth in the number and size of large databases and the applications-driven demand to make sense of them.

Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth (2004)

Smyth, Padhraic, Burl, Michael C., Fayyad, Usama M., Perona, Pietro

This paper discusses the problem of knowledge discovery in image databases with particular focus on the issues which arise when absolute ground truth is not available.

Hidden Markov Models and Neural Networks for Fault Detection in Dynamic Systems (2004)

Smyth, Padhraic

None given. (From conclusion): Neural networks plus Hidden Markov Models (HMM) can provide excellent detection and false alarm rate performance in fault detection applications. Modified models allow...

Discrete Recurrent Neural Networks as Pushdown Automata (2004)

Zeng, Zheng, Goodman, Rodney M., Smyth, Padhraic

In this paper, we describe a new discrete recurrent model with discrete external stacks for learning context-free grammars (or pushdown automata).

Probabilistic Independence Networks for Hidden Markov Probability Models (2004)

Smyth, Padhraic, Heckerman, David, Jordan, Michael I

In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a self-contained review of...

Clustering Using Monte Carlo Cross-Validation (2004)

Smyth, Padhraic

In this paper, a new cross-validated likelihood criterion is investigated for determining cluster structure.

Downscaling of daily rainfall occurrence over Northeast Brazil using a hidden Markov model (2004)

Andrew W. Robertson, Sergey Kirshner, Padhraic Smyth

A hidden Markov model (HMM) is used to describe daily rainfall occurrence at ten gauge stations in the state of Ceará in northeast Brazil during the February–April wet season 1975–2002. The...

Modeling waveform shapes with random effects segmental hidden Markov models (2004)

Seyoung Kim, Padhraic Smyth

In this paper we describe a general probabilistic framework for modeling waveforms such as heartbeats from ECG data. The model is based on segmental hidden Markov models (as used in speech...

Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Time Series (2004)

Sergey Kirshner, Padhraic Smyth, Andrew W. Robertson

We consider the problem of modeling discrete-valued vector time series data using extensions of Chow-Liu tree models to capture both dependencies across time and dependencies across variables. We...

Probabilistic Author-Topic Models for Information Discovery (2004)

Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, Thomas Griffiths

We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is...

Modeling Waveform Shapes with Random Effects Segmental Hidden Markov Models (2004)

Seyoung Kim, Padhraic Smyth, Stefan Luther

In this paper we describe a general probabilistic framework for modeling waveforms such as heartbeats from ECG data. The model is based on segmental hidden Markov models (as used in speech...

Joint Probabilistic Curve Clustering and Alignment (2004)

Scott Gaffney, Padhraic Smyth

Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous...

Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Vector Time Series (2004)

Sergey Kirshner, Padhraic Smyth, Andrew W. Robertson

We consider the problem of modeling discrete-valued vector time series data using extensions of Chow-Liu tree models to capture both dependencies across time and dependencies across variables....

Hidden Markov models for modeling daily rainfall occurrence over Brazil (2003)

Andrew W. Robertson, Sergey Kirshner, Padhraic Smyth

A hidden Markov model (HMM) is used to describe daily rainfall occurrence at ten gauge stations in the state of Ceará in northeast Brazil during the February–April wet season 1975–2002. The...

Unsupervised Learning with Permuted Data (2003)

Sergey Kirshner, Sridevi Parise, Padhraic Smyth

We consider the problem of unsupervised learning from a matrix of data vectors where in each row the observed values can be randomly permuted in an unknown fashion. Such problems arise naturally in...

Curve Clustering with Random Effects Regression Mixtures (2003)

Scott J. Gaffney, Padhraic Smyth

In this paper we address the problem of clustering sets of curve or trajectory data generated by groups of objects or individuals.

Probabilistic Models for Joint Clustering and Time-Warping of Multidimensional Curves (2003)

Darya Chudova, Scott Gaffney, Padhraic Smyth

In this paper we present a family of models and learning algorithms that can simultaneously align and cluster sets of multidimensional curves measured on a discrete time grid. Our approach is based...

Translation-Invariant Mixture Models for Curve Clustering (2003)

Darya Chudova, Scott Gaffney, Eric Mjolsness, Padhraic Smyth

In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach assumes that the data are being...

Gene Expression Clustering with Functional Mixture Models (2003)

Darya Chudova, Christopher Hart, Eric Mjolsness, Padhraic Smyth

We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course...

Unsupervised Learning with Permuted Data (2003)

Sergey Kirshner, Sridevi Parise, Padhraic Smyth

We consider the problem of unsupervised learning from a matrix of data vectors where in each row the observed values are randomly permuted in an unknown fashion. Such problems arise naturally in...

Modeling the Internet and the Web: Probabilistic Methods and Algorithms (2003)

Pierre Baldi, Paolo Frasconi, Padhraic Smyth

Figure 4.12: A toy example of states and transitions in an HMM for extracting various fields from the beginning of research papers. Not shown are various possible self-transition probabilities...

Translation-invariant mixture models for curve clustering (2003)

Darya Chudova, Scott Gaffney, Eric Mjolsness, Padhraic Smyth

In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach uses the...

Modeling the Internet and the Web: Probabilistic Methods and Algorithms (2003)

Pierre Baldi, Paolo Frasconi, Padhraic Smyth

Having focused in earlier chapters on the general structure of the Web, in this chapter we will discuss in some detail techniques for analyzing the textual content of individual Web pages. The...

Sequential pattern discovery under a Markov assumption (2002)

Darya Chudova, Padhraic Smyth

In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA...

Learning to classify galaxy shapes using the EM algorithm (2002)

Sergey Kirshner, Padhraic Smyth, Igor V. Cadez, Chandrika Kamath

We describe the application of probabilistic model-based learning to the problem of automatically identifying classes of galaxies, based on both morphological and pixel intensity characteristics. The...

Probabilistic model-based detection of bent-double radio galaxies (2002)

Sergey Kirshner, Igor V. Cadez, Padhraic Smyth, Chandrika Kamath

We describe an application of probabilistic modeling to the problem of recognizing radio galaxies with a bent-double morphology. The type of galaxies in question contain distinctive signatures of...

Learning to Classify Galaxy Shapes Using the EM Algorithm (2002)

Sergey Kirshner, Padhraic Smyth, Igor V. Cadez, Chandrika Kamath

We describe the application of probabilistic model-based learning to the problem of automatically identifying classes of galaxies, based on both morphological and pixel intensity characteristics. The...

Sequential Pattern Discovery under a Markov Assumption (2002)

Darya Chudova, Padhraic Smyth

In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA...

Approximate Query Answering by Model Averaging (2002)

Dmitry Pavlov, Padhraic Smyth

In earlier work we have introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data sets. These models can...

Adaptive Approximate Querying of Large Sparse Binary Data Sets via Probabilistic Model Averaging (2002)

Dmitry Pavlov, Padhraic Smyth

In earlier work we introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data sets. These models can be...

Business Applications of Data Mining (2002)

Chidanand Apte, Bing Liu, Padhraic Smyth

y illustrative of the tremendous potential of KDD technology. 1.1 Risk Management and Targeted Marketing Insurance and direct-mail retail are examples of businesses that rely on effective data...

A simple method for generating additive clustering models with limited complexity (2002)

Michael D. Lee, Padhraic Smyth

Abstract. Additive clustering was originally developed within cognitive psychology to enable the development of featural models of human mental representation. The representational flexibility of...

Adaptive Approximate Querying of Large Sparse Binary Data Sets via Probabilistic Model Averaging (2002)

Dmitry Pavlov, Padhraic Smyth

In earlier work we introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data sets. These models can be...

Hidden Markov Models for Endpoint Detection in Plasma Etch Processes (2001)

Xianping Ge, Padhraic Smyth

We investigate two statistical detection problems in plasma etch endpoint detection: change-point detection and pattern matching. Our approach is based on a segmental semi-Markov model framework. In...

Probabilistic query models for transaction data (2001)

Dmitry Pavlov, Padhraic Smyth

We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying...

Probabilistic modeling of transaction data with applications to profiling, visualization, and prediction (2001)

Igor V. Cadez, Padhraic Smyth, Heikki Mannila

Transaction data is ubiquitous in data mining applications. Examples include market basket data in retail commerce, telephone call records in telecommunications, and Web logs of individual...

Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data (2001)

Dmitry Pavlov, Heikki Mannila, Padhraic Smyth

We investigate the problem of generating fast approximate answers to queries posed to large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and...

Predictive Profiles for Transaction Data using Finite Mixture Models (2001)

Igor V. Cadez, Padhraic Smyth, Edward Ip, Heikki Mannila

Massive transaction data sets are routinely recorded in a variety of applications including telecommunications, retail commerce, and Web site management. In this paper we address the problem of...

The Distribution of Loop Lengths in Graphical Models for Turbo Decoding (2001)

Xianping Ge, David Eppstein, Padhraic Smyth

This paper analyzes the distribution of loop lengths in graphical models for turbo decoding. The properties of such loops are of significant interest in the context of iterative decoding algorithms...

Probabilistic Query Models for Transaction Data (2001)

Dmitry Pavlov, Padhraic Smyth

We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying...

Segmental Semi-Markov Models for Endpoint Detection in Plasma Etching (2001)

Xianping Ge, Padhraic Smyth

We investigate two statistical-detection problems, change-point detection and pattern matching in plasma etch endpoint detection. Our approach is based on a segmental semi-Markov model framework. In...

Processing boolean queries over belief networks (2000)

Rina Dechter, Padhraic Smyth

The paper presents a variable elimination method for computing the probability of a cnf query over a belief network. We present a bucket-elimination algorithm whose complexity is controlled by the...

Segmental semi-Markov models for change-point detection with applications to semiconductor manufacturing (2000)

Xianping Ge, Padhraic Smyth

We formulate the problem of change-point detection in a segmental semi-Markov model framework where a change-point corresponds to state switching. The semi-Markov part of the model allows us to...

The UCI KDD archive of large data sets for data mining research and experimentation. SIGKDD Explorations (2000)

Stephen D. Bay, Dennis Kibler, Michael J. Pazzani, Padhraic Smyth

Advances in data collection and storage have allowed organizations to create massive, complex and heterogeneous databases, which have stymied traditional methods of data analysis. This has led to the...

A general probabilistic framework for clustering individuals and objects (2000)

Igor Cadez, Scott Gaffney, Padhraic Smyth

This paper presents a unifying probabilistic framework for clustering individuals or systems into groups when the available data measurements are not multivariate vectors of fixed dimensionality. For...

Deformable Markov model templates for time-series pattern matching (2000)

Xianping Ge, Padhraic Smyth

This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...

Segmental Semi-Markov Models for Endpoint Detection in Plasma Etching (2000)

Xianping Ge, Padhraic Smyth

We investigate two statistical-detection problems, change-point detection and pattern matching in plasma etch endpoint detection. Our approach is based on a segmental semi-Markov model framework. In...

Visualization of Navigation Patterns on a Web Site Using Model Based Clustering (2000)

Igor Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White

We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through...

Probabilistic Models for Query Approximation with Large Sparse Binary Data Sets (2000)

Dmitry Pavlov, Heikki Mannila, Padhraic Smyth

Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents...

A General Probabilistic Framework for Clustering Individuals and Objects (2000)

Igor V. Cadez, Scott Gaffney, Padhraic Smyth

This paper presents a unifying probabilistic framework for clustering individuals or systems into groups when the available data measurements are not multivariate vectors of fixed dimensionality. For...

Segmental Semi-Markov Models for Change-Point Detection with Applications to Semiconductor Manufacturing (2000)

Xianping Ge, Padhraic Smyth

We formulate the problem of change-point detection in a segmental semi-Markov model framework where a change-point corresponds to state switching. The semi-Markov part of the model allows us to...

On Model Selection and Concavity for Finite Mixture Models (Extended Abstract) (2000)

Igor V. Cadez, Padhraic Smyth

Igor V. Cadez and Padhraic Smyth, Dept. of Information and Computer Science, University of California, Irvine, CA 92697-3425, U.S.A. {icadez,smyth}@ics.uci.edu. 1 Introduction. An important open problem...

Visualization of Navigation Patterns on a Web Site Using Model-Based Clustering (2000)

Igor Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White

We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through...

Towards Scalable Support Vector Machines using Squashing (2000)

Dmitry Pavlov, Darya Chudova, Padhraic Smyth

Support vector machines (SVMs) provide classification models with strong theoretical foundations as well as excellent empirical performance on a variety of applications. One of the major drawbacks of...

Processing Boolean queries over Belief networks (2000)

Rina Dechter, Padhraic Smyth

The paper presents a variable elimination method for computing the probability of a cnf query over a belief network. We present a bucket-elimination algorithm whose complexity is controlled by the...

Cycle Length Distributions in Graphical Models for Iterative Decoding (2000)

Xianping Ge, David Eppstein, Padhraic Smyth

This paper analyzes the distribution of cycle lengths in turbo decoding graphs. It is known that the widely-used iterative decoding algorithm for turbo codes is in fact a special case of a quite...

Model Complexity, Goodness of Fit and Diminishing Returns (2000)

Igor V. Cadez, Padhraic Smyth

We investigate a general characteristic of the trade-off in learning problems between goodness-of-fit and model complexity. Specifically we characterize a general class of learning problems where the...

Deformable Markov Model Templates for Time-Series Pattern Matching (2000)

Xianping Ge, Padhraic Smyth

This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...

Visualization of Navigation Patterns on a Web Site Using Model Based Clustering (2000)

Igor Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White

We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through...

Towards Scalable Support Vector Machines using Squashing (2000)

Dmitry Pavlov, Darya Chudova, Padhraic Smyth

Support vector machines (SVMs) provide classification models with strong theoretical foundations as well as excellent empirical classification performance on a variety of applications. One of the...

Data Mining: Data Analysis on a Grand Scale? (2000)

Padhraic Smyth

Modern data mining has evolved largely as a result of efforts by computer scientists to address the needs of "data owners" in extracting useful information from massive observational data...

Deformable Markov Model Templates for Time-Series Pattern Matching (2000)

Xianping Ge, Padhraic Smyth

This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...

Model-Based Clustering and Visualization of Navigation Patterns on a Web Site (2000)

Igor V. Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White

We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our...

Segmental Semi-Markov Models for Change-Point Detection with Applications to Semiconductor Manufacturing (2000)

Xianping Ge, Padhraic Smyth

We formulate the problem of change-point detection in a segmental semi-Markov model framework where a change-point corresponds to state switching. The semi-Markov part of the model allows us to...

Deformable Markov Model Templates for Time-Series Pattern Matching (2000)

Xianping Ge, Padhraic Smyth

This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...

Data mining: data analysis on a grand scale (2000)

Padhraic Smyth

Modern data mining has evolved largely as a result of efforts by computer scientists to address the needs of "data owners" in extracting useful information from massive observational data...

Deformable Markov model templates for time-series pattern matching (2000)

Xianping Ge, Padhraic Smyth

This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...

The Distribution of Cycle Lengths in Graphical Models for Iterative Decoding (1999)

Ge, Xian-ping, Eppstein, David, Smyth, Padhraic

This paper analyzes the distribution of cycle lengths in turbo decoding and low-density parity check (LDPC) graphs. The properties of such cycles are of significant interest in the context of...

The distribution of cycle lengths in graphical models for iterative decoding (1999)

Xianping Ge, Padhraic Smyth

This paper analyzes the distribution of cycle lengths in turbo-decoding graphs. The properties of such cycles are of significant interest since it is well-known that iterative decoding can only be...

Trajectory clustering with mixtures of regression models (1999)

Scott Gaffney, Padhraic Smyth

In this paper we address the problem of clustering trajectories, namely sets of short sequences of data measured as a function of a dependent variable such as time. Examples include storm path...

Maximum likelihood estimation of mixture densities for binned and truncated multivariate data (1999)

Igor V. Cadez, Padhraic Smyth, Geoff J. McLachlan, Christine E. McLaren

Abstract. Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The...

The Distribution of Cycle Lengths in Graphical Models for Iterative Decoding (1999)

Xian-ping Ge, David Eppstein, Padhraic Smyth

This paper analyzes the distribution of cycle lengths in turbo decoding and low-density parity check (LDPC) graphs. The properties of such cycles are of significant interest in the context of...

Discovering Chinese Words from Unsegmented Text (1999)

Xianping Ge, Wanda Pratt, Padhraic Smyth

In English written text, words are separated by spaces, but in written Chinese text, there are no such separators between words. (See Figure 1.) Thus, effective information retrieval of Chinese text...

Probabilistic Model-Based Clustering of Multivariate and Sequential Data (1999)

Padhraic Smyth

Probabilistic model-based clustering, based on finite mixtures of multivariate models, is a useful framework for clustering data in a statistical context. This general framework can be directly...

Probabilistic Clustering using Hierarchical Models (1999)

Igor Cadez, Padhraic Smyth

This paper addresses the problem of clustering data when the available data measurements are not multivariate vectors of fixed dimensionality. For example, one might have data from a set of medical...

Hierarchical Models for Screening of Iron Deficiency Anemia (1999)

Igor V. Cadez, Christine E. McLaren, Padhraic Smyth, Geoffrey J. McLachlan

We investigate the problem of classifying individuals based on estimated density functions for each individual. The problem is similar to conventional classification in that there is labelled...

Local Context Matching for Page Replacement (1999)

Xianping Ge, Dmitry Pavlov, Padhraic Smyth

In this paper we investigate the application of adaptive sequence prediction techniques to the problem of page replacement in computer systems. We demonstrate the ability of a method based on local...

The Distribution of Cycle Lengths in Graphical Models for Turbo Decoding (1999)

Xianping Ge, David Eppstein, Padhraic Smyth

This paper analyzes the distribution of cycle lengths in turbo-decoding graphs. The properties of such cycles are of significant interest since it is well-known that iterative decoding can only be...

Mixed memory markov models: Decomposing complex stochastic processes as mixtures of simpler ones (1999)

Lawrence K. Saul, Michael I. Jordan, Padhraic Smyth

Abstract. We study Markov models whose state spaces arise from the Cartesian product of two or more discrete random variables. We show how to parameterize the transition matrices of these models as a...

Local Context Matching for Page Replacement (1999)

Xianping Ge, Scott Gaffney, Dmitry Pavlov, Padhraic Smyth

In this paper we investigate the application of adaptive sequence prediction techniques to the problem of page replacement in computer systems. We demonstrate the ability of a method based on local...

Linearly combining density estimators via stacking (1999)

Padhraic Smyth, David Wolpert

This paper presents experimental results with both real and artificial data on using the technique of stacking to combine unsupervised learning algorithms. Specifically, stacking is used to form a...

An Information Theoretic Approach to Distributed Inference and Learning. (1998)

Smyth, Padhraic

This report describes research work which was funded under grant number AFOSR90-00199 during the period February 1st 1990 to May 31st 1992. Our work has focused on developing information-theoretic...

Turbo Decoding of High Performance Error-Correcting Codes via Belief Propagation (1998)

McEliece, Robert, Smyth, Padhraic

We studied AWGN coding theorems for ensembles of coding systems which are built from fixed convolutional codes interconnected with random interleavers. We call these systems turbo-like codes and they...

Multiple Regimes in Northern Hemisphere Height Fields via Mixture Model Clustering (1998)

Padhraic Smyth, Michael Ghil, Kayo Ide

Mixture model clustering is applied to Northern Hemisphere (NH) 700-mb geopotential height anomalies. A mixture model is a flexible probability density estimation technique, consisting of a linear...

Learning to Recognize Volcanoes on Venus (1998)

Michael Burl, Lars Asker, Padhraic Smyth, Larry Crumpler

Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of...

Rule Discovery From Time Series (1998)

Gautam Das, King-ip Lin, Heikki Mannila, Gopal Renganathan, Padhraic Smyth

We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple example is a rule such...

Belief Networks, Hidden Markov Models, and Markov Random Fields: a Unifying View (1998)

Padhraic Smyth

The use of graphs to represent independence structure in multivariate probability models has been pursued in a relatively independent fashion across a wide variety of research disciplines since the...

Learning to Recognize Volcanoes on Venus (1998)

Michael C. Burl, Lars Asker, Padhraic Smyth, Usama Fayyad, Pietro Perona, Larry Crumpler

Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of...

Modeling of Inhomogeneous Markov Random Fields with Applications to Cloud Screening (1998)

Igor Cadez, Padhraic Smyth

Cloud screening is the process of classifying pixels in satellite images which contain clouds and is an important step in processing remotely-sensed images. This paper applies inhomogeneous...

Multiple Regimes in Northern Hemisphere Height Fields via Mixture Model Clustering (1998)

Padhraic Smyth, Kayo Ide, Michael Ghil, J. Atmos Sci

A mixture model is a flexible probability density estimation technique, consisting of a linear combination of k component densities. Such a model is applied to estimate clustering in Northern...

Stacked Density Estimation (1998)

Padhraic Smyth, David Wolpert

In this paper, the technique of stacking, previously only used for supervised learning, is applied to unsupervised learning. Specifically, it is used for non-parametric multivariate density...

Probabilistic independence networks for hidden Markov probability models (1997)

Smyth, Padhraic, Heckerman, David, Jordan, Michael I.

Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas, including statistics, statistical physics, artificial intelligence, speech...

Probabilistic independence networks for hidden markov probability models (1997)

Padhraic Smyth, David Heckerman, Michael Jordan

Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech...

Detecting very early stages of dementia from normal aging with machine learning methods (1997)

William Rodman Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth

Abstract. We used Machine Learning (ML) methods to learn the best decision rules to distinguish normal brain aging from the earliest stages of dementia using subsamples of 198 normal and 244...

Factorial hidden Markov models (1997)

Zoubin Ghahramani, Michael I. Jordan, Padhraic Smyth

Abstract. Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed...

Differential Diagnosis of Dementia: A Knowledge Discovery and Data Mining (KDD) Approach (1997)

Subramani Mani, William Rodman Shankle, Michael J. Pazzani, Padhraic Smyth, Malcolm B. Dick

this report describes the extension of these techniques to the harder task of differential diagnosis. We show that the domain of neuropsychologic test performance helps diagnose AD, but not VD, and...

Clustering Sequences with Hidden Markov Models (1997)

Padhraic Smyth

This paper discusses a probabilistic model-based approach to clustering sequences, using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model...

Persistence and Recurrence in Atmospheric Circulation (1997)

Andrew Fraser, Kevin Vixie, Padhraic Smyth, Richard Smith

We report preliminary results of an effort to use variants of the Hidden Markov Models developed by speech researchers to characterize persistence and recurrence of atmospheric circulation patterns...

Adaptive Probabilistic Networks with Hidden Variables (1997)

John Binder, Daphne Koller, Stuart Russell, Keiji Kanazawa, Padhraic Smyth

Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are rapidly becoming the tool of...

Adaptive Probabilistic Networks with Hidden Variables (1997)

John Binder, Daphne Koller, Stuart Russell, Keiji Kanazawa, Padhraic Smyth

Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are used widely for uncertain...

Factorial Hidden Markov Models (1997)

Zoubin Ghahramani, Michael I. Jordan, Padhraic Smyth

Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed through a...

inductive learning, theory revision and information retrieval using machine learning. (1997)

William R. Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth

cognitive modeling and machine learning methods applied to developmental and degenerative conditions of the human brain. 2 is a postgraduate researcher in the Department of Information and Computer...

Probabilistic Independence Networks for Hidden Markov Probability Models (1996)

Smyth, Padhraic, Heckerman, David, Jordan, Michael

Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech...

From data mining to knowledge discovery in databases (1996)

Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth

Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article...

The KDD Process for Extracting Useful Knowledge from Volumes of Data (1996)

Gregory Piatetsky-Shapiro, Padhraic Smyth, Terry Widener

developing the tools needed to control the flood of data facing organizations that depend on ever-growing databases of business, manufacturing, scientific, and personal information.

Knowledge Discovery and Data Mining: Towards a Unifying Framework (1996)

Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth

This paper presents a first step towards a unifying framework for Knowledge Discovery in Databases. We describe links between data mining, knowledge discovery, and other related fields. We then...

From Data Mining to Knowledge Discovery in Databases (1996)

Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth

This article provides an overview of this emerging field, clarifying how data mining and knowledge

Probabilistic Independence Networks for Hidden Markov Probability Models (1996)

Padhraic Smyth, David Heckerman, Michael I. Jordan

Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech...

Clustering using Monte Carlo Cross-Validation (1996)

Padhraic Smyth

Finding the "right" number of clusters, k, for a data set is a difficult, and often ill-posed, problem. In a probabilistic clustering context, likelihood-ratios, penalized likelihoods, and...

Statistical Themes and Lessons for Data Mining (1996)

Clark Glymour, David Madigan, Daryl Pregibon, Padhraic Smyth, Usama Fayyad

Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field...

The process of applying machine learning algorithms (1995)

Carla E. Brodley, Padhraic Smyth

In this paper we present a view of the overall process of application development for real-world classification and regression problems. Specifically, we identify, categorize and discuss the various...

Retrofitting Decision Tree Classifiers Using Kernel Density Estimation (1995)

Padhraic Smyth, Alexander Gray, Usama M. Fayyad

A novel method for combining decision trees and kernel density estimators is proposed. Standard classification trees, or class probability trees, provide piecewise constant estimates of class...

The Process of Applying Machine Learning Algorithms (1995)

Carla Brodley, Padhraic Smyth

In this paper we present a view of the overall process of application development for real-world classification and regression problems. Specifically, we identify, categorize and discuss the various...

Learning finite state machines with self-clustering recurrent networks (1993)

Zeng, Zheng, Goodman, Rodney M., Smyth, Padhraic

Recent work has shown that recurrent neural networks have the ability to learn finite state automata from examples. In particular, networks using second-order units have been successful at this task....

Rule-based neural networks for classification and probability estimation (1992)

Goodman, Rodney M., Higgins, Charles M., Miller, John W., Smyth, Padhraic

In this paper we propose a network architecture that combines a rule-based approach with that of the neural network paradigm. Our primary motivation for this is to ensure that the knowledge embodied...

The application of information theory to problems in decision tree design and rule-based expert systems (1988)

Smyth, Padhraic

This thesis examines the problems of designing decision trees and expert systems from an information-theoretic viewpoint. A well-known greedy algorithm using mutual information for tree design is...

The Review of Economics and Statistics (1983)

Padhraic Smyth

• Introductory comments on data mining

Data-driven evolution of data mining algorithms

Smyth, Padhraic

Data mining is an application-driven field where research questions tend to be motivated by real-world data sets. In this context, a broad spectrum of formalisms and techniques has been proposed by...

Identification of hair cycle-associated genes from time-course gene expression profile data by using replicate variance

Lin, Kevin K., Chudova, Darya, Hatfield, G. Wesley, Smyth, Padhraic, Andersen, Bogi

The hair-growth cycle is an example of a cyclic process that is well characterized morphologically but understood incompletely at the molecular level. As an initial step in discovering regulators in...

Markov Monitoring with Unknown States

Padhraic Smyth

Pattern recognition methods and hidden Markov models can be effective tools for online health monitoring of communications systems. Previous work has assumed that the states in the system model are...

Principles of Data Mining

David J. Hand, Heikki Mannila, Padhraic Smyth

The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically,...

Circadian Clock Genes Contribute to the Regulation of Hair Follicle Cycling

Lin, Kevin K., Kumar, Vivek, Geyfman, Mikhail, Chudova, Darya, Ihler, Alexander T., Smyth, Padhraic, ...

Hair follicles undergo recurrent cycling of controlled growth (anagen), regression (catagen), and relative quiescence (telogen) with a defined periodicity. Taking a genomics approach to study gene...

Text Modeling using Unsupervised Topic Models and Concept Hierarchies

Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers

Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a...