Graphical Models for Statistical Inference and Data Assimilation (2009)
Andrew W. Robertson, Padhraic Smyth
In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as...
MODELING COUNT DATA FROM MULTIPLE SENSORS: A BUILDING OCCUPANCY MODEL (2009)
Jon Hutchins, Alexander Ihler, Padhraic Smyth
Knowledge of the number of people in a building at a given time is crucial for applications such as emergency response. Sensors can be used to gather noisy measurements which, when combined, can be...
Chaitanya Chemudugunta, America Holloway, Padhraic Smyth
Abstract. Human-defined concepts are fundamental building-blocks in constructing knowledge bases such as ontologies. Statistical learning techniques provide an alternative automated approach to...
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover...
DATA RECOVERY EXTENDS TRADITIONAL (2009)
C Fl, Boris Mirkin, John Lafferty, David Madigan, Fionn Murtagh, Padhraic Smyth
Often considered more as an art than a science, the field of clustering has been dominated by learning through examples and by choosing techniques almost through trial-and-error. Even the most...
Asynchronous Distributed Learning of Topic Models (2009)
Arthur Asuncion, Padhraic Smyth, Max Welling
Distributed learning is a problem of fundamental interest in machine learning and cognitive science. In this paper, we present asynchronous distributed learning algorithms for two well-known...
List of Supported Students: (2009)
Padhraic Smyth, Igor Cadez, Dasha Chudova, Xianping Ge
probabilistic learning, sequence and time series analysis, clustering, density estimation,
The KDD Cup is the oldest of the many data mining competitions that are now popular [1]. It is an integral part of the annual ACM
Infinite Mixtures of Trees (2008)
Sergey Kirshner, Padhraic Smyth
Finite mixtures of tree-structured distributions have been shown to be efficient and effective in modeling multivariate distributions. Using Dirichlet processes, we extend this approach to allow...
Discrete Recurrent Neural Networks for Grammatical Inference (2008)
Zheng Zeng, Rodney M. Goodman, Padhraic Smyth
Recurrent neural networks have recently been shown to have the ability to learn regular and context-free grammars from examples. We show that while conventional analog recurrent networks try to...
Segmental Hidden Markov Models with Random Effects for Waveform Modeling (2008)
Seyoung Kim, Padhraic Smyth, Sam Roweis
This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. A segmental hidden Markov model (HMM) is used to characterize waveform shape and...
Text Modeling using Unsupervised Topic Models and Concept Hierarchies (2008)
Chemudugunta, Chaitanya, Smyth, Padhraic, Steyvers, Mark
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover...
Analysis and Visualization of Network Data using (2008)
Scott White, Danyel Fisher, Padhraic Smyth, Yan-biao Boey
The JUNG (Java Universal Network/Graph) Framework is a free, open-source software library that provides a common and extendible language for the manipulation, analysis, and visualization of data that...
Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be...
i e n t i s t s d i s p o s a l p r (2008)
Padhraic Smyth, Rodney M. Goodman
Across a variety of scientific, engineering, and business applications it has become commonplace to collect and store large volumes of data. For example, NASA has warehouses of data collected from...
Probabilistic Modeling vs. Function (2008)
• Original slides created in mid-July for ACM – Some new slides have been added • “new” logo in upper left – A few slides have been updated • “updated” logo in upper left •...
Graphical Models for Statistical Inference and Data Assimilation (2008)
Andrew W. Robertson, Padhraic Smyth
In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as...
Chapter 1 DATA MINING AT THE INTERFACE OF COMPUTER SCIENCE AND STATISTICS (2008)
Abstract This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining....
Modern data mining has evolved largely as a result of efforts by computer scientists to address the needs of ‘data owners’ in extracting useful information from massive observational data sets....
Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be...
David Newman, Padhraic Smyth, Mark Steyvers
(U) The topic model is a popular probabilistic model for text and document modeling. It can be used for topic indexing, document classification, corpus summarization and information retrieval. In the...
Situational Awareness Technologies for Disaster Response (2008)
Naveen Ashish, Dmitri Kalashnikov, Sharad Mehrotra, Ron Eguchi, Rajesh Hegde, Padhraic Smyth
Responding to natural or man-made disasters, in a timely and effective manner, can reduce deaths and injuries, contain or prevent secondary disasters, and reduce the resulting economic losses and...
Infinite Mixtures of Trees (2008)
Sergey Kirshner, Padhraic Smyth
Finite mixtures of tree-structured distributions have been shown to be efficient and effective in modeling multivariate distributions. Using Dirichlet processes, we extend this approach to allow...
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers
Techniques such as probabilistic topic models and latent-semantic indexing have been shown to be broadly useful at automatically extracting the topical or semantic content of documents, or more...
Learning Time-Intensity Profiles of Human Activity using Non-Parametric Bayesian Models (2008)
Alexander T. Ihler, Padhraic Smyth
Data sets that characterize human activity over time through collections of timestamped events or counts are of increasing interest in application areas such as human-computer interaction, video...
Massive transaction data sets are recorded in a routine manner in telecommunications, retail commerce, and Web site management. In this paper we address the problem of inferring predictive individual...
Gene Expression Clustering with Functional Mixture Models (2008)
Darya Chudova, Eric Mjolsness, Christopher Hart, Padhraic Smyth
We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course...
A.7 Cluster Analysis Of Western North Pacific Tropical Cyclone Tracks (2008)
Suzana J. Camargo, Andrew W. Robertson, Scott J. Gaffney, Padhraic Smyth
Introduction Typhoons have a large socio-economic impact in many countries in Asia. Depending on the trajectory of the typhoon or tropical storm, landfall will occur or not. While it is well known...
Distributed Inference for Latent Dirichlet Allocation (2008)
David Newman, Arthur Asuncion, Padhraic Smyth, Max Welling
1 Introduction Very large data sets, such as collections of images, text, and related data, are becoming increasingly common, with examples ranging from digitized collections of books by companies...
Probabilistic analysis of a large-scale urban traffic data set (2008)
Jon Hutchins, Alexander Ihler, Padhraic Smyth
Real-world sensor time series are often significantly noisier and more difficult to work with than the relatively clean data sets that tend to be used as the basis for experiments in many research...
William R. Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth
cognitive modeling and machine learning methods applied to developmental and degenerative conditions of the human brain. 2 is a postgraduate researcher in the Department of Information and Computer...
Clustering and Mode Classification of Engineering Time Series Data (2007)
The problem of efficiently and accurately locating patterns of interest in massive time series data sets is an important and non-trivial problem in a wide variety of applications, including diagnosis...
Both authors are affiliated with Machine Learning Systems Group (2007)
With hardware advances in scientific instruments and data gathering techniques comes the inevitable flood of data that can render traditional approaches to science data analysis severely inadequate....
Inference in Directed Acyclic Graphs with Applications to Hidden Markov Model Structures (2007)
Padhraic Smyth, David Heckerman, Paul Stolorz
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence (AI),...
Fig. 1. The ADG model for a K = 6, N = 12, rate 1/2 turbo code. Abstract: This paper analyzes the distribution of loop lengths in graphical models for turbo decoding. The properties of such loops...
Local Context Matching for Page Replacement (2007)
Xianping Ge, Scott Gaffney, Dimitry Pavlov, Padhraic Smyth
In this paper we investigate the application of adaptive sequence prediction techniques to the problem of page replacement in computer systems. We demonstrate the ability of a method based on local...
1. Introduction In areas as diverse as earth remote sensing, astronomy, and medical imaging, image acquisition technology has undergone tremendous improvements in recent years. The vast amounts of...
Section 10. Learning For An Intelligent Instrument (2007)
Wray Buntine, Padhraic Smyth
metal concentration from a spectrum) from samples. • How do you present the results, and error bars, to the engineers? SP2: 10.5 ESTIMATING THE RESPONSE FUNCTION Left: Sample isolated peaks...
Hidden Markov Models For Classification Of Sequential Patterns (2007)
6.88> p(! t j j\Phi t ). SP2: 9.2 HIDDEN MARKOV MODEL BASICS State 1 State 2 State 3 Observable Hidden Time t-1 t t+1 t+2 x(t-1) x(t) x(t+1) x(t+2) • Explicit Assumptions (first order)[4]: -- 1....
The Author-Topic Model for Authors and Documents (2007)
Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth
We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is...
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms (2007)
Xianping Ge, Sridevi Parise, Padhraic Smyth
This paper investigates the problem of finding a K-state first-order Markov chain that approximates an M-state first-order Markov chain, where K is typically much smaller than M. A variety of greedy...
Clustering Markov States into Equivalence Classes using SVD and Heuristic Search Algorithms (2007)
Xianping Ge, Sridevi Parise, Padhraic Smyth
This paper investigates the problem of finding a K-state first-order Markov chain that approximates an M-state first-order Markov chain, where K is typically much smaller than M. A variety of greedy...
Local Context Matching for Page Replacement (2007)
Xianping Ge, Dimitry Pavlov, Padhraic Smyth
In this paper we investigate the application of adaptive sequence prediction techniques to the problem of page replacement in computer systems. We demonstrate the ability of a method based on local...
Model Complexity, Goodness of Fit and Diminishing Returns (2007)
We investigate a general characteristic of the trade-off in learning problems between goodness-of-fit and model complexity. Specifically we characterize a general class of learning problems where the...
Hidden Markov Models for Endpoint Detection in Plasma Etch Processes (2007)
We investigate two statistical detection problems in plasma etch endpoint detection: change-point detection and pattern matching. Our approach is based on a segmental semi-Markov model framework. In...
William R. Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth
has joint appointments with the Departments of Neurology and Information and
William Rodman Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth
Abstract. We used Machine Learning (ML) methods to learn the best decision rules to distinguish normal brain aging from the earliest stages of dementia using subsamples of 198 normal and 244...
David Madigan, Daryl Pregibon, Padhraic Smyth, Terry Widener
Statistics may have little to offer the search architectures in a data mining search, but a great deal to offer in evaluating hypotheses in the search, in evaluating the results of the search, and in...
Dmitry Pavlov, Heikki Mannila, Padhraic Smyth
We investigate the problem of generating fast approximate answers to queries for large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and...
Chapter 1 DATA MINING AT THE INTERFACE OF COMPUTER SCIENCE AND STATISTICS (2007)
Abstract This chapter is written for computer scientists, engineers, mathematicians, and scientists who wish to gain a better understanding of the role of statistical thinking in modern data mining....
DISCRETE RECURRENT NEURAL NETWORKS AS PUSHDOWN AUTOMATA (2007)
Zheng Zeng, Rodney M. Goodman, Padhraic Smyth
In this paper we describe a new discrete recurrent neural network model with discrete external stacks for learning context-free grammars (or pushdown automata). Conventional analog recurrent networks...
Learning Generalization Query Models for Transaction Data (2007)
Interactive querying of massive data sets is an increasingly important application. Existing techniques in the database literature have focused on producing fast approximations to exact data counts,...
Cycle Length Distributions in Graphical Models for Iterative Decoding (2007)
Xianping Ge, David Eppstein, Padhraic Smyth, Talk Igor, V. Cadez, Turbo Code, ...
The probability of no cycle of length k or less at a node is . 7 & $ Because the degree of the nodes is 3, there are 2 label sequences of length k. Label Sequences ! ! ! ! ! ! ! ! 8 & $ At a...
Predictive Profiles for Transaction Data using Finite Mixture Models (2007)
Igor V. Cadez, Padhraic Smyth, Edward Ip, Heikki Mannila
Massive transaction data sets are routinely recorded in a variety of applications including telecommunications, retail commerce, and Web site management. In this paper we address the problem of...
BAYESIAN STATISTICS 7, pp. 000--000 (2007)
Bernardo Bayarri Berger, J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, ...
this article is a natural model for point processes whose events combine irregular bursts of activity with predictable (e.g. daily and hourly) patterns
www.datalab.uci.edu Many of the current techniques used in data mining are limited by a relatively static and fixed-dimensional view of the world. For example, classification and regression trees are...
This thesis examines the problems of designing decision trees and expert systems from an information-theoretic viewpoint. A well-known greedy algorithm using mutual information for tree design is...
Distributed inference for latent dirichlet allocation (2007)
David Newman, Arthur Asuncion, Padhraic Smyth, Max Welling
We investigate the problem of learning a widely-used latent-variable model – the Latent Dirichlet Allocation (LDA) or “topic” model – using distributed computation, where each of P...
Learning to detect events with Markov-modulated Poisson processes (2007)
Alexander Ihler, Jon Hutchins, Padhraic Smyth
Time-series of count data occur in many different contexts, including internet navigation logs, freeway traffic monitoring, and security logs associated with buildings. In this paper we describe a...
Graphical models for statistical inference and data assimilation (2007)
Alexander T. Ihler, Sergey Kirshner, Michael Ghil, Andrew W. Robertson, Padhraic Smyth
www.elsevier.com/locate/physd
Padhraic Smyth, Mark Steyvers, Uci Dave Newman, Chemudugunta Uci, Tom Griffiths, Uc Berkeley, ...
Background on text analysis The statistical topic model Examples using real data sets
Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models (2006)
Scott J. Gaffney, Andrew W. Robertson, Padhraic Smyth, Suzana J. Camargo, Michael Ghil
A probabilistic clustering technique is developed for classification of wintertime extratropical cyclone (ETC) tracks over the North Atlantic. A regression mixture model is used to describe the...
Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models (2006)
Scott J. Gaffney, Andrew W. Robertson, Padhraic Smyth, Suzana J. Camargo, Michael Ghil
A probabilistic clustering technique is developed for classification of wintertime extratropical cyclone (ETC) tracks over the North Atlantic. A regression mixture model is used to describe the...
Andrew W. Robertson, Sergey Kirshner, Padhraic Smyth, Stephen P. Charles, Bryson C. Bates
Daily rainfall occurrence and amount at 11 stations over North Queensland are examined during summer 1958–1998, using a Hidden Markov Model (HMM). Daily rainfall variability is described in terms...
Segmental Hidden Markov Models with Random Effects for Waveform Modeling (2006)
This paper proposes a general probabilistic framework for shape-based modeling and classification of waveform data. A segmental hidden Markov model (HMM) is used to characterize waveform shape and...
Analyzing entities and topics in news articles using statistical topic models (2006)
David Newman, Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers
Abstract. Statistical language models can learn relationships between topics discussed in a document collection and persons, organizations and places mentioned in each document. We present a novel...
A nonparametric Bayesian approach to detecting spatial activation patterns in fMRI data (2006)
Seyoung Kim, Padhraic Smyth, Hal Stern
Abstract. Traditional techniques for statistical fMRI analysis are often based on thresholding of individual voxel values or averaging voxel values over a region of interest. In this paper we present...
Probabilistic Clustering of Extratropical Cyclones Using Regression Mixture Models (2006)
Scott J. Gaffney, Andrew W. Robertson, Padhraic Smyth, Suzana J. Camargo, Michael Ghil
A probabilistic clustering technique is developed for classification of wintertime extratropical cyclone (ETC) tracks over the North Atlantic. We use a regression mixture model to describe the...
Joint probabilistic curve clustering and alignment (2005)
Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous...
Seyoung Kim, Padhraic Smyth, Hal Stern, Jessica Turner, FIRST BIRN
Abstract. Analyses of fMRI brain data are often based on statistical tests applied to each voxel or use summary statistics within a region of interest (such as mean or peak activation). These...
Learning Author Topic Models from Text Corpora (2005)
Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth
We propose a new unsupervised learning technique for extracting information about authors and topics from large text collections. We model documents as if they were generated by a two-stage...
Graphical Models for Statistical Inference and Data Assimilation (2005)
Alexander T. Ihler, Sergey Kirshner, Michael Ghil, Andrew W. Robertson, Padhraic Smyth
In data assimilation for a system which evolves in time, one combines past and current observations with a model of the dynamics of the system, in order to improve the simulation of the system as...
Joint probabilistic curve clustering and alignment (2005)
Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous...
Hidden Markov Models and Neural Networks for Fault Detection (2004)
Continuous online monitoring of complex dynamic systems is common in applications as diverse as industrial plant operations, telecommunications network, and biomedical health monitoring. For...
Smyth, Padhraic, Mellstrom, Jeff
The Deep Space Network (DSN), designed and operated by the Jet Propulsion Laboratory for the National Aeronautics and Space Administration (NASA), provides end-to-end telecommunication capabilities...
Probabilistic Anomaly Detection in Dynamic Systems (2004)
This paper describes probabilistic methods for novelty detection when using pattern recognition methods for fault monitoring of dynamic systems. The problem of novelty detection is particularly acute...
Synthesis of Optimal Nonlinear Feedback Laws for Dynamic Systems Using Neural Networks (2004)
Lee, Allan Y., Smyth, Padhraic
Open-loop solutions of dynamical optimization problems can be numerically computed using existing software packages. The computed time histories of the state and control variables, for multiple sets of...
Markov Monitoring with Unknown States (2004)
Pattern recognition methods and hidden Markov models can be effective tools for online health monitoring of communications systems. Previous work has assumed that the states in the system model are...
KDD-93: Progress and Challenges in Knowledge Discovery in Databases (2004)
Piatetsky-Shapiro, Gregory, Matheus, Christopher, Smyth, Padhraic, Uthurusamy, Ramasamy
Interest in Knowledge Discovery in Databases (KDD) continues to increase, driven by the rapid growth in the number and size of large databases and the applications-driven demand to make sense of them.
Knowledge Discovery in Large Image Databases: Dealing with Uncertainties in Ground Truth (2004)
Smyth, Padhraic, Burl, Michael C., Fayyad, Usama M., Perona, Pietro
This paper discusses the problem of knowledge discovery in image databases with particular focus on the issues which arise when absolute ground truth is not available.
Hidden Markov Models and Neural Networks for Fault Detection in Dynamic Systems (2004)
None given. (From conclusion): Neural networks plus Hidden Markov Models (HMMs) can provide excellent detection and false alarm rate performance in fault detection applications. Modified models allow...
Discrete Recurrent Neural Networks as Pushdown Automata (2004)
Zeng, Zheng, Goodman, Rodney M., Smyth, Padhraic
In this paper, we describe a new discrete recurrent model with discrete external stacks for learning context-free grammars (or pushdown automata).
Volcano Detection Without Ground Truth (2004)
Smyth, Padhraic, Fayyad, Usama, Burl, Michael, Perona, Pietro
Probabilistic Independence Networks for Hidden Markov Probability Models (2004)
Smyth, Padhraic, Heckerman, David, Jordan, Michael I
In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper contains a self-contained review of...
Clustering Using Monte Carlo Cross-Validation (2004)
In this paper, a new cross-validated likelihood criterion is investigated for determining cluster structure.
Downscaling of daily rainfall occurrence over Northeast Brazil using a hidden Markov model (2004)
Andrew W. Robertson, Sergey Kirshner, Padhraic Smyth
A hidden Markov model (HMM) is used to describe daily rainfall occurrence at ten gauge stations in the state of Ceará in northeast Brazil during the February–April wet season 1975–2002. The...
Modeling waveform shapes with random effects segmental hidden Markov models (2004)
In this paper we describe a general probabilistic framework for modeling waveforms such as heartbeats from ECG data. The model is based on segmental hidden Markov models (as used in speech...
Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Time Series (2004)
Sergey Kirshner, Padhraic Smyth, Andrew W. Robertson
We consider the problem of modeling discrete-valued vector time series data using extensions of Chow-Liu tree models to capture both dependencies across time and dependencies across variables. We...
Probabilistic Author-Topic Models for Information Discovery (2004)
Mark Steyvers, Padhraic Smyth, Michal Rosen-Zvi, Thomas Griffiths
We propose a new unsupervised learning technique for extracting information from large text collections. We model documents as if they were generated by a two-stage stochastic process. Each author is...
Modeling Waveform Shapes with Random Effects Segmental Hidden Markov Models (2004)
Seyoung Kim, Padhraic Smyth, Stefan Luther
In this paper we describe a general probabilistic framework for modeling waveforms such as heartbeats from ECG data. The model is based on segmental hidden Markov models (as used in speech...
Joint Probabilistic Curve Clustering and Alignment (2004)
Clustering and prediction of sets of curves is an important problem in many areas of science and engineering. It is often the case that curves tend to be misaligned from each other in a continuous...
Conditional Chow-Liu Tree Structures for Modeling Discrete-Valued Vector Time Series (2004)
Sergey Kirshner, Padhraic Smyth, Andrew W. Robertson
We consider the problem of modeling discrete-valued vector time series data using extensions of Chow-Liu tree models to capture both dependencies across time and dependencies across variables....
Hidden Markov models for modeling daily rainfall occurrence over Brazil (2003)
Andrew W. Robertson, Sergey Kirshner, Padhraic Smyth
A hidden Markov model (HMM) is used to describe daily rainfall occurrence at ten gauge stations in the state of Ceará in northeast Brazil during the February–April wet season 1975–2002. The...
Unsupervised Learning with Permuted Data (2003)
Sergey Kirshner, Sridevi Parise, Padhraic Smyth
We consider the problem of unsupervised learning from a matrix of data vectors where in each row the observed values can be randomly permuted in an unknown fashion. Such problems arise naturally in...
Curve Clustering with Random Effects Regression Mixtures (2003)
Scott J. Gaffney, Padhraic Smyth
In this paper we address the problem of clustering sets of curve or trajectory data generated by groups of objects or individuals.
Probabilistic Models for Joint Clustering and Time-Warping of Multidimensional Curves (2003)
Darya Chudova, Scott Gaffney, Padhraic Smyth
In this paper we present a family of models and learning algorithms that can simultaneously align and cluster sets of multidimensional curves measured on a discrete time grid. Our approach is based...
Translation-Invariant Mixture Models for Curve Clustering (2003)
Darya Chudova, Scott Gaffney, Eric Mjolsness, Padhraic Smyth
In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach assumes that the data are being...
Gene Expression Clustering with Functional Mixture Models (2003)
Darya Chudova, Christopher Hart, Eric Mjolsness, Padhraic Smyth
We propose a functional mixture model for simultaneous clustering and alignment of sets of curves measured on a discrete time grid. The model is specifically tailored to gene expression time course...
Unsupervised Learning with Permuted Data (2003)
Sergey Kirshner, Sridevi Parise, Padhraic Smyth
We consider the problem of unsupervised learning from a matrix of data vectors where in each row the observed values are randomly permuted in an unknown fashion. Such problems arise naturally in...
Modeling the Internet and the Web: Probabilistic Methods and Algorithms (2003)
Pierre Baldi, Paolo Frasconi, Padhraic Smyth
END Figure 4.12 A toy example of states and transitions in an HMM for extracting various fields from the beginning of research papers. Not shown are various possible self-transition probabilities...
Translation-invariant mixture models for curve clustering (2003)
Darya Chudova, Scott Gaffney, Eric Mjolsness, Padhraic Smyth
In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach uses the...
Modeling the Internet and the Web: Probabilistic Methods and Algorithms (2003)
Pierre Baldi, Paolo Frasconi, Padhraic Smyth
Having focused in earlier chapters on the general structure of the Web, in this chapter we will discuss in some detail techniques for analyzing the textual content of individual Web pages. The...
Sequential pattern discovery under a markov assumption (2002)
In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA...
Learning to classify galaxy shapes using the EM algorithm (2002)
Sergey Kirshner, Padhraic Smyth, Igor V. Cadez, Chandrika Kamath
We describe the application of probabilistic model-based learning to the problem of automatically identifying classes of galaxies, based on both morphological and pixel intensity characteristics. The...
Probabilistic model-based detection of bent-double radio galaxies (2002)
Sergey Kirshner, Igor V. Cadez, Padhraic Smyth, Chandrika Kamath
We describe an application of probabilistic modeling to the problem of recognizing radio galaxies with a bent-double morphology. The type of galaxies in question contain distinctive signatures of...
Learning to Classify Galaxy Shapes Using the EM Algorithm (2002)
Sergey Kirshner, Padhraic Smyth, Igor V. Cadez, Chandrika Kamath
We describe the application of probabilistic model-based learning to the problem of automatically identifying classes of galaxies, based on both morphological and pixel intensity characteristics. The...
Sequential Pattern Discovery under a Markov Assumption (2002)
Darya Chudova, Padhraic Smyth
In this paper we investigate the general problem of discovering recurrent patterns that are embedded in categorical sequences. An important real-world problem of this nature is motif discovery in DNA...
Approximate Query Answering by Model Averaging (2002)
In earlier work we have introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data sets. These models can...
In earlier work we introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data sets. These models can be...
Learning to classify galaxy shapes using the EM algorithm (2002)
Sergey Kirshner, Padhraic Smyth, Igor V. Cadez, Chandrika Kamath
We describe the application of probabilistic model-based learning to the problem of automatically identifying classes of galaxies, based on both morphological and pixel intensity characteristics. The...
Business Applications of Data Mining (2002)
Chidanand Apte, Bing Liu, Padhraic Smyth
y illustrative of the tremendous potential of KDD technology. 1.1 Risk Management and Targeted Marketing Insurance and direct-mail retail are examples of businesses that rely on effective data...
A simple method for generating additive clustering models with limited complexity (2002)
Michael D. Lee, Padhraic Smyth
Abstract. Additive clustering was originally developed within cognitive psychology to enable the development of featural models of human mental representation. The representational flexibility of...
In earlier work we introduced and explored a variety of different probabilistic models for the problem of answering selectivity queries posed to large sparse binary data sets. These models can be...
Principles of Data Mining / D. Hand, H. Mannila, P. Smyth; foreword by Thomas Dietterich. (2001)
Hand, David, Mannila, Heikki, Smyth, Padhraic
Includes bibliography
Hidden Markov Models for Endpoint Detection in Plasma Etch Processes (2001)
We investigate two statistical detection problems in plasma etch endpoint detection: change-point detection and pattern matching. Our approach is based on a segmental semi-Markov model framework. In...
Probabilistic query models for transaction data (2001)
We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying...
Igor V. Cadez, Padhraic Smyth, Heikki Mannila
Transaction data is ubiquitous in data mining applications. Examples include market basket data in retail commerce, telephone call records in telecommunications, and Web logs of individual...
Beyond Independence: Probabilistic Models for Query Approximation on Binary Transaction Data (2001)
Dmitry Pavlov, Heikki Mannila, Padhraic Smyth
We investigate the problem of generating fast approximate answers to queries posed to large sparse binary data sets. We focus in particular on probabilistic model-based approaches to this problem and...
Predictive Profiles for Transaction Data using Finite Mixture Models (2001)
Igor V. Cadez, Padhraic Smyth, Edward Ip, Heikki Mannila
Massive transaction data sets are routinely recorded in a variety of applications including telecommunications, retail commerce, and Web site management. In this paper we address the problem of...
The Distribution of Loop Lengths in Graphical Models for Turbo Decoding (2001)
Xianping Ge, David Eppstein, Padhraic Smyth
This paper analyzes the distribution of loop lengths in graphical models for turbo decoding. The properties of such loops are of significant interest in the context of iterative decoding algorithms...
Probabilistic Query Models for Transaction Data (2001)
We investigate the application of Bayesian networks, Markov random fields, and mixture models to the problem of query answering for transaction data sets. We formulate two versions of the querying...
Segmental Semi-Markov Models for Endpoint Detection in Plasma Etching (2001)
We investigate two statistical-detection problems, change-point detection and pattern matching in plasma etch endpoint detection. Our approach is based on a segmental semi-Markov model framework. In...
Processing boolean queries over belief networks (2000)
The paper presents a variable elimination method for computing the probability of a cnf query over a belief network. We present a bucket-elimination algorithm whose complexity is controlled by the...
Visualization of Navigation Patterns on a Web Site Using Model Based Clustering (2000)
We formulate the problem of change-point detection in a segmental semi-Markov model framework where a change-point corresponds to state switching. The semi-Markov part of the model allows us to...
Stephen D. Bay, Dennis Kibler, Michael J. Pazzani, Padhraic Smyth
Advances in data collection and storage have allowed organizations to create massive, complex and heterogeneous databases, which have stymied traditional methods of data analysis. This has led to the...
A general probabilistic framework for clustering individuals and objects (2000)
Igor Cadez, Scott Gaffney, Padhraic Smyth
This paper presents a unifying probabilistic framework for clustering individuals or systems into groups when the available data measurements are not multivariate vectors of fixed dimensionality. For...
Deformable Markov model templates for time-series pattern matching (2000)
This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...
Segmental Semi-Markov Models for Endpoint Detection in Plasma Etching (2000)
We investigate two statistical-detection problems, change-point detection and pattern matching in plasma etch endpoint detection. Our approach is based on a segmental semi-Markov model framework. In...
Visualization of Navigation Patterns on a Web Site Using Model Based Clustering (2000)
Igor Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White
We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through...
Probabilistic Models for Query Approximation with Large Sparse Binary Data Sets (2000)
Dmitry Pavlov, Heikki Mannila, Padhraic Smyth
Large sparse sets of binary transaction data with millions of records and thousands of attributes occur in various domains: customers purchasing products, users visiting web pages, and documents...
A General Probabilistic Framework for Clustering Individuals and Objects (2000)
Igor V. Cadez, Scott Gaffney, Padhraic Smyth
This paper presents a unifying probabilistic framework for clustering individuals or systems into groups when the available data measurements are not multivariate vectors of fixed dimensionality. For...
We formulate the problem of change-point detection in a segmental semi-Markov model framework where a change-point corresponds to state switching. The semi-Markov part of the model allows us to...
On Model Selection and Concavity for Finite Mixture Models (Extended Abstract) (2000)
Igor V. Cadez and Padhraic Smyth, Dept. of Information and Computer Science, University of California, Irvine, CA 92697-3425, U.S.A. {icadez,smyth}@ics.uci.edu. 1 Introduction. An important open problem...
Visualization of Navigation Patterns on a Web Site Using Model-Based Clustering (2000)
Igor Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White
We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through...
Towards Scalable Support Vector Machines using Squashing (2000)
Dmitry Pavlov, Darya Chudova, Padhraic Smyth
Support vector machines (SVMs) provide classification models with strong theoretical foundations as well as excellent empirical performance on a variety of applications. One of the major drawbacks of...
Processing Boolean queries over Belief networks (2000)
The paper presents a variable elimination method for computing the probability of a cnf query over a belief network. We present a bucket-elimination algorithm whose complexity is controlled by the...
Cycle Length Distributions in Graphical Models for Iterative Decoding (2000)
Xianping Ge, David Eppstein, Padhraic Smyth
This paper analyzes the distribution of cycle lengths in turbo decoding graphs. It is known that the widely-used iterative decoding algorithm for turbo codes is in fact a special case of a quite...
Model Complexity, Goodness of Fit and Diminishing Returns (2000)
We investigate a general characteristic of the trade-off in learning problems between goodness-of-fit and model complexity. Specifically, we characterize a general class of learning problems where the...
Deformable Markov Model Templates for Time-Series Pattern Matching (2000)
This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...
Visualization of Navigation Patterns on a Web Site Using Model Based Clustering (2000)
Igor Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White
We present a new methodology for visualizing navigation patterns on a Web site. In our approach, we first partition site users into clusters such that only users with similar navigation paths through...
Towards Scalable Support Vector Machines using Squashing (2000)
Dmitry Pavlov, Darya Chudova, Padhraic Smyth
Support vector machines (SVMs) provide classification models with strong theoretical foundations as well as excellent empirical classification performance on a variety of applications. One of the...
Data Mining: Data Analysis on a Grand Scale? (2000)
Padhraic Smyth
Modern data mining has evolved largely as a result of efforts by computer scientists to address the needs of "data owners" in extracting useful information from massive observational data...
Deformable Markov Model Templates for Time-Series Pattern Matching (2000)
This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...
Model-Based Clustering and Visualization of Navigation Patterns on a Web Site (2000)
Igor V. Cadez, David Heckerman, Christopher Meek, Padhraic Smyth, Steven White
We present a new methodology for exploring and analyzing navigation patterns on a web site. The patterns that can be analyzed consist of sequences of URL categories traversed by users. In our...
Segmental Semi-Markov Models for Change-Point Detection with Applications to Semiconductor... (2000)
Xianping Ge, Padhraic Smyth
We formulate the problem of change-point detection in a segmental semi-Markov model framework where a change-point corresponds to state switching. The semi-Markov part of the model allows us to...
Deformable Markov Model Templates for Time-Series Pattern Matching (2000)
This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...
Data mining: data analysis on a grand scale (2000)
Modern data mining has evolved largely as a result of efforts by computer scientists to address the needs of "data owners" in extracting useful information from massive observational data...
Deformable Markov model templates for time-series pattern matching (2000)
This paper addresses the problem of automatically detecting specific patterns or shapes in time-series data. A novel and flexible approach is proposed based on segmental semi-Markov models. Unlike...
The Distribution of Cycle Lengths in Graphical Models for Iterative Decoding (1999)
Ge, Xian-ping, Eppstein, David, Smyth, Padhraic
This paper analyzes the distribution of cycle lengths in turbo decoding and low-density parity check (LDPC) graphs. The properties of such cycles are of significant interest in the context of...
The distribution of cycle lengths in graphical models for iterative decoding (1999)
This paper analyzes the distribution of cycle lengths in turbo-decoding graphs. The properties of such cycles are of significant interest since it is well-known that iterative decoding can only be...
Trajectory clustering with mixtures of regression models (1999)
In this paper we address the problem of clustering trajectories, namely sets of short sequences of data measured as a function of a dependent variable such as time. Examples include storm path...
Maximum likelihood estimation of mixture densities for binned and truncated multivariate data (1999)
Igor V. Cadez, Padhraic Smyth, Geoff J. McLachlan, Christine E. McLaren
Abstract. Binning and truncation of data are common in data analysis and machine learning. This paper addresses the problem of fitting mixture densities to multivariate binned and truncated data. The...
The Distribution of Cycle Lengths in Graphical Models for Iterative Decoding (1999)
Xian-ping Ge, David Eppstein, Padhraic Smyth
This paper analyzes the distribution of cycle lengths in turbo decoding and low-density parity check (LDPC) graphs. The properties of such cycles are of significant interest in the context of...
Discovering Chinese Words from Unsegmented Text (1999)
Xianping Ge, Wanda Pratt, Padhraic Smyth
In English written text, words are separated by spaces, but in written Chinese text, there are no such separators between words. (See Figure 1.) Thus, effective information retrieval of Chinese text...
Probabilistic Model-Based Clustering of Multivariate and Sequential Data (1999)
Probabilistic model-based clustering, based on finite mixtures of multivariate models, is a useful framework for clustering data in a statistical context. This general framework can be directly...
Probabilistic Clustering using Hierarchical Models (1999)
This paper addresses the problem of clustering data when the available data measurements are not multivariate vectors of fixed dimensionality. For example, one might have data from a set of medical...
Hierarchical Models for Screening of Iron Deficiency Anemia (1999)
Igor V. Cadez, Christine E. McLaren, Padhraic Smyth, Geoffrey J. McLachlan
We investigate the problem of classifying individuals based on estimated density functions for each individual. The problem is similar to conventional classification in that there is labelled...
Local Context Matching for Page Replacement (1999)
Xianping Ge, Dimitry Pavlov, Padhraic Smyth
In this paper we investigate the application of adaptive sequence prediction techniques to the problem of page replacement in computer systems. We demonstrate the ability of a method based on local...
The Distribution of Cycle Lengths in Graphical Models for Turbo Decoding (1999)
Xianping Ge, David Eppstein, Padhraic Smyth
This paper analyzes the distribution of cycle lengths in turbo-decoding graphs. The properties of such cycles are of significant interest since it is well-known that iterative decoding can only be...
Lawrence K. Saul, Michael I. Jordan, Padhraic Smyth
Abstract. We study Markov models whose state spaces arise from the Cartesian product of two or more discrete random variables. We show how to parameterize the transition matrices of these models as a...
Local Context Matching for Page Replacement (1999)
Xianping Ge, Scott Gaffney, Dimitry Pavlov, Padhraic Smyth
In this paper we investigate the application of adaptive sequence prediction techniques to the problem of page replacement in computer systems. We demonstrate the ability of a method based on local...
Linearly combining density estimators via stacking (1999)
This paper presents experimental results with both real and artificial data on using the technique of stacking to combine unsupervised learning algorithms. Specifically, stacking is used to form a...
An Information Theoretic Approach to Distributed Inference and Learning. (1998)
This report describes research work which was funded under grant number AFOSR90-00199 during the period February 1st 1990 to May 31st 1992. Our work has focused on developing information-theoretic...
Turbo Decoding of High Performance Error-Correcting Codes via Belief Propagation (1998)
McEliece, Robert, Smyth, Padhraic
We studied AWGN coding theorems for ensembles of coding systems which are built from fixed convolutional codes interconnected with random interleavers. We call these systems turbo-like codes and they...
Multiple Regimes in Northern Hemisphere Height Fields via Mixture Model Clustering (1998)
Padhraic Smyth, Michael Ghil, Kayo Ide
Mixture model clustering is applied to Northern Hemisphere (NH) 700-mb geopotential height anomalies. A mixture model is a flexible probability density estimation technique, consisting of a linear...
Learning to Recognize Volcanoes on Venus (1998)
Michael Burl, Lars Asker, Padhraic Smyth, Larry Crumpler
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of...
Rule Discovery From Time Series (1998)
Gautam Das, King-ip Lin, Heikki Mannila, Gopal Renganathan, Padhraic Smyth
We consider the problem of finding rules relating patterns in a time series to other patterns in that series, or patterns in one series to patterns in another series. A simple example is a rule such...
Belief Networks, Hidden Markov Models, and Markov Random Fields: a Unifying View (1998)
The use of graphs to represent independence structure in multivariate probability models has been pursued in a relatively independent fashion across a wide variety of research disciplines since the...
Learning to Recognize Volcanoes on Venus (1998)
Michael C. Burl, Lars Asker, Padhraic Smyth, Usama Fayyad, Pietro Perona, Larry Crumpler
Dramatic improvements in sensor and image acquisition technology have created a demand for automated tools that can aid in the analysis of large image databases. We describe the development of...
Modeling of Inhomogeneous Markov Random Fields with Applications to Cloud Screening (1998)
Igor Cadez, Padhraic Smyth
Cloud screening is the process of classifying pixels in satellite images which contain clouds and is an important step in processing remotely-sensed images. This paper applies inhomogeneous...
Multiple Regimes in Northern Hemisphere Height Fields via Mixture Model Clustering (1998)
Padhraic Smyth, Kayo Ide, Michael Ghil (J. Atmos. Sci.)
A mixture model is a flexible probability density estimation technique, consisting of a linear combination of k component densities. Such a model is applied to estimate clustering in Northern...
Stacked Density Estimation (1998)
Padhraic Smyth, David Wolpert
In this paper, the technique of stacking, previously only used for supervised learning, is applied to unsupervised learning. Specifically, it is used for non-parametric multivariate density...
Probabilistic independence networks for hidden Markov probability models (1997)
Smyth, Padhraic, Heckerman, David, Jordan, Michael I.
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas, including statistics, statistical physics, artificial intelligence, speech...
Probabilistic independence networks for hidden Markov probability models (1997)
Padhraic Smyth, David Heckerman, Michael Jordan
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech...
Detecting very early stages of dementia from normal aging with machine learning methods (1997)
William Rodman Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth
Abstract. We used Machine Learning (ML) methods to learn the best decision rules to distinguish normal brain aging from the earliest stages of dementia using subsamples of 198 normal and 244...
Factorial hidden Markov models (1997)
Zoubin Ghahramani, Michael I. Jordan, Padhraic Smyth
Abstract. Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed...
Differential Diagnosis of Dementia: A Knowledge Discovery and Data Mining (KDD) Approach (1997)
Subramani Mani, William Rodman Shankle, Michael J. Pazzani, Padhraic Smyth, Malcolm B. Dick
this report describes the extension of these techniques to the harder task of differential diagnosis. We show that the domain of neuropsychologic test performance helps diagnose AD, but not VD, and...
Clustering Sequences with Hidden Markov Models (1997)
This paper discusses a probabilistic model-based approach to clustering sequences, using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model...
Persistence and Recurrence in Atmospheric Circulation (1997)
Andrew Fraser, Kevin Vixie, Padhraic Smyth, Richard Smith
We report preliminary results of an effort to use variants of the Hidden Markov Models developed by speech researchers to characterize persistence and recurrence of atmospheric circulation patterns...
Adaptive Probabilistic Networks with Hidden Variables (1997)
John Binder, Daphne Koller, Stuart Russell, Keiji Kanazawa, Padhraic Smyth
Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are rapidly becoming the tool of...
Adaptive Probabilistic Networks with Hidden Variables (1997)
John Binder, Daphne Koller, Stuart Russell, Keiji Kanazawa, Padhraic Smyth
Probabilistic networks (also known as Bayesian belief networks) allow a compact description of complex stochastic relationships among several random variables. They are used widely for uncertain...
Factorial Hidden Markov Models (1997)
Zoubin Ghahramani, Michael I. Jordan, Padhraic Smyth
Hidden Markov models (HMMs) have proven to be one of the most widely used tools for learning probabilistic models of time series data. In an HMM, information about the past is conveyed through a...
inductive learning, theory revision and information retrieval using machine learning. (1997)
William R. Shankle, Subramani Mani, Michael J. Pazzani, Padhraic Smyth
cognitive modeling and machine learning methods applied to developmental and degenerative conditions of the human brain. 2 is a postgraduate researcher in the Department of Information and Computer...
Probabilistic Independence Networks for Hidden Markov Probability Models (1996)
Smyth, Padhraic, Heckerman, David, Jordan, Michael
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech...
From data mining to knowledge discovery in databases (1996)
Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article...
The KDD Process for Extracting Useful Knowledge from Volumes of Data (1996)
Gregory Piatetsky-Shapiro, Padhraic Smyth, Terry Widener
developing the tools needed to control the flood of data facing organizations that depend on ever-growing databases of business, manufacturing, scientific, and personal information.
Knowledge Discovery and Data Mining: Towards a Unifying Framework (1996)
Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth
This paper presents a first step towards a unifying framework for Knowledge Discovery in Databases. We describe links between data mining, knowledge discovery, and other related fields. We then...
From Data Mining to Knowledge Discovery in Databases (1996)
Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth
This article provides an overview of this emerging field, clarifying how data mining and knowledge
Probabilistic Independence Networks for Hidden Markov Probability Models (1996)
Padhraic Smyth, David Heckerman, Michael I. Jordan
Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas including statistics, statistical physics, artificial intelligence, speech...
Clustering using Monte Carlo Cross-Validation (1996)
Finding the "right" number of clusters, k, for a data set is a difficult, and often ill-posed, problem. In a probabilistic clustering context, likelihood-ratios, penalized likelihoods, and...
Statistical Themes and Lessons for Data Mining (1996)
Clark Glymour, David Madigan, Daryl Pregibon, Padhraic Smyth, Usama Fayyad
Data mining is on the interface of Computer Science and Statistics, utilizing advances in both disciplines to make progress in extracting information from large databases. It is an emerging field...
The process of applying machine learning algorithms (1995)
Carla E. Brodley, Padhraic Smyth
In this paper we present a view of the overall process of application development for real-world classification and regression problems. Specifically, we identify, categorize and discuss the various...
Retrofitting Decision Tree Classifiers Using Kernel Density Estimation (1995)
Padhraic Smyth, Alexander Gray, Usama M. Fayyad
A novel method for combining decision trees and kernel density estimators is proposed. Standard classification trees, or class probability trees, provide piecewise constant estimates of class...
The Process of Applying Machine Learning Algorithms (1995)
In this paper we present a view of the overall process of application development for real-world classification and regression problems. Specifically, we identify, categorize and discuss the various...
Learning finite state machines with self-clustering recurrent networks (1993)
Zeng, Zheng, Goodman, Rodney M., Smyth, Padhraic
Recent work has shown that recurrent neural networks have the ability to learn finite state automata from examples. In particular, networks using second-order units have been successful at this task....
Rule-based neural networks for classification and probability estimation (1992)
Goodman, Rodney M., Higgins, Charles M., Miller, John W., Smyth, Padhraic
In this paper we propose a network architecture that combines a rule-based approach with that of the neural network paradigm. Our primary motivation for this is to ensure that the knowledge embodied...
This thesis examines the problems of designing decision trees and expert systems from an information-theoretic viewpoint. A well-known greedy algorithm using mutual information for tree design is...
Data-driven evolution of data mining algorithms (0000)
Data mining is an application-driven field where research questions tend to be motivated by real-world data sets. In this context, a broad spectrum of formalisms and techniques has been proposed by...
Lin, Kevin K., Chudova, Darya, Hatfield, G. Wesley, Smyth, Padhraic, Andersen, Bogi
The hair-growth cycle is an example of a cyclic process that is well characterized morphologically but understood incompletely at the molecular level. As an initial step in discovering regulators in...
Data-driven evolution of data mining algorithms
Data mining is an application-driven field where research questions tend to be motivated by real-world data sets. In this context, a broad spectrum of formalisms and techniques has been proposed by...
Differential Diagnosis of Dementia: A Knowledge Discovery and Data Mining (KDD) Approach
Mani, Subramani, Shankle, William Rodman, Pazzani, Michael J., Smyth, Padhraic, Dick, Malcolm B.
Markov Monitoring with Unknown States
Padhraic Smyth
Pattern recognition methods and hidden Markov models can be effective tools for online health monitoring of communications systems. Previous work has assumed that the states in the system model are...
David J. Hand, Heikki Mannila, Padhraic Smyth
The growing interest in data mining is motivated by a common problem across disciplines: how does one store, access, model, and ultimately describe and understand very large data sets? Historically,...
Circadian Clock Genes Contribute to the Regulation of Hair Follicle Cycling
Lin, Kevin K., Kumar, Vivek, Geyfman, Mikhail, Chudova, Darya, Ihler, Alexander T., Smyth, Padhraic, ...
Hair follicles undergo recurrent cycling of controlled growth (anagen), regression (catagen), and relative quiescence (telogen) with a defined periodicity. Taking a genomics approach to study gene...
Text Modeling using Unsupervised Topic Models and Concept Hierarchies
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyvers
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a...