Robert Tibshirani

A bias correction for the minimum error rate in cross-validation (2009)

Tibshirani, Ryan J., Tibshirani, Robert

Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at...

Classification by Set Cover: The Prototype Vector Machine (2009)

Bien, Jacob, Tibshirani, Robert

We introduce a new nearest-prototype classifier, the prototype vector machine (PVM). It arises from a combinatorial optimization problem which we cast as a variant of the set cover problem. We...

Transposable Regularized Covariance Models with an Application to Missing Data Imputation (2009)

Allen, Genevera I., Tibshirani, Robert

Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data is transposable, meaning that either the rows, columns or both can...

The Entire Regularization Path for the Support Vector Machine (2009)

Trevor Hastie, Robert Tibshirani, Saharon Rosset, Ji Zhu

In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with...

Response to Mease and Wyner, Evidence Contrary to the Statistical View (2009)

Jerome Friedman, Trevor Hastie, Robert Tibshirani, Yoav Freund

This is an interesting and thought-provoking paper. We especially appreciate the fact that the authors have supplied R code for their examples, as this allows the reader to understand and assess...

A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis (2009)

Witten, Daniela M., Tibshirani, Robert, Hastie, Trevor

We present a penalized matrix decomposition (PMD), a new framework for computing a rank-K approximation for a matrix. We approximate the matrix X as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared...

© Institute of Mathematical Statistics, 2004 LEAST ANGLE REGRESSION (2008)

Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...

Testing significance of features by lassoed principal components (2008)

Witten, Daniela M., Tibshirani, Robert

We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially-expressed genes in a microarray experiment. We wish to identify...

Boolean implication networks derived from large scale, whole genome microarray datasets (2008)

Sahoo, Debashis, Dill, David L, Gentles, Andrew J, Tibshirani, Robert, Plevritis, Sylvia K

We describe a method for extracting Boolean implications (if-then relationships) in very large amounts of gene expression microarray data. A meta-analysis of data from thousands of...

Discussion of: Treelets--An adaptive multi-scale basis for sparse unordered data (2008)

Tibshirani, Robert

Discussion of "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]

A study of pre-validation (2008)

Höfling, Holger, Tibshirani, Robert

Given a predictor of outcome derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows...

Complementary hierarchical clustering (2008)

Robert Tibshirani

When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly...

The Entire Regularization Path for the Support Vector Machine (2008)

Trevor Hastie, Saharon Rosset, Robert Tibshirani, Ji Zhu

In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost...

Margin trees for high-dimensional classification (2008)

Robert Tibshirani, Trevor Hastie, Dale Schuurmans

We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin...

Discussion: The Dantzig selector: Statistical estimation when $p$ is much larger than $n$ (2008)

Efron, Bradley, Hastie, Trevor, Tibshirani, Robert

Discussion of ``The Dantzig selector: Statistical estimation when $p$ is much larger than $n$'' [math/0506081]

Margin trees for high-dimensional classification (2008)

Robert Tibshirani, Trevor Hastie

We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin...

Predicting Patient Survival (2008)

Eric Bair, Robert Tibshirani

An important goal of DNA microarray research is to develop tools to diagnose cancer more accurately based on the genetic profile of a tumor. There are several existing techniques in the literature...

Margin trees for high-dimensional classification (2008)

Robert Tibshirani, Trevor Hastie

We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin...

Improved detection of differential gene expression through the singular value decomposition (2008)

Robert Tibshirani, Eric Bair

We propose a method for detecting differential gene expression that makes use of the singular value decomposition of the matrix of expression values. It looks for biological variation that correlates...

The "Miss rate" for the analysis of gene expression data (2008)

Jonathan Taylor, Robert Tibshirani, Bradley Efron

Multiple testing issues are important in gene expression studies, where typically thousands of genes are compared over two or more experimental conditions. The false discovery rate has become a...

“Pre-conditioning” for feature selection and regression in high-dimensional problems (2008)

Debashis Paul, Eric Bair, Trevor Hastie, Robert Tibshirani

We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for

Spatial smoothing and hot spot detection for CGH data using the fused lasso (2008)

Tibshirani, Robert, Wang, Pei

We apply the “fused lasso” regression method of (TSRZ2004) to the problem of “hot-spot detection”, in particular, detection of regions of gain or loss in comparative genomic hybridization...

Sparse inverse covariance estimation with the graphical lasso (2008)

Friedman, Jerome, Hastie, Trevor, Tibshirani, Robert

We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple...

Complementary hierarchical clustering (2008)

Nowak, Gen, Tibshirani, Robert

When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly...

On the "degrees of freedom" of the lasso (2007)

Zou, Hui, Hastie, Trevor, Tibshirani, Robert

We study the effective degrees of freedom of the lasso in the framework of Stein's unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the...

Bias, variance and prediction error for classification (2007)

Robert Tibshirani

Statistical challenges in the analysis of DNA microarray data (2007)

Robert Tibshirani, Patrick Brown, Incyte Brown Lab

Health Research & Policy, and Statistics

A comparison of statistical learning methods on the GUSTO database (2007)

Marguerite Ennis, Geoffrey Hinton, David Naylor, Mike Revow, Robert Tibshirani

A battery of modern, adaptive nonlinear learning methods is applied to a large real database of cardiac patient data. Each method is used to predict 30 day mortality from a large number of potential...

Cellular Telephones and Automobile Collisions: Some Variations on Matched Case-Control Analysis (2007)

Donald A. Redelmeier, Robert Tibshirani

We describe the analysis of some matched pair binary data arising from a study designed to investigate whether cellular telephones are associated with automobile collisions. Conditional and random...

Cellular Telephones and Motor Vehicle Collisions: Some Variations on Matched Case-Control Analysis (2007)

Donald A. Redelmeier, Robert Tibshirani

We describe the analysis of some matched pair binary data arising from a study designed to investigate whether cellular telephones are associated with motor vehicle collisions. Conditional and random...

Who is the Fastest Man in the World? (2007)

Robert Tibshirani

I compare the world record sprint races of Donovan Bailey and Michael Johnson in the 1996 Olympic games and try to answer the questions a) who is faster?, b) which performance was more remarkable?...

‘Gene shaving’ as a method for identifying distinct sets of genes with similar expression patterns (2007)

Trevor Hastie, Robert Tibshirani, Michael B Eisen, Ash Alizadeh, Ronald Levy, Louis Staudt, ...

Microarrays and Their Use in a Comparative Experiment (2007)

Bradley Efron, Robert Tibshirani, Virginia Goss, Gil Chu

Microarrays enable genetic researchers to measure expression levels for thousands of genes simultaneously. At least that's the idea. In fact the gene expression information arrives in highly...

The Entire Regularization Path for the Support Vector Machine (2007)

Trevor Hastie, Robert Tibshirani, Saharon Rosset, Ji Zhu

In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with...

Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data (2007)

Eric Bair, Robert Tibshirani

this paper concerns diffuse large B-cell lymphoma (DLBCL). This is the most common type of lymphoma in adults, and it can only be treated by chemotherapy in approximately 40% of patients (Coiffier 2001;...

Sparse inverse covariance estimation with the lasso (2007)

Friedman, Jerome, Hastie, Trevor, Tibshirani, Robert

We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm that...

Pathwise coordinate optimization (2007)

Friedman, Jerome, Hastie, Trevor, Höfling, Holger, Tibshirani, Robert

We consider ``one-at-a-time'' coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the $L_1$-penalized regression (lasso) in...

Forward stagewise regression and the monotone lasso (2007)

Hastie, Trevor, Taylor, Jonathan, Tibshirani, Robert, Walther, Guenther

We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron, Hastie, Johnstone & Tibshirani (2004) it is proved that the...

Correction: Discovery and validation of breast cancer subtypes (2007)

Kapp, Amy V, Jeffrey, Stefanie S, Langerød, Anita, Børresen-Dale, Anne-Lise, Han, Wonshik, Noh, Dong-Young, ...

Following the publication of our recent article (Kapp et al., BMC Genomics 2006, 7:231), we (the authors) regrettably found several errors in the published Table 5. This correction article...

Averaged gene expressions for regression (2007)

Park, Mee Young, Hastie, Trevor, Tibshirani, Robert

Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression of the DNA microarray data, which poses the...

"Pre-conditioning" for feature selection and regression in high-dimensional problems (2007)

Paul, Debashis, Bair, Eric, Hastie, Trevor, Tibshirani, Robert

We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function,...

Sparse inverse covariance estimation with the graphical lasso (2007)

Jerome Friedman, Trevor Hastie, Robert Tibshirani

Comments on “Significance of candidate cancer genes as assessed by the CaMP score” by Parmigiani et al. (2007)

Holger Höfling, Gad Getz, Robert Tibshirani

Identifying genes that are involved in the development of cancer has been a very important research goal. One approach tries to identify genes in tumors that have an increased mutation rate....

Discussion of “the Dantzig selector” (2007)

Bradley Efron, Trevor Hastie, Robert Tibshirani

This is a fascinating paper on an important topic: the choice of predictor variables in large-scale linear models. A previous paper in these pages attacked the same problem using the “LARS”...

Forward stagewise regression and the monotone lasso (2007)

Trevor Hastie, Jonathan Taylor, Robert Tibshirani, Guenther Walther

We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron et al. (2004) it is proven that the least angle regression...

Sparse inverse covariance estimation with the lasso (2007)

Jerome Friedman, Trevor Hastie, Robert Tibshirani

Pathwise coordinate optimization (2007)

Jerome Friedman, Trevor Hastie, Holger Höfling, Robert Tibshirani

We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in...

A study of pre-validation (2007)

Holger Höfling, Robert Tibshirani

Pre-validation is a useful technique for the analysis of microarray and other high dimensional data. It allows one to derive a predictor for disease outcome and compare it to standard clinical...

Disease-Specific Genomic Analysis: Identifying the Signature of Pathologic Biology (2007)

Nicolau, Monica, Tibshirani, Robert, Børresen-Dale, Anne-Lise, Jeffrey, Stefanie S.

Motivation: Genomic high-throughput technology generates massive data, providing opportunities to understand countless facets of the functioning genome. It also raises profound issues in identifying...

Outlier sums for differential gene expression analysis (2007)

Tibshirani, Robert, Hastie, Trevor

We propose a method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples. This can be particularly useful in cancer studies, where...

Are clusters found in one dataset present in another dataset? (2007)

Kapp, Amy V., Tibshirani, Robert

In many microarray studies, a cluster defined on one dataset is sought in an independent dataset. If the cluster is found in the new dataset, the cluster is said to be “reproducible” and may be...

Regularized linear discriminant analysis and its application in microarrays (2007)

Guo, Yaqian, Hastie, Trevor, Tibshirani, Robert

In this paper, we introduce a modified version of linear discriminant analysis, called the “shrunken centroids regularized discriminant analysis” (SCRDA). This method generalizes the idea of the...

On testing the significance of sets of genes (2006)

Efron, Bradley, Tibshirani, Robert

This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways...

Discovery and validation of breast cancer subtypes (2006)

Kapp, Amy V, Jeffrey, Stefanie S, Langerød, Anita, Børresen-Dale, Anne-Lise, Han, Wonshik, Noh, Dong-Young, ...

Background: Previous studies demonstrated breast cancer tumor tissue samples could be classified into different subtypes based upon DNA microarray profiles. The most recent study presented...

Correlation-sharing for detection of differential gene expression (2006)

Tibshirani, Robert, Wasserman, Larry

We propose a method for detecting differential gene expression that exploits the correlation between genes. Our proposal averages the univariate scores of each feature with the scores in correlation...

A simple method for assessing sample sizes in microarray experiments (2006)

Tibshirani, Robert

Background: In this short article, we discuss a simple method for assessing sample size requirements in microarray experiments. Results: Our method starts with the output from a...

Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma (2006)

Hongjuan Zhao, Börje Ljungberg, Kjell Grankvist, Torgny Rasmuson, Robert Tibshirani, James D. Brooks

Molecular heterogeneity of renal cell carcinomas can be used to distinguish subgroups that correlated with long term survival.

Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma (2006)

Hongjuan Zhao, Börje Ljungberg, Kjell Grankvist, Torgny Rasmuson, Robert Tibshirani, James D. Brooks

Background Conventional renal cell carcinoma (cRCC) accounts for most of the deaths due to kidney cancer. Tumor stage, grade, and patient performance status are used currently to predict survival...

Prediction by supervised principal components (2006)

Eric Bair, Debashis Paul, Robert Tibshirani

In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called...

Forward Stagewise Regression and the Monotone Lasso (2006)

Trevor Hastie, Jonathan Taylor, Robert Tibshirani, Guenther Walther

We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron et al. (2004) it is proven that the least angle regression...

On testing the significance of sets of genes (2006)

Bradley Efron, Robert Tibshirani

This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways...

Prediction by supervised principal components (2006)

Eric Bair, Trevor Hastie, Debashis Paul, Robert Tibshirani

In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called...

A tail strength measure for assessing the overall univariate significance in a dataset (2006)

Taylor, Jonathan, Tibshirani, Robert

We propose an overall measure of significance for a set of hypothesis tests. The ‘tail strength’ is a simple function of the p-values computed for each of the tests. This measure is useful, for...

Hybrid hierarchical clustering with applications to microarray data (2006)

Chipman, Hugh, Tibshirani, Robert

In this paper, we propose a hybrid clustering method that combines the strengths of bottom-up hierarchical clustering with that of top-down clustering. The first method is good at identifying small...

Are clusters found in one dataset present in another dataset? (2006)

Kapp, Amy V., Tibshirani, Robert

In many microarray studies, a cluster defined on one dataset is sought in an independent dataset. If the cluster is found in the new dataset, the cluster is said to be reproducible and may be...

Regularized linear discriminant analysis and its application in microarrays (2006)

Guo, Yaqian, Hastie, Trevor, Tibshirani, Robert

In this paper, we introduce a modified version of linear discriminant analysis, called the "shrunken centroids regularized discriminant analysis" (SCRDA). This method generalizes the idea of the...

Averaged gene expressions for regression (2006)

Park, Mee Young, Hastie, Trevor, Tibshirani, Robert

Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression of the DNA microarray data, which poses the...

Outlier sums for differential gene expression analysis (2006)

Tibshirani, Robert, Hastie, Trevor

We propose a method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples. This can be particularly useful in cancer studies, where...

Early detection of breast cancer based on gene-expression patterns in peripheral blood cells (2005)

Sharma, Praveen, Sahni, Narinder S, Tibshirani, Robert, Skaane, Per, Urdal, Petter, Berghagen, Hege, ...

Introduction: Existing methods to detect breast cancer in asymptomatic patients have limitations, and there is a need to develop more accurate and convenient methods. In this study, we...

A method for calling gains and losses in array CGH data (2005)

Pei Wang, Young Kim, Jonathan Pollack, Robert Tibshirani

Array CGH is a powerful technique for genomic studies of cancer. It enables one to carry out genomewide screening for regions of genetic alterations, such as chromosome gains and losses, or localized...

A tail strength measure for assessing the overall univariate significance in a dataset (2005)

Jonathan Taylor, Robert Tibshirani

A tail strength measure for assessing the overall univariate significance in a dataset (2005)

Jonathan Taylor, Robert Tibshirani

The 'miss rate' for the analysis of gene expression data (2005)

Taylor, Jonathan, Tibshirani, Robert, Efron, Bradley

Multiple testing issues are important in gene expression studies, where typically thousands of genes are compared over two or more experimental conditions. The false discovery rate has become a...

A method for calling gains and losses in array CGH data (2005)

Wang, Pei, Kim, Young, Pollack, Jonathan, Narasimhan, Balasubramanian, Tibshirani, Robert

Array CGH is a powerful technique for genomic studies of cancer. It enables one to carry out genome-wide screening for regions of genetic alterations, such as chromosome gains and losses, or...

Hybrid Hierarchical Clustering with Applications to Microarray Data (2005)

Chipman, Hugh, Tibshirani, Robert

In this paper we propose a hybrid clustering method that combines the strengths of bottom-up hierarchical clustering with that of top-down clustering. The first method is good at identifying small...

A tail strength measure for assessing the overall univariate significance in a dataset (2005)

Taylor, Jonathan, Tibshirani, Robert

We propose an overall measure of significance for a set of hypothesis tests. The tail strength is a simple function of the p-values computed for each of the tests. This measure is useful, for...

Least Angle Regression (2004)

Efron, Bradley, Hastie, Trevor, Johnstone, Iain, Tibshirani, Robert

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...

Rejoinder to "Least angle regression" by Efron et al (2004)

Efron, Bradley, Hastie, Trevor, Johnstone, Iain, Tibshirani, Robert

Rejoinder to ``Least angle regression'' by Efron et al. [math.ST/0406456]

Least angle regression (2004)

Efron, Bradley, Hastie, Trevor, Johnstone, Iain, Tibshirani, Robert

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...

Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data (2004)

Eric Bair, Robert Tibshirani

Procedures that utilize both gene expression data and clinical data to identify subtypes of cancer can provide more accurate prognoses.

Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data (2004)

Eric Bair, Robert Tibshirani

An important goal of DNA microarray research is to develop tools to diagnose cancer more accurately based on the genetic profile of a tumor. There are several existing techniques in the literature...

Cancer characterization and feature set extraction by discriminative margin clustering (2004)

Munagala, Kamesh, Tibshirani, Robert, Brown, Patrick O

Background: A central challenge in the molecular diagnosis and treatment of cancer is to define a set of molecular features that, taken together, distinguish a given cancer, or type of...

Discussions of boosting papers, and rejoinders (2004)

Bartlett, Peter L., Bickel, Peter J., Bühlmann, Peter, Freund, Yoav, Friedman, Jerome, Hastie, Trevor, ...

Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and...

Sparse principal component analysis (2004)

Hui Zou, Trevor Hastie, Robert Tibshirani

Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the...

Efficient quadratic regularization for expression arrays (2004)

Trevor Hastie, Robert Tibshirani

have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have challenged the computational resources. In this article we...

The Entire Regularization Path for the Support Vector Machine (2004)

Trevor Hastie, Saharon Rosset, Robert Tibshirani, Ji Zhu, Nello Cristianini

The support vector machine (SVM) is a widely used tool for classification. Many efficient implementations exist for fitting a two-class SVM model. The user has to supply values for the tuning...

Sparse Principal Component Analysis (2004)

Hui Zou, Trevor Hastie, Robert Tibshirani

Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the...

On the "Degrees of Freedom" of the Lasso (2004)

Hui Zou, Trevor Hastie, Robert Tibshirani

We study the degrees of freedom of the Lasso in the framework of Stein's unbiased risk estimation (SURE). We show that the number of non-zero coefficients is an unbiased estimate for the degrees...

On the “degrees of freedom” of the lasso (2004)

Hui Zou, Trevor Hastie, Robert Tibshirani

We study the effective degrees of freedom of the lasso in the framework of Stein’s unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the...

Least angle regression (2004)

Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani

The purpose of model selection algorithms such as All Subsets, Forward Selection, and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...

Efficient Quadratic Regularization for Expression Arrays (2004)

Trevor Hastie, Robert Tibshirani

this article we expose a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture...

Sample classification from protein mass spectrometry, by "peak probability contrasts" (2004)

Tibshirani, Robert, Hastie, Trevor, Narasimhan, Balasubramanian, Soltys, Scott, Shi, Gongyi, Koong, Albert, ...

Motivation: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and...

Sample classification from protein mass spectrometry, by 'peak probability contrasts' (2004)

Tibshirani, Robert, Hastie, Trevor, Narasimhan, Balasubramanian, Soltys, Scott, Shi, Gongyi, Koong, Albert, ...

Motivation: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and...

Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays (2003)

Tibshirani, Robert, Hastie, Trevor, Narasimhan, Balasubramanian, Chu, Gilbert

We propose a new method for class prediction in DNA microarray studies based on an enhancement of the nearest prototype classifier. Our technique uses "shrunken" centroids as prototypes for each...

Expression arrays and the n ≪ p problem (2003)

Trevor Hastie, Robert Tibshirani

Gene expression arrays typically have 50 to 100 samples and 5,000 to 20,000 variables (genes). There have been many attempts to adapt statistical models for regression and classification to these...

Statistical significance for genome-wide experiments (2003)

John D. Storey, Robert Tibshirani

Abstract: With the increase in genome-wide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands...

Transcriptional programs activated by exposure of human prostate cancer cells to androgen (2002)

DePrimo, Samuel E, Diehn, Maximilian, Nelson, Joel B, Reiter, Robert E, Matese, John, Fero, Mike, ...

Background: Androgens are required for both normal prostate development and prostate carcinogenesis. We used DNA microarrays, representing approximately 18,000 genes, to examine the temporal...

The Bootstrap Method for Assessing Statistical Accuracy (2002)

Efron, Bradley, Tibshirani, Robert

This is an invited review of bootstrap methods. It begins with an exposition of the bootstrap estimate of standard error for one-sample situations. Several examples, some involving quite complicated...

Bootstrap Confidence Intervals and Bootstrap Approximations. (2002)

DiCicco, Thomas, Tibshirani, Robert

This document studies the $BC_a$ bootstrap procedure for constructing parametric and non-parametric confidence intervals. The $BC_a$ interval relies on the existence of a transformation that maps...

Significance Analysis of Microarrays, Users guide and technical document. http://www.stanford.edu/~wanjen/sam.pdf (2002)

Gil Chu, Balasubramanian Narasimhan, Robert Tibshirani, Virginia Tusher

Acknowledgments: We would like to thank the R core team for permission to use the R statistical system and Thomas Baier and Erich Neuwirth for permission to use the R DCOM server. Charlie Tibshirani...

Exploratory screening of genes and clusters from microarray experiments, Statistica Sinica (2002)

Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Michael Eisen, Gavin Sherlock, Pat Brown

We discuss a method called "cluster scoring" for supervised learning from a set of gene expression experiments. Cluster scoring generalizes methods that rank individual genes based...

Empirical bayes methods and false discovery rates for microarrays (2002)

Bradley Efron, John D. Storey, Robert Tibshirani

In a classic two-sample problem one might use Wilcoxon's statistic to test for a difference between Treatment and Control subjects. The analogous microarray experiment yields thousands of Wilcoxon...

Least Angle Regression (2002)

Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani

The purpose of model selection algorithms such as All Subsets, Forward Selection, and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...

Supervised harvesting of expression trees (2001)

Hastie, Trevor, Tibshirani, Robert, Botstein, David, Brown, Patrick

Background: We propose a new method for supervised learning from gene expression data. We call it 'tree harvesting'. This technique starts with a hierarchical clustering of genes, then models...

Cluster validation by prediction strength (2001)

Robert Tibshirani, Guenther Walther, David Botstein, Patrick Brown

We propose a new quantity for assessing the number of groups or clusters in a dataset. The key idea is to view clustering as a supervised classification problem, in which we must also estimate the...

Missing value estimation methods for DNA microarrays (2001)

Troyanskaya, Olga, Cantor, Michael, Sherlock, Gavin, Brown, Pat, Hastie, Trevor, Tibshirani, Robert, ...

Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete...

'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns (2000)

Hastie, Trevor, Tibshirani, Robert, Eisen, Michael B, Alizadeh, Ash, Levy, Ronald, Staudt, Louis, ...

Background: Large gene expression studies, such as those conducted using DNA arrays, often provide millions of different pieces of data. To address the problem of analyzing such data, we...

Bayesian backfitting (with comments and a rejoinder by the authors) (2000)

Hastie, Trevor, Tibshirani, Robert

We propose general procedures for posterior sampling from additive and generalized additive models. The procedure is a stochastic generalization of the well-known backfitting algorithm for fitting...

Additive logistic regression: a statistical view of boosting (With discussion and a rejoinder by the authors) (2000)

Friedman, Jerome, Hastie, Trevor, Tibshirani, Robert

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data...

Special invited paper. Additive logistic regression: A statistical view of boosting (2000)

Jerome Friedman, Trevor Hastie, Robert Tibshirani

Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data...

Estimating the number of clusters in a dataset via the Gap statistic (2000)

Robert Tibshirani, Guenther Walther, Trevor Hastie

We propose a method (the "Gap statistic") for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. k-means or...

Supervised Harvesting of Expression Trees (2000)

Trevor Hastie, Robert Tibshirani, David Botstein, Patrick Brown

Background: We propose a new method for supervised learning from gene expression data. We call it "Tree Harvesting". This technique starts with a hierarchical clustering of genes, and models the...

Imputing missing data for gene expression arrays (1999)

Trevor Hastie, Robert Tibshirani, Gavin Sherlock, Michael Eisen, Patrick Brown, David Botstein

Here we describe three different methods for imputation. The first is based on a reduced rank SVD of the expression matrix, the second is based on K-nearest neighbor averaging, and the third is based on...

The Global Pairwise Approach to Radiation Hybrid Mapping (1999)

Robert Tibshirani, Laura Lazzeroni, Trevor Hastie, Adam Olshen, David Cox

Introduction: We propose a global pairwise method for constructing maps from radiation hybrid data. The method depends upon a novel statistical criterion for identifying the best map than it more...

The Covariance Inflation Criterion for Adaptive Model Selection (1999)

Robert Tibshirani, Keith Knight

We propose a new criterion for model selection in prediction problems. The covariance inflation criterion adjusts the training error by the average covariance of the predictions and responses, when...

Generalized Additive Models, Cubic Splines and Penalized Likelihood. (1998)

Hastie, Trevor, Tibshirani, Robert

Generalized additive models extend the class of generalized linear models by allowing an arbitrary smooth function for any or all of the covariates. The functions are established by the local...

A Note on Profile Likelihood, Least Favourable Families and Kullback-Leibler Distance. (1998)

Tibshirani, Robert, Wasserman, Larry

Several methods exist for reducing higher dimensional problems to a single parameter. These include the profile likelihood, least favourable families and methods based on the Kullback-Leibler...

The problem of regions (1998)

Efron, Bradley, Tibshirani, Robert

In the problem of regions, we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive...

Classification by pairwise coupling (1998)

Hastie, Trevor, Tibshirani, Robert

We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the...

Additive Logistic Regression: a Statistical View of Boosting (1998)

Jerome Friedman, Trevor Hastie, Robert Tibshirani

Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can...

The Problem of Regions (1998)

Bradley Efron, Robert Tibshirani

In the problem of regions we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive...

Bayesian Backfitting (1998)

Trevor Hastie, Robert Tibshirani

We propose general procedures for posterior sampling from additive and generalized additive models, with applications to non-parametric, semi-parametric and mixed models. One chooses a linear...

Classification by Pairwise Coupling (1998)

Trevor Hastie, Robert Tibshirani

We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar...

Additive Logistic Regression: a Statistical View of Boosting (1998)

Jerome Friedman, Trevor Hastie, Robert Tibshirani

Boosting (Freund & Schapire 1995) is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted...

The lasso method for variable selection in the Cox model (1997)

Robert Tibshirani

I propose a new method for variable selection and shrinkage in Cox’s proportional hazards model. My proposal minimizes the log partial likelihood subject to the sum of the absolute values of the...

Model Search and Inference by Bootstrap "Bumping" (1997)

Robert Tibshirani, Keith Knight

We propose a bootstrap-based method for searching through a space of models. The technique is well suited to complex, adaptively fitted models: it provides a convenient method for finding better...

A proposal for variable selection in the Cox model (1997)

Robert Tibshirani

We propose a new method for variable selection and estimation in Cox's proportional hazards model. Our proposal minimizes the log partial likelihood subject to the sum of the absolute values of...

Using specially designed exponential families for density estimation (1996)

Efron, Bradley, Tibshirani, Robert

We wish to estimate the probability density $g(y)$ that produced an observed random sample of vectors $y_1, y_2, \dots, y_n$. Estimates of $g(y)$ are traditionally constructed in two quite different...

Discriminant Analysis by Gaussian Mixtures (1996)

Trevor Hastie, Robert Tibshirani

Fisher-Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class....

Computer-Aided Diagnosis of Mammographic Masses (1996)

Trevor Hastie, Debra Ikeda, Robert Tibshirani

We propose a statistical method for finding masses on mammograms. The technique is based on fitting broken line regressions to local intensity plots of the images. The method is illustrated on a...

The Out-of-Bootstrap Method for Model Averaging and Selection (1996)

Sunil Rao, Robert Tibshirani

We propose a bootstrap-based method for model averaging and selection that focuses on training points that are left out of individual bootstrap samples. This information can be used to estimate...

Two Applications of the Bootstrap (1996)

Robert Tibshirani

Introduction: The bootstrap (Efron (1979)) was introduced as a general method for assessing the statistical accuracy of an estimator. See Efron & Tibshirani (1993), Diaconis & Efron...

Bias, Variance and Prediction Error for Classification Rules (1996)

Robert Tibshirani

We study the notions of bias and variance for classification rules. Following Efron (1978) and Breiman (1996) we develop a decomposition of prediction error into its natural components. Then we...

Classification by Pairwise Coupling (1996)

Trevor Hastie, Robert Tibshirani

We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar...

Bias, Variance and Prediction Error for Classification Rules (1996)

Robert Tibshirani

We study the notions of bias and variance for classification rules. Following Efron (1978) we develop a decomposition of prediction error into its natural components. Then we derive bootstrap...

"Coaching" variables for regression and classification (1995)

Robert Tibshirani, Geoffrey Hinton

In a regression or classification setting where we wish to predict $Y$ from $x_1, x_2, \dots, x_p$, we suppose that an additional set of "coaching" variables $z_1, z_2, \dots, z_m$ are available in...

Penalized Discriminant Analysis (1995)

Trevor Hastie, Andreas Buja, Robert Tibshirani

Fisher's linear discriminant analysis (LDA) is a popular data-analytic tool for studying the relationship between a set of predictors and a categorical response. In this paper we describe a...

Cross-Validation and the Bootstrap: Estimating the Error Rate of a Prediction Rule (1995)

Bradley Efron, Robert Tibshirani

A training set of data has been used to construct a rule for predicting future responses. What is the error rate of this rule? The traditional answer to this question is given by cross-validation....

Flexible Discriminant and Mixture Models (1995)

Trevor Hastie, Robert Tibshirani, Andreas Buja

Introduction: In the generic classification or discrimination problem, the outcome of interest G falls into J unordered classes, which for convenience we denote by the set {1, 2, ...

Generalized Additive Models (1995)

Trevor Hastie, Robert Tibshirani

This article describes flexible statistical methods that may be used to identify and characterize nonlinear regression effects. These methods are called "generalized additive models". For...

Flexible Discriminant Analysis by Optimal Scoring (1994)

Trevor Hastie, Robert Tibshirani, Andreas Buja

Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that...

Discriminant Adaptive Nearest Neighbor Classification (1994)

Trevor Hastie, Robert Tibshirani

Nearest neighbor classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbor...

Regression Shrinkage and Selection Via the Lasso (1994)

Robert Tibshirani

We propose a new method for estimation in linear models. The "lasso" minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a...

Discriminant Adaptive Nearest Neighbor Classification (1994)

Trevor Hastie, Robert Tibshirani

We propose an adaptive nearest neighbor rule that uses local discriminant information to estimate an effective metric for classification. We also propose a method for global dimension reduction, that...

Handwritten Digit Recognition via Deformable Prototypes (1994)

Trevor Hastie, Robert Tibshirani

We present a new method for classifying handwritten characters, in particular digits. In our approach each character in the alphabet is represented by a prototype, in particular a piecewise-linear...

A Comparison of Some Error Estimates for Neural Network Models (1994)

Robert Tibshirani

In this paper we focus on the problem of estimation of the standard error of the predicted values $y(\theta; x_i)$. A reference for these techniques is Efron and Tibshirani (1993), especially chapter 21. One...

Implications of Measurement Error in Exposure for the Sample Sizes of Case-Control Studies (1994)

McKeown-Eyssen, Gail E., Tibshirani, Robert

In this paper, recent results describing the effects of measurement error on estimation of the association between an exposure and a disease are applied to sample size calculation in case-control...

Flexible Discriminant Analysis by Optimal Scoring (1993)

Trevor Hastie, Robert Tibshirani, Andreas Buja

Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that are...

Flexible Discriminant Analysis by Optimal Scoring (1993)

Trevor Hastie, Robert Tibshirani, A. Buja

Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that...

Combining Estimates in Regression and Classification (1993)

Michael LeBlanc, Robert Tibshirani

We consider the problem of how to combine a collection of general regression fit vectors in order to obtain a better predictive model. The individual fits may be from subset linear regression, ridge...

Principal Curves Revisited (1992)

Robert Tibshirani

A principal curve (Hastie and Stuetzle, 1989) is a smooth curve passing through the "middle" of a distribution or data cloud, and is a generalization of linear principal components. We give...

Generalized Additive Models (1990)

Trevor Hastie, Robert Tibshirani

This article describes flexible statistical methods that may be used to identify and characterize the effect of potential prognostic factors on an outcome variable. These methods are called...

Linear smoothers and additive models (1989)

Andreas Buja, Trevor Hastie, Robert Tibshirani

We study linear smoothers and their use in building non-parametric regression models. In part of this paper we examine certain aspects of linear smoothers for scatterplots; examples of these are the...

Variance stabilization and the bootstrap (1988)

Tibshirani, Robert

We investigate the use of a variance stabilizing transformation for the computation of a bootstrap t confidence interval. The transformation is estimated in an ‘automatic’ manner through an...

Significance analysis of microarrays applied to the ionizing radiation response

Tusher, Virginia Goss, Tibshirani, Robert, Chu, Gilbert

Microarrays can measure the expression of thousands of genes to identify changes in expression between different biological states. Methods are needed to determine the significance of these changes...

Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications

Sørlie, Therese, Perou, Charles M., Tibshirani, Robert, Aas, Turid, Geisler, Stephanie, Johnsen, Hilde, ...

The purpose of this study was to classify breast carcinomas based on variations in gene expression patterns derived from cDNA microarrays and to correlate tumor characteristics to clinical outcome. A...

Diagnosis of multiple cancer types by shrunken centroids of gene expression

Tibshirani, Robert, Hastie, Trevor, Narasimhan, Balasubramanian, Chu, Gilbert

We have devised an approach to cancer class prediction from gene expression profiling, based on an enhancement of the simple nearest prototype (centroid) classifier. We shrink the prototypes and...

Transcriptional programs activated by exposure of human prostate cancer cells to androgen

DePrimo, Samuel E, Diehn, Maximilian, Nelson, Joel B, Reiter, Robert E, Matese, John, Fero, Mike, ...

DNA microarrays were used to examine the temporal program of gene expression following treatment of a human prostate cancer cell line with androgen. Significant changes in levels of transcripts of...

Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors

Pollack, Jonathan R., Sørlie, Therese, Perou, Charles M., Rees, Christian A., Jeffrey, Stefanie S., Lonning, Per E., ...

Genomic DNA copy number alterations are key genetic events in the development and progression of human cancers. Here we report a genome-wide microarray comparative genomic hybridization (array CGH)...

Repeated observation of breast tumor subtypes in independent gene expression data sets

Sørlie, Therese, Tibshirani, Robert, Parker, Joel, Hastie, Trevor, Marron, J. S., Nobel, Andrew, ...

Characteristic patterns of gene expression measured by DNA microarrays have been used to classify tumors into clinically relevant subgroups. In this study, we have refined the previously defined...

Statistical significance for genomewide studies

Storey, John D., Tibshirani, Robert

With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features...

Gene Expression Patterns in Ovarian Carcinomas

Schaner, Marci E., Ross, Douglas T., Ciaravino, Giuseppe, Sørlie, Therese, Troyanskaya, Olga, Diehn, Maximilian, ...

We used DNA microarrays to characterize the global gene expression patterns in surface epithelial cancers of the ovary. We identified groups of genes that distinguished the clear cell subtype from...

Gene expression profiling identifies clinically relevant subtypes of prostate cancer

Lapointe, Jacques, Li, Chunde, Higgins, John P., Van De Rijn, Matt, Bair, Eric, Montgomery, Kelli, ...

Prostate cancer, a leading cause of cancer death, displays a broad range of clinical behavior from relatively indolent to aggressive metastatic disease. To explore potential molecular variation...

Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data

Bair, Eric, Tibshirani, Robert

An important goal of DNA microarray research is to develop tools to diagnose cancer more accurately based on the genetic profile of a tumor. There are several existing techniques in the literature...

Toxicity from radiation therapy associated with abnormal transcriptional responses to DNA damage

Rieger, Kerri E., Hong, Wan-Jen, Tusher, Virginia Goss, Tang, Jean, Tibshirani, Robert, Chu, Gilbert

Toxicity from radiation therapy is a grave problem for cancer patients. We hypothesized that some cases of toxicity are associated with abnormal transcriptional responses to radiation. We used...

Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma

Zhao, Hongjuan, Ljungberg, Börje, Grankvist, Kjell, Rasmuson, Torgny, Tibshirani, Robert, Brooks, James D

Molecular heterogeneity of renal cell carcinomas can be used to distinguish subgroups that correlate with long-term survival.

Gene expression profiling differentiates germ cell tumors from other cancers and defines subtype-specific signatures

Juric, Dejan, Sale, Sanja, Hromas, Robert A., Yu, Ron, Wang, Yan, Duran, George E., ...

Germ cell tumors (GCTs) of the testis are the predominant cancer among young men. We analyzed gene expression profiles of 50 GCTs of various subtypes, and we compared them with 443 other common...

Array-Based Comparative Genomic Hybridization Identifies Localized DNA Amplifications and Homozygous Deletions in Pancreatic Cancer

Bashyam, Murali D, Bair, Ryan, Kim, Young H, Wang, Pei, Hernandez-Boussard, Tina, Karikari, Collins A, ...

Pancreatic cancer, the fourth leading cause of cancer death in the United States, is frequently associated with the amplification and deletion of specific oncogenes and tumor-suppressor genes (TSGs),...

Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival

Chang, Howard Y., Nuyten, Dimitry S. A., Sneddon, Julie B., Hastie, Trevor, Tibshirani, Robert, Sørlie, Therese, ...

Based on the hypothesis that features of the molecular program of normal wound healing might play an important role in cancer metastasis, we previously identified consistent features in the...

Discovery and validation of breast cancer subtypes

Kapp, Amy V, Jeffrey, Stefanie S, Langerød, Anita, Børresen-Dale, Anne-Lise, Han, Wonshik, Noh, Dong-Young, ...

Following the publication of our recent article (Kapp et al., BMC Genomics 2006, 7:231), we (the authors) regrettably found several errors in the published Table 5. This correction article not only...

Sparsity and smoothness via the fused lasso

Robert Tibshirani, Michael Saunders, Saharon Rosset, Ji Zhu, Keith Knight

The lasso penalizes a least squares regression by the sum of the absolute values ($L_1$-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to...

IRF9 and STAT1 are required for IgG autoantibody production and B cell expression of TLR7 in mice

Thibault, Donna L., Chu, Alvina D., Graham, Kareem L., Balboni, Imelda, Lee, Lowen Y., Kohlmoos, Cassidy, ...

A hallmark of SLE is the production of high-titer, high-affinity, isotype-switched IgG autoantibodies directed against nucleic acid–associated antigens. Several studies have established a role for...

Pre-validation and inference in microarrays

Robert Tibshirani, Brad Efron

In microarray studies, an important problem is to compare a predictor of disease outcome derived from gene expression levels to standard clinical predictors. Comparing them on the same dataset that...

Notch signals positively regulate activity of the mTOR pathway in T-cell acute lymphoblastic leukemia

Chan, Steven M., Weng, Andrew P., Tibshirani, Robert, Aster, Jon C.

Constitutive Notch activation is required for the proliferation of a subgroup of T-cell acute lymphoblastic leukemia (T-ALL). Downstream pathways that transmit pro-oncogenic signals are not well...

Alteration of Gene Expression Signatures of Cortical Differentiation and Wound Response in Lethal Clear Cell Renal Cell Carcinomas

Zhao, Hongjuan, Ma, Zongming, Tibshirani, Robert, Higgins, John P. T., Ljungberg, Börje, Brooks, James D.

Clear cell renal cell carcinoma (ccRCC) is the most common malignancy of the adult kidney and displays heterogeneity in clinical outcomes. Through comprehensive gene expression profiling, we have...

Covariance-regularized regression and classification for high dimensional problems

Daniela M. Witten, Robert Tibshirani

We propose covariance-regularized regression, a family of methods for prediction in high dimensional settings that uses a shrunken estimate of the inverse covariance matrix of the features to achieve...

Univariate Shrinkage in the Cox Model for High Dimensional Data

Robert Tibshirani

We propose a method for prediction in Cox's proportional hazards model when the number of features (regressors), p, exceeds the number of observations, n. The method assumes that the features are...

Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data

Daniela Witten, Robert Tibshirani

In recent work, several authors have introduced methods for sparse canonical correlation analysis (sparse CCA). Suppose that two sets of measurements are available on the same set of observations....

Disease signatures are robust across tissues and experiments

Dudley, Joel T, Tibshirani, Robert, Deshpande, Tarangini, Butte, Atul J

Meta-analyses combining gene expression microarray experiments offer new insights into the molecular pathophysiology of disease not evident from individual experiments. Although the established...

Boolean implication networks derived from large scale, whole genome microarray datasets

Sahoo, Debashis, Dill, David L, Gentles, Andrew J, Tibshirani, Robert, Plevritis, Sylvia K

A method for analysis of microarray data is presented that extracts statistically significant Boolean implication relationships between pairs of genes.