A bias correction for the minimum error rate in cross-validation (2009)
Tibshirani, Ryan J., Tibshirani, Robert
Tuning parameters in supervised learning problems are often estimated by cross-validation. The minimum value of the cross-validation error can be biased downward as an estimate of the test error at...
Classification by Set Cover: The Prototype Vector Machine (2009)
Bien, Jacob, Tibshirani, Robert
We introduce a new nearest-prototype classifier, the prototype vector machine (PVM). It arises from a combinatorial optimization problem which we cast as a variant of the set cover problem. We...
Transposable Regularized Covariance Models with an Application to Missing Data Imputation (2009)
Allen, Genevera I., Tibshirani, Robert
Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data is transposable, meaning that either the rows, columns or both can...
Trevor Hastie, Robert Tibshirani, Saharon Rosset, Ji Zhu
In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with...
Response to Mease and Wyner, Evidence Contrary to the Statistical View (2009)
Jerome Friedman, Trevor Hastie, Robert Tibshirani, Yoav Freund
This is an interesting and thought-provoking paper. We especially appreciate the fact that the authors have supplied R code for their examples, as this allows the reader to understand and assess...
Witten, Daniela M., Tibshirani, Robert, Hastie, Trevor
We present a penalized matrix decomposition (PMD), a new framework for computing a rank-$K$ approximation for a matrix. We approximate the matrix $X$ as $\hat{X} = \sum_{k=1}^{K} d_k u_k v_k^T$, where $d_k$, $u_k$, and $v_k$ minimize the squared...
4. Natural Cubic Splines (2008)
Trevor Hastie, Robert Tibshirani, Bágyi Ibolya
Splines and applications
© Institute of Mathematical Statistics, 2004 LEAST ANGLE REGRESSION (2008)
Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...
Testing significance of features by lassoed principal components (2008)
Witten, Daniela M., Tibshirani, Robert
We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially-expressed genes in a microarray experiment. We wish to identify...
Boolean implication networks derived from large scale, whole genome microarray datasets (2008)
Sahoo, Debashis, Dill, David L, Gentles, Andrew J, Tibshirani, Robert, Plevritis, Sylvia K
We describe a method for extracting Boolean implications (if-then relationships) in very large amounts of gene expression microarray data. A meta-analysis of data from thousands of...
Discussion of: Treelets--An adaptive multi-scale basis for sparse unordered data (2008)
Discussion of "Treelets--An adaptive multi-scale basis for sparse unordered data" [arXiv:0707.0481]
A study of pre-validation (2008)
Höfling, Holger, Tibshirani, Robert
Given a predictor of outcome derived from a high-dimensional dataset, pre-validation is a useful technique for comparing it to competing predictors on the same dataset. For microarray data, it allows...
Complementary hierarchical clustering (2008)
When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly...
Trevor Hastie, Saharon Rosset, Robert Tibshirani, Ji Zhu
In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost...
Margin trees for high-dimensional classification (2008)
Robert Tibshirani, Trevor Hastie, Dale Schuurmans
We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin...
Discussion: The Dantzig selector: Statistical estimation when $p$ is much larger than $n$ (2008)
Efron, Bradley, Hastie, Trevor, Tibshirani, Robert
Discussion of ``The Dantzig selector: Statistical estimation when $p$ is much larger than $n$'' [math/0506081]
Margin trees for high-dimensional classification (2008)
Robert Tibshirani, Trevor Hastie
We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin...
Predicting Patient Survival (2008)
An important goal of DNA microarray research is to develop tools to diagnose cancer more accurately based on the genetic profile of a tumor. There are several existing techniques in the literature...
Eric Bair, Robert Tibshirani
We propose a method for detecting differential gene expression that makes use of the singular value decomposition of the matrix of expression values. It looks for biological variation that correlates...
The "Miss rate" for the analysis of gene expression data (2008)
Jonathan Taylor, Robert Tibshirani, Bradley Efron
Multiple testing issues are important in gene expression studies, where typically thousands of genes are compared over two or more experimental conditions. The false discovery rate has become a...
“Pre-conditioning” for feature selection and regression in high-dimensional problems (2008)
Debashis Paul, Eric Bair, Trevor Hastie, Robert Tibshirani
We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for
Spatial smoothing and hot spot detection for CGH data using the fused lasso (2008)
We apply the “fused lasso” regression method of (TSRZ2004) to the problem of “hot-spot detection”, in particular, detection of regions of gain or loss in comparative genomic hybridization...
Sparse inverse covariance estimation with the graphical lasso (2008)
Friedman, Jerome, Hastie, Trevor, Tibshirani, Robert
We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple...
Complementary hierarchical clustering (2008)
Nowak, Gen, Tibshirani, Robert
When applying hierarchical clustering algorithms to cluster patient samples from microarray data, the clustering patterns generated by most algorithms tend to be dominated by groups of highly...
On the "degrees of freedom" of the lasso (2007)
Zou, Hui, Hastie, Trevor, Tibshirani, Robert
We study the effective degrees of freedom of the lasso in the framework of Stein's unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the...
Health Research & Policy, and Statistics, (2007)
Robert Tibshirani, Patrick Brown, Incyte Brown Lab
Statistical challenges in the analysis of DNA microarray data
A comparison of statistical learning methods on the GUSTO database (2007)
Marguerite Ennis, Geoffrey Hinton, David Naylor, Mike Revow, Robert Tibshirani
A battery of modern, adaptive nonlinear learning methods is applied to a large real database of cardiac patient data. Each method is used to predict 30 day mortality from a large number of potential...
Donald A. Redelmeier, Robert Tibshirani
We describe the analysis of some matched pair binary data arising from a study designed to investigate whether cellular telephones are associated with automobile collisions. Conditional and random...
Donald A. Redelmeier, Robert Tibshirani
We describe the analysis of some matched pair binary data arising from a study designed to investigate whether cellular telephones are associated with motor vehicle collisions. Conditional and random...
Who is the Fastest Man in the World? (2007)
I compare the world record sprint races of Donovan Bailey and Michael Johnson in the 1996 Olympic games and try to answer the questions a) who is faster?, b) which performance was more remarkable?...
Trevor Hastie, Robert Tibshirani, Michael B Eisen, Ash Alizadeh, Ronald Levy, Louis Staudt, ...
‘Gene shaving’ as a method for identifying distinct sets of genes
Microarrays and Their Use in a Comparative Experiment (2007)
Bradley Efron, Robert Tibshirani, Virginia Goss, Gil Chu
Microarrays enable genetic researchers to measure expression levels for thousands of genes simultaneously. At least that's the idea. In fact the gene expression information arrives in highly...
Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data (2007)
Eric Bair, Robert Tibshirani
this paper concerns diffuse large B-cell lymphoma (DLBCL). This is the most common type of lymphoma in adults, and it can only be treated by chemotherapy in approximately 40% of patients (Coiffier 2001;...
Sparse inverse covariance estimation with the lasso (2007)
Friedman, Jerome, Hastie, Trevor, Tibshirani, Robert
We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm that...
Pathwise coordinate optimization (2007)
Friedman, Jerome, Hastie, Trevor, Höfling, Holger, Tibshirani, Robert
We consider ``one-at-a-time'' coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the $L_1$-penalized regression (lasso) in...
Forward stagewise regression and the monotone lasso (2007)
Hastie, Trevor, Taylor, Jonathan, Tibshirani, Robert, Walther, Guenther
We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron, Hastie, Johnstone & Tibshirani (2004) it is proved that the...
Correction: Discovery and validation of breast cancer subtypes (2007)
Kapp, Amy V, Jeffrey, Stefanie S, Langerød, Anita, Børresen-Dale, Anne-Lise, Han, Wonshik, Noh, Dong-Young, ...
Following the publication of our recent article (Kapp et al., BMC Genomics 2006, 7:231), we (the authors) regrettably found several errors in the published Table 5. This correction article...
Averaged gene expressions for regression (2007)
Park, Mee Young, Hastie, Trevor, Tibshirani, Robert
Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression of the DNA microarray data, which poses the...
"Pre-conditioning" for feature selection and regression in high-dimensional problems (2007)
Paul, Debashis, Bair, Eric, Hastie, Trevor, Tibshirani, Robert
We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function,...
Holger Höfling, Gad Getz, Robert Tibshirani
Identifying genes that are involved in the development of cancer has been a very important research goal. One approach tries to identify genes in tumors that have an increased mutation rate....
Discussion of “the Dantzig selector” (2007)
Bradley Efron, Trevor Hastie, Robert Tibshirani
This is a fascinating paper on an important topic: the choice of predictor variables in large-scale linear models. A previous paper in these pages attacked the same problem using the “LARS”...
Forward stagewise regression and the monotone lasso (2007)
Trevor Hastie, Jonathan Taylor, Robert Tibshirani, Guenther Walther
We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron et al. (2004) it is proven that the least angle regression...
Jerome Friedman, Trevor Hastie, Robert Tibshirani
Sparse inverse covariance estimation with the lasso
Pathwise coordinate optimization (2007)
Jerome Friedman, Trevor Hastie, Holger Höfling, Robert Tibshirani
We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in...
A study of pre-validation (2007)
Holger Höfling, Robert Tibshirani
Pre-validation is a useful technique for the analysis of microarray and other high dimensional data. It allows one to derive a predictor for disease outcome and compare it to standard clinical...
Disease-Specific Genomic Analysis: Identifying the Signature of Pathologic Biology (2007)
Nicolau, Monica, Tibshirani, Robert, Børresen-Dale, Anne-Lise, Jeffrey, Stefanie S.
Motivation: Genomic high-throughput technology generates massive data, providing opportunities to understand countless facets of the functioning genome. It also raises profound issues in identifying...
Outlier sums for differential gene expression analysis (2007)
Tibshirani, Robert, Hastie, Trevor
We propose a method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples. This can be particularly useful in cancer studies, where...
Are clusters found in one dataset present in another dataset? (2007)
Kapp, Amy V., Tibshirani, Robert
In many microarray studies, a cluster defined on one dataset is sought in an independent dataset. If the cluster is found in the new dataset, the cluster is said to be “reproducible” and may be...
Regularized linear discriminant analysis and its application in microarrays (2007)
Guo, Yaqian, Hastie, Trevor, Tibshirani, Robert
In this paper, we introduce a modified version of linear discriminant analysis, called the “shrunken centroids regularized discriminant analysis” (SCRDA). This method generalizes the idea of the...
On testing the significance of sets of genes (2006)
Efron, Bradley, Tibshirani, Robert
This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways...
Discovery and validation of breast cancer subtypes (2006)
Kapp, Amy V, Jeffrey, Stefanie S, Langerød, Anita, Børresen-Dale, Anne-Lise, Han, Wonshik, Noh, Dong-Young, ...
Background: Previous studies demonstrated breast cancer tumor tissue samples could be classified into different subtypes based upon DNA microarray profiles. The most recent study presented...
Correlation-sharing for detection of differential gene expression (2006)
Tibshirani, Robert, Wasserman, Larry
We propose a method for detecting differential gene expression that exploits the correlation between genes. Our proposal averages the univariate scores of each feature with the scores in correlation...
A simple method for assessing sample sizes in microarray experiments (2006)
Background: In this short article, we discuss a simple method for assessing sample size requirements in microarray experiments. Results: Our method starts with the output from a...
Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma (2006)
Hongjuan Zhao, Börje Ljungberg, Kjell Grankvist, Torgny Rasmuson, Robert Tibshirani, James D. Brooks
Molecular heterogeneity of renal cell carcinomas can be used to distinguish subgroups that correlated with long term survival.
Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma (2006)
Hongjuan Zhao, Börje Ljungberg, Kjell Grankvist, Torgny Rasmuson, Robert Tibshirani, James D. Brooks
Background: Conventional renal cell carcinoma (cRCC) accounts for most of the deaths due to kidney cancer. Tumor stage, grade, and patient performance status are used currently to predict survival...
Prediction by supervised principal components (2006)
Eric Bair, Debashis Paul, Robert Tibshirani
In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called...
Forward Stagewise Regression and the Monotone Lasso (2006)
Trevor Hastie, Jonathan Taylor, Robert Tibshirani, Guenther Walther
We consider the least angle regression and forward stagewise algorithms for solving penalized least squares regression problems. In Efron et al. (2004) it is proven that the least angle regression...
Correlation-sharing for detection of differential gene expression (2006)
Robert Tibshirani, Larry Wasserman
On testing the significance of sets of genes (2006)
Bradley Efron, Robert Tibshirani
This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways...
Prediction by supervised principal components (2006)
Eric Bair, Trevor Hastie, Debashis Paul, Robert Tibshirani
In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called...
A tail strength measure for assessing the overall univariate significance in a dataset (2006)
Taylor, Jonathan, Tibshirani, Robert
We propose an overall measure of significance for a set of hypothesis tests. The ‘tail strength’ is a simple function of the p-values computed for each of the tests. This measure is useful, for...
Hybrid hierarchical clustering with applications to microarray data (2006)
Chipman, Hugh, Tibshirani, Robert
In this paper, we propose a hybrid clustering method that combines the strengths of bottom-up hierarchical clustering with that of top-down clustering. The first method is good at identifying small...
Are clusters found in one dataset present in another dataset? (2006)
Kapp, Amy V., Tibshirani, Robert
In many microarray studies, a cluster defined on one dataset is sought in an independent dataset. If the cluster is found in the new dataset, the cluster is said to be reproducible and may be...
Regularized linear discriminant analysis and its application in microarrays (2006)
Guo, Yaqian, Hastie, Trevor, Tibshirani, Robert
In this paper, we introduce a modified version of linear discriminant analysis, called the "shrunken centroids regularized discriminant analysis" (SCRDA). This method generalizes the idea of the...
Averaged gene expressions for regression (2006)
Park, Mee Young, Hastie, Trevor, Tibshirani, Robert
Although averaging is a simple technique, it plays an important role in reducing variance. We use this essential property of averaging in regression of the DNA microarray data, which poses the...
Outlier sums for differential gene expression analysis (2006)
Tibshirani, Robert, Hastie, Trevor
We propose a method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples. This can be particularly useful in cancer studies, where...
Early detection of breast cancer based on gene-expression patterns in peripheral blood cells (2005)
Sharma, Praveen, Sahni, Narinder S, Tibshirani, Robert, Skaane, Per, Urdal, Petter, Berghagen, Hege, ...
Introduction: Existing methods to detect breast cancer in asymptomatic patients have limitations, and there is a need to develop more accurate and convenient methods. In this study, we...
A method for calling gains and losses in array CGH data (2005)
Pei Wang, Young Kim, Jonathan Pollack, Robert Tibshirani
Array CGH is a powerful technique for genomic studies of cancer. It enables one to carry out genome-wide screening for regions of genetic alterations, such as chromosome gains and losses, or localized...
A tail strength measure for assessing the overall univariate significance in a dataset (2005)
Jonathan Taylor, Robert Tibshirani
The 'miss rate' for the analysis of gene expression data (2005)
Taylor, Jonathan, Tibshirani, Robert, Efron, Bradley
Multiple testing issues are important in gene expression studies, where typically thousands of genes are compared over two or more experimental conditions. The false discovery rate has become a...
A method for calling gains and losses in array CGH data (2005)
Wang, Pei, Kim, Young, Pollack, Jonathan, Narasimhan, Balasubramanian, Tibshirani, Robert
Array CGH is a powerful technique for genomic studies of cancer. It enables one to carry out genome-wide screening for regions of genetic alterations, such as chromosome gains and losses, or...
Hybrid Hierarchical Clustering with Applications to Microarray Data (2005)
Chipman, Hugh, Tibshirani, Robert
In this paper we propose a hybrid clustering method that combines the strengths of bottom-up hierarchical clustering with that of top-down clustering. The first method is good at identifying small...
A tail strength measure for assessing the overall univariate significance in a dataset (2005)
Taylor, Jonathan, Tibshirani, Robert
We propose an overall measure of significance for a set of hypothesis tests. The tail strength is a simple function of the p-values computed for each of the tests. This measure is useful, for...
Efron, Bradley, Hastie, Trevor, Johnstone, Iain, Tibshirani, Robert
The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...
Rejoinder to "Least angle regression" by Efron et al (2004)
Efron, Bradley, Hastie, Trevor, Johnstone, Iain, Tibshirani, Robert
Rejoinder to ``Least angle regression'' by Efron et al. [math.ST/0406456]
Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data (2004)
Procedures that utilize both gene expression data and clinical data to identify subtypes of cancer can provide more accurate prognoses.
Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data (2004)
An important goal of DNA microarray research is to develop tools to diagnose cancer more accurately based on the genetic profile of a tumor. There are several existing techniques in the literature...
Cancer characterization and feature set extraction by discriminative margin clustering (2004)
Munagala, Kamesh, Tibshirani, Robert, Brown, Patrick O
Background: A central challenge in the molecular diagnosis and treatment of cancer is to define a set of molecular features that, taken together, distinguish a given cancer, or type of...
Discussions of boosting papers, and rejoinders (2004)
Bartlett, Peter L., Bickel, Peter J., Bühlmann, Peter, Freund, Yoav, Friedman, Jerome, Hastie, Trevor, ...
Discussions of: "Process consistency for AdaBoost" [Ann. Statist. 32 (2004), no. 1, 13-29] by W. Jiang; "On the Bayes-risk consistency of regularized boosting methods" [ibid., 30-55] by G. Lugosi and...
Sparse principal component analysis (2004)
Hui Zou, Trevor Hastie, Robert Tibshirani
Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the...
Efficient quadratic regularization for expression arrays (2004)
Trevor Hastie, Robert Tibshirani
There have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have challenged the computational resources. In this article we...
The Entire Regularization Path for the Support Vector Machine (2004)
Trevor Hastie, Saharon Rosset, Robert Tibshirani, Ji Zhu, Nello Cristianini
The support vector machine (SVM) is a widely used tool for classification. Many efficient implementations exist for fitting a two-class SVM model. The user has to supply values for the tuning...
Sparse Principal Component Analysis (2004)
Hui Zou, Trevor Hastie, Robert Tibshirani
Principal component analysis (PCA) is widely used in data processing and dimensionality reduction. However, PCA suffers from the fact that each principal component is a linear combination of all the...
On the "Degrees of Freedom" of the Lasso (2004)
Hui Zou, Trevor Hastie, Robert Tibshirani
We study the degrees of freedom of the Lasso in the framework of Stein's unbiased risk estimation (SURE). We show that the number of non-zero coefficients is an unbiased estimate for the degrees...
Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Scott Soltys, Gongyi Shi, Albert Koong, ...
Sample classification from protein mass spectrometry,
On the “degrees of freedom” of the lasso (2004)
Hui Zou, Trevor Hastie, Robert Tibshirani
We study the effective degrees of freedom of the lasso in the framework of Stein’s unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the...
Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani
The purpose of model selection algorithms such as All Subsets, Forward Selection, and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...
Efficient Quadratic Regularization for Expression Arrays (2004)
Trevor Hastie, Robert Tibshirani
In this article we expose a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture...
Sample classification from protein mass spectrometry, by "peak probability contrasts" (2004)
Tibshirani, Robert, Hastie, Trevor, Narasimhan, Balasubramanian, Soltys, Scott, Shi, Gongyi, Koong, Albert, ...
Motivation: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and...
Sample classification from protein mass spectrometry, by 'peak probability contrasts' (2004)
Tibshirani, Robert, Hastie, Trevor, Narasimhan, Balasubramanian, Soltys, Scott, Shi, Gongyi, Koong, Albert, ...
Motivation: Early cancer detection has always been a major research focus in solid tumor oncology. Early tumor detection can theoretically result in lower stage tumors, more treatable diseases and...
Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays (2003)
Tibshirani, Robert, Hastie, Trevor, Narasimhan, Balasubramanian, Chu, Gilbert
We propose a new method for class prediction in DNA microarray studies based on an enhancement of the nearest prototype classifier. Our technique uses "shrunken" centroids as prototypes for each...
Expression arrays and the $n \ll p$ problem (2003)
Trevor Hastie, Robert Tibshirani
Gene expression arrays typically have 50 to 100 samples and 5,000 to 20,000 variables (genes). There have been many attempts to adapt statistical models for regression and classification to these...
Statistical significance for genome-wide experiments (2003)
John D. Storey, Robert Tibshirani
With the increase in genome-wide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands...
Transcriptional programs activated by exposure of human prostate cancer cells to androgen (2002)
DePrimo, Samuel E, Diehn, Maximilian, Nelson, Joel B, Reiter, Robert E, Matese, John, Fero, Mike, ...
Background: Androgens are required for both normal prostate development and prostate carcinogenesis. We used DNA microarrays, representing approximately 18,000 genes, to examine the temporal...
The Bootstrap Method for Assessing Statistical Accuracy (2002)
Efron, Bradley, Tibshirani, Robert
This is an invited review of bootstrap methods. It begins with an exposition of the bootstrap estimate of standard error for one-sample situations. Several examples, some involving quite complicated...
Bootstrap Confidence Intervals and Bootstrap Approximations (2002)
DiCicco, Thomas, Tibshirani, Robert
This document studies the $BC_a$ bootstrap procedure for constructing parametric and non-parametric confidence intervals. The $BC_a$ interval relies on the existence of a transformation that maps...
Gil Chu, Balasubramanian Narasimhan, Robert Tibshirani, Virginia Tusher
Acknowledgments: We would like to thank the R core team for permission to use the R statistical system and Thomas Baier and Erich Neuwirth for permission to use the R DCOM server. Charlie Tibshirani...
Exploratory screening of genes and clusters from microarray experiments, Statistica Sinica (2002)
Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan, Michael Eisen, Gavin Sherlock, Pat Brown
We discuss a method called "cluster scoring" for supervised learning from a set of gene expression experiments. Cluster scoring generalizes methods that rank individual genes based...
Empirical bayes methods and false discovery rates for microarrays (2002)
Bradley Efron, John D. Storey, Robert Tibshirani
In a classic two-sample problem one might use Wilcoxon's statistic to test for a difference between Treatment and Control subjects. The analogous microarray experiment yields thousands of Wilcoxon...
Bradley Efron, Trevor Hastie, Iain Johnstone, Robert Tibshirani
The purpose of model selection algorithms such as All Subsets, Forward Selection, and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be...
Supervised harvesting of expression trees (2001)
Hastie, Trevor, Tibshirani, Robert, Botstein, David, Brown, Patrick
Background: We propose a new method for supervised learning from gene expression data. We call it 'tree harvesting'. This technique starts with a hierarchical clustering of genes, then models...
Cluster validation by prediction strength (2001)
Robert Tibshirani, Guenther Walther, David Botstein, Patrick Brown
We propose a new quantity for assessing the number of groups or clusters in a dataset. The key idea is to view clustering as a supervised classification problem, in which we must also estimate the...
Missing value estimation methods for DNA microarrays (2001)
Troyanskaya, Olga, Cantor, Michael, Sherlock, Gavin, Brown, Pat, Hastie, Trevor, Tibshirani, Robert, ...
Motivation: Gene expression microarray experiments can generate data sets with multiple missing expression values. Unfortunately, many algorithms for gene expression analysis require a complete...
Hastie, Trevor, Tibshirani, Robert, Eisen, Michael B, Alizadeh, Ash, Levy, Ronald, Staudt, Louis, ...
Background: Large gene expression studies, such as those conducted using DNA arrays, often provide millions of different pieces of data. To address the problem of analyzing such data, we...
Bayesian backfitting (with comments and a rejoinder by the authors) (2000)
Hastie, Trevor, Tibshirani, Robert
We propose general procedures for posterior sampling from additive and generalized additive models. The procedure is a stochastic generalization of the well-known backfitting algorithm for fitting...
Friedman, Jerome, Hastie, Trevor, Tibshirani, Robert
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data...
Special invited paper. Additive logistic regression: A statistical view of boosting (2000)
Jerome Friedman, Trevor Hastie, Robert Tibshirani
Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data...
Estimating the number of clusters in a dataset via the Gap statistic (2000)
Robert Tibshirani, Guenther Walther, Trevor Hastie
We propose a method (the "Gap statistic") for estimating the number of clusters (groups) in a set of data. The technique uses the output of any clustering algorithm (e.g. k-means or...
Supervised Harvesting of Expression Trees (2000)
Trevor Hastie, Robert Tibshirani, David Botstein, Patrick Brown
Background: We propose a new method for supervised learning from gene expression data. We call it "Tree Harvesting". This technique starts with a hierarchical clustering of genes, and models the...
'Gene Shaving' (2000)
Trevor Hastie, Robert Tibshirani, Michael B Eisen, Ash Alizadeh, Ronald Levy, ...
Background: Large gene expression studies, such as those conducted using DNA arrays, often provide millions of different pieces of data. To address the problem of analyzing such data, we describe a...
Imputing missing data for gene expression arrays (1999)
Trevor Hastie, Robert Tibshirani, Gavin Sherlock, Michael Eisen, Patrick Brown, David Botstein
Here we describe three different methods for imputation. The first is based on a reduced rank SVD of the expression matrix, the second is based on K-nearest neighbor averaging, and the third is based on...
The Global Pairwise Approach to Radiation Hybrid Mapping (1999)
Robert Tibshirani, Laura Lazzeroni, Trevor Hastie, Adam Olshen, David Cox
Introduction: We propose a global pairwise method for constructing maps from radiation hybrid data. The method depends upon a novel statistical criterion for identifying the best map than it more...
The Covariance Inflation Criterion for Adaptive Model Selection (1999)
Robert Tibshirani, Keith Knight
We propose a new criterion for model selection in prediction problems. The covariance inflation criterion adjusts the training error by the average covariance of the predictions and responses, when...
Imputing missing data for gene expression arrays (1999)
Trevor Hastie, Robert Tibshirani, Gavin Sherlock, Michael Eisen, Patrick Brown, David Botstein
Here we describe three different methods for imputation. The first is based on a reduced rank SVD of the expression matrix, the second is based on K-nearest neighbor averaging, and the third is based...
Generalized Additive Models, Cubic Splines and Penalized Likelihood. (1998)
Hastie, Trevor, Tibshirani, Robert
Generalized additive models extended the class of generalized linear models by allowing an arbitrary smooth function for any or all of the covariates. The functions are established by the local...
A Note on Profile Likelihood, Least Favourable Families and Kullback-Leibler Distance. (1998)
Tibshirani, Robert, Wasserman, Larry
Several methods exist for reducing higher dimensional problems to a single parameter. These include the profile likelihood, least favourable families and methods based on the Kullback-Leibler...
Efron, Bradley, Tibshirani, Robert
In the problem of regions, we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive...
Classification by pairwise coupling (1998)
Hastie, Trevor, Tibshirani, Robert
We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the...
Additive Logistic Regression: a Statistical View of Boosting (1998)
Jerome Friedman, Trevor Hastie, Robert Tibshirani
Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms can...
Additive Logistic Regression: a Statistical View of Boosting (1998)
Jerome Friedman, Trevor Hastie, Robert Tibshirani
Boosting (Freund & Schapire 1996, Schapire & Singer 1998) is one of the most important recent developments in classification methodology. The performance of many classification algorithms...
Bradley Efron, Robert Tibshirani
In the problem of regions we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive...
Trevor Hastie, Robert Tibshirani
We propose general procedures for posterior sampling from additive and generalized additive models, with applications to non-parametric, semi-parametric and mixed models. One chooses a linear...
Classification by Pairwise Coupling (1998)
Trevor Hastie, Robert Tibshirani
We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar...
Additive Logistic Regression: a Statistical View of Boosting (1998)
Jerome Friedman, Trevor Hastie, Robert Tibshirani
Boosting (Freund & Schapire 1995) is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted...
The lasso method for variable selection in the Cox model (1997)
I propose a new method for variable selection and shrinkage in Cox’s proportional hazards model. My proposal minimizes the log partial likelihood subject to the sum of the absolute values of the...
Model Search and Inference by Bootstrap "Bumping" (1997)
Robert Tibshirani, Keith Knight
We propose a bootstrap-based method for searching through a space of models. The technique is well suited to complex, adaptively fitted models: it provides a convenient method for finding better...
A proposal for variable selection in the Cox model (1997)
We propose a new method for variable selection and estimation in Cox's proportional hazards model. Our proposal minimizes the log partial likelihood subject to the sum of the absolute values of...
Using specially designed exponential families for density estimation (1996)
Efron, Bradley, Tibshirani, Robert
We wish to estimate the probability density $g(y)$ that produced an observed random sample of vectors $y_1, y_2, \dots, y_n$. Estimates of $g(y)$ are traditionally constructed in two quite different...
Discriminant Analysis by Gaussian Mixtures (1996)
Trevor Hastie, Robert Tibshirani
Fisher-Rao linear discriminant analysis (LDA) is a valuable tool for multigroup classification. LDA is equivalent to maximum likelihood classification assuming Gaussian distributions for each class....
Computer-Aided Diagnosis of Mammographic Masses (1996)
Trevor Hastie, Debra Ikeda, Robert Tibshirani
We propose a statistical method for finding masses on mammograms. The technique is based on fitting broken line regressions to local intensity plots of the images. The method is illustrated on a...
The Out-of-Bootstrap Method for Model Averaging and Selection (1996)
We propose a bootstrap-based method for model averaging and selection that focuses on training points that are left out of individual bootstrap samples. This information can be used to estimate...
Two Applications of the Bootstrap (1996)
Introduction: The bootstrap (Efron (1979)) was introduced as a general method for assessing the statistical accuracy of an estimator. See Efron & Tibshirani (1993), Diaconis & Efron...
Bias, Variance and Prediction Error for Classification Rules (1996)
We study the notions of bias and variance for classification rules. Following Efron (1978) and Breiman (1996) we develop a decomposition of prediction error into its natural components. Then we...
Classification by Pairwise Coupling (1996)
Trevor Hastie, Robert Tibshirani
We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar...
Bias, Variance and Prediction Error for Classification Rules (1996)
We study the notions of bias and variance for classification rules. Following Efron (1978) we develop a decomposition of prediction error into its natural components. Then we derive bootstrap...
"Coaching" variables for regression and classification (1995)
Robert Tibshirani, Geoffrey Hinton
In a regression or classification setting where we wish to predict Y from $x_1, x_2, \dots, x_p$, we suppose that an additional set of "coaching" variables $z_1, z_2, \dots, z_m$ are available in...
Penalized Discriminant Analysis (1995)
Trevor Hastie, Andreas Buja, Robert Tibshirani
Fisher's linear discriminant analysis (LDA) is a popular data-analytic tool for studying the relationship between a set of predictors and a categorical response. In this paper we describe a...
Cross-Validation and the Bootstrap: Estimating the Error Rate of a Prediction Rule (1995)
Bradley Efron, Robert Tibshirani
A training set of data has been used to construct a rule for predicting future responses. What is the error rate of this rule? The traditional answer to this question is given by cross-validation....
Flexible Discriminant and Mixture Models (1995)
Trevor Hastie, Robert Tibshirani, Andreas Buja
Introduction: In the generic classification or discrimination problem, the outcome of interest G falls into J unordered classes, which for convenience we denote by the set {1, 2, ...
Generalized Additive Models (1995)
Trevor Hastie, Robert Tibshirani
This article describes flexible statistical methods that may be used to identify and characterize nonlinear regression effects. These methods are called "generalized additive models". For...
Flexible Discriminant Analysis by Optimal Scoring (1994)
Trevor Hastie, Robert Tibshirani, Andreas Buja
Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that...
Discriminant Adaptive Nearest Neighbor Classification (1994)
Trevor Hastie, Robert Tibshirani
Nearest neighbor classification expects the class conditional probabilities to be locally constant, and suffers from bias in high dimensions. We propose a locally adaptive form of nearest neighbor...
Regression Shrinkage and Selection Via the Lasso (1994)
We propose a new method for estimation in linear models. The "lasso" minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a...
Discriminant Adaptive Nearest Neighbor Classification (1994)
Trevor Hastie, Robert Tibshirani
We propose an adaptive nearest neighbor rule that uses local discriminant information to estimate an effective metric for classification. We also propose a method for global dimension reduction, that...
A Comparison of Some Error Estimates for Neural Network Models (1994)
In this paper we focus on the problem of estimation of the standard error of the predicted values
Handwritten Digit Recognition via Deformable Prototypes (1994)
Trevor Hastie, Robert Tibshirani
We present a new method for classifying handwritten characters, in particular digits. In our approach each character in the alphabet is represented by a prototype, in particular a piecewise-linear...
A Comparison of Some Error Estimates for Neural Network Models (1994)
In this paper we focus on the problem of estimation of the standard error of the predicted values $y(\hat{\theta}; x_i)$. A reference for these techniques is Efron and Tibshirani (1993), especially chapter 21. One...
Implications of Measurement Error in Exposure for the Sample Sizes of Case-Control Studies (1994)
McKeown-Eyssen, Gail E., Tibshirani, Robert
In this paper, recent results describing the effects of measurement error on estimation of the association between an exposure and a disease are applied to sample size calculation in case-control...
An introduction to the bootstrap / Bradley Efron and Robert J. Tibshirani (1993)
Efron, Bradley, Tibshirani, Robert
Includes bibliography and index
Flexible Discriminant Analysis by Optimal Scoring (1993)
Trevor Hastie, Robert Tibshirani, Andreas Buja
Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that are...
Flexible Discriminant Analysis by Optimal Scoring (1993)
Trevor Hastie, Robert Tibshirani, A. Buja
Fisher's linear discriminant analysis is a valuable tool for multigroup classification. With a large number of predictors, one can find a reduced number of discriminant coordinate functions that...
Combining Estimates in Regression and Classification (1993)
Michael LeBlanc, Robert Tibshirani
We consider the problem of how to combine a collection of general regression fit vectors in order to obtain a better predictive model. The individual fits may be from subset linear regression, ridge...
Principal Curves Revisited (1992)
A principal curve (Hastie and Stuetzle, 1989) is a smooth curve passing through the "middle" of a distribution or data cloud, and is a generalization of linear principal components. We give...
Generalized Additive Models (1990)
Trevor Hastie, Robert Tibshirani
This article describes flexible statistical methods that may be used to identify and characterize the effect of potential prognostic factors on an outcome variable. These methods are called...
Linear smoothers and additive models (1989)
Andreas Buja, Trevor Hastie, Robert Tibshirani
We study linear smoothers and their use in building non-parametric regression models. In part of this paper we examine certain aspects of linear smoothers for scatterplots; examples of these are the...
Variance stabilization and the bootstrap (1988)
We investigate the use of a variance stabilizing transformation for the computation of a bootstrap t confidence interval. The transformation is estimated in an ‘automatic’ manner through an...
Local likelihood estimation [microform] / by Robert John Tibshirani (1985)
University Microfilms order no. 85-06262.
Local likelihood estimation / by Robert John Tibshirani (1984)
Thesis (Ph. D.)--Stanford University, 1984.
'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns
Hastie, Trevor, Tibshirani, Robert, Eisen, Michael B, Alizadeh, Ash, Levy, Ronald, Staudt, Louis, ...
Supervised harvesting of expression trees
Hastie, Trevor, Tibshirani, Robert, Botstein, David, Brown, Patrick
Significance analysis of microarrays applied to the ionizing radiation response
Tusher, Virginia Goss, Tibshirani, Robert, Chu, Gilbert
Microarrays can measure the expression of thousands of genes to identify changes in expression between different biological states. Methods are needed to determine the significance of these changes...
Sørlie, Therese, Perou, Charles M., Tibshirani, Robert, Aas, Turid, Geisler, Stephanie, Johnsen, Hilde, ...
The purpose of this study was to classify breast carcinomas based on variations in gene expression patterns derived from cDNA microarrays and to correlate tumor characteristics to clinical outcome. A...
Diagnosis of multiple cancer types by shrunken centroids of gene expression
Tibshirani, Robert, Hastie, Trevor, Narasimhan, Balasubramanian, Chu, Gilbert
We have devised an approach to cancer class prediction from gene expression profiling, based on an enhancement of the simple nearest prototype (centroid) classifier. We shrink the prototypes and...
Transcriptional programs activated by exposure of human prostate cancer cells to androgen
DePrimo, Samuel E, Diehn, Maximilian, Nelson, Joel B, Reiter, Robert E, Matese, John, Fero, Mike, ...
DNA microarrays were used to examine the temporal program of gene expression following treatment of a human prostate cancer cell line with androgen. Significant changes in levels of transcripts of...
Pollack, Jonathan R., Sørlie, Therese, Perou, Charles M., Rees, Christian A., Jeffrey, Stefanie S., Lonning, Per E., ...
Genomic DNA copy number alterations are key genetic events in the development and progression of human cancers. Here we report a genome-wide microarray comparative genomic hybridization (array CGH)...
Repeated observation of breast tumor subtypes in independent gene expression data sets
Sørlie, Therese, Tibshirani, Robert, Parker, Joel, Hastie, Trevor, Marron, J. S., Nobel, Andrew, ...
Characteristic patterns of gene expression measured by DNA microarrays have been used to classify tumors into clinically relevant subgroups. In this study, we have refined the previously defined...
Statistical significance for genomewide studies
Storey, John D., Tibshirani, Robert
With the increase in genomewide experiments and the sequencing of multiple genomes, the analysis of large data sets has become commonplace in biology. It is often the case that thousands of features...
Gene Expression Patterns in Ovarian Carcinomas
Schaner, Marci E., Ross, Douglas T., Ciaravino, Giuseppe, Sørlie, Therese, Troyanskaya, Olga, Diehn, Maximilian, ...
We used DNA microarrays to characterize the global gene expression patterns in surface epithelial cancers of the ovary. We identified groups of genes that distinguished the clear cell subtype from...
Gene expression profiling identifies clinically relevant subtypes of prostate cancer
Lapointe, Jacques, Li, Chunde, Higgins, John P., Van De Rijn, Matt, Bair, Eric, Montgomery, Kelli, ...
Prostate cancer, a leading cause of cancer death, displays a broad range of clinical behavior from relatively indolent to aggressive metastatic disease. To explore potential molecular variation...
Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data
Bair, Eric, Tibshirani, Robert
An important goal of DNA microarray research is to develop tools to diagnose cancer more accurately based on the genetic profile of a tumor. There are several existing techniques in the literature...
Toxicity from radiation therapy associated with abnormal transcriptional responses to DNA damage
Rieger, Kerri E., Hong, Wan-Jen, Tusher, Virginia Goss, Tang, Jean, Tibshirani, Robert, Chu, Gilbert
Toxicity from radiation therapy is a grave problem for cancer patients. We hypothesized that some cases of toxicity are associated with abnormal transcriptional responses to radiation. We used...
Early detection of breast cancer based on gene-expression patterns in peripheral blood cells
Sharma, Praveen, Sahni, Narinder S, Tibshirani, Robert, Skaane, Per, Urdal, Petter, Berghagen, Hege, ...
Gene Expression Profiling Predicts Survival in Conventional Renal Cell Carcinoma
Zhao, Hongjuan, Ljungberg, Börje, Grankvist, Kjell, Rasmuson, Torgny, Tibshirani, Robert, Brooks, James D
Molecular heterogeneity of renal cell carcinomas can be used to distinguish subgroups that correlated with long term survival.
Juric, Dejan, Sale, Sanja, Hromas, Robert A., Yu, Ron, Wang, Yan, Duran, George E., ...
Germ cell tumors (GCTs) of the testis are the predominant cancer among young men. We analyzed gene expression profiles of 50 GCTs of various subtypes, and we compared them with 443 other common...
Discovery and validation of breast cancer subtypes
Kapp, Amy V, Jeffrey, Stefanie S, Langerød, Anita, Børresen-Dale, Anne-Lise, Han, Wonshik, Noh, Dong-Young, ...
Bashyam, Murali D, Bair, Ryan, Kim, Young H, Wang, Pei, Hernandez-Boussard, Tina, Karikari, Collins A, ...
Pancreatic cancer, the fourth leading cause of cancer death in the United States, is frequently associated with the amplification and deletion of specific oncogenes and tumor-suppressor genes (TSGs),...
Chang, Howard Y., Nuyten, Dimitry S. A., Sneddon, Julie B., Hastie, Trevor, Tibshirani, Robert, Sørlie, Therese, ...
Based on the hypothesis that features of the molecular program of normal wound healing might play an important role in cancer metastasis, we previously identified consistent features in the...
Discovery and validation of breast cancer subtypes
Kapp, Amy V, Jeffrey, Stefanie S, Langerød, Anita, Børresen-Dale, Anne-Lise, Han, Wonshik, Noh, Dong-Young, ...
Following the publication of our recent article (Kapp et al., BMC Genomics 2006, 7:231), we (the authors) regrettably found several errors in the published Table 5. This correction article not only...
Sparsity and smoothness via the fused lasso
Robert Tibshirani, Michael Saunders, Saharon Rosset, Ji Zhu, Keith Knight
The lasso penalizes a least squares regression by the sum of the absolute values ($L_1$-norm) of the coefficients. The form of this penalty encourages sparse solutions (with many coefficients equal to...
IRF9 and STAT1 are required for IgG autoantibody production and B cell expression of TLR7 in mice
Thibault, Donna L., Chu, Alvina D., Graham, Kareem L., Balboni, Imelda, Lee, Lowen Y., Kohlmoos, Cassidy, ...
A hallmark of SLE is the production of high-titer, high-affinity, isotype-switched IgG autoantibodies directed against nucleic acid–associated antigens. Several studies have established a role for...
Prediction by Supervised Principal Components
Bair, Eric, Hastie, Trevor, Paul, Debashis, Tibshirani, Robert
Pre-validation and inference in microarrays
In microarray studies, an important problem is to compare a predictor of disease outcome derived from gene expression levels to standard clinical predictors. Comparing them on the same dataset that...
Chan, Steven M., Weng, Andrew P., Tibshirani, Robert, Aster, Jon C.
Constitutive Notch activation is required for the proliferation of a subgroup of T-cell acute lymphoblastic leukemia (T-ALL). Downstream pathways that transmit pro-oncogenic signals are not well...
Zhao, Hongjuan, Ma, Zongming, Tibshirani, Robert, Higgins, John P. T., Ljungberg, Börje, Brooks, James D.
Clear cell renal cell carcinoma (ccRCC) is the most common malignancy of the adult kidney and displays heterogeneity in clinical outcomes. Through comprehensive gene expression profiling, we have...
Covariance-regularized regression and classification for high dimensional problems
Daniela M. Witten, Robert Tibshirani
We propose covariance-regularized regression, a family of methods for prediction in high dimensional settings that uses a shrunken estimate of the inverse covariance matrix of the features to achieve...
Univariate Shrinkage in the Cox Model for High Dimensional Data
We propose a method for prediction in Cox's proportional hazards model, when the number of features (regressors), p, exceeds the number of observations, n. The method assumes that the features are...
Extensions of Sparse Canonical Correlation Analysis with Applications to Genomic Data
Daniela Witten, Robert Tibshirani
In recent work, several authors have introduced methods for sparse canonical correlation analysis (sparse CCA). Suppose that two sets of measurements are available on the same set of observations....
Disease signatures are robust across tissues and experiments
Dudley, Joel T, Tibshirani, Robert, Deshpande, Tarangini, Butte, Atul J
Meta-analyses combining gene expression microarray experiments offer new insights into the molecular pathophysiology of disease not evident from individual experiments. Although the established...
Boolean implication networks derived from large scale, whole genome microarray datasets
Sahoo, Debashis, Dill, David L, Gentles, Andrew J, Tibshirani, Robert, Plevritis, Sylvia K
A method for analysis of microarray data is presented that extracts statistically significant Boolean implication relationships between pairs of genes.