The Memory logP Model of Local (2008)
Kirk W. Cameron, Member, IEEE, Xian-he Sun, Senior Member
Abstract. Data movement across a memory hierarchy can severely impact application execution time. For example, on the fast interconnect of the Origin 2000, three- and four-fold increases in...
CPU MISER: A Performance-Directed, Run-Time System for Power-Aware Clusters (2008)
Rong Ge, Xizhou Feng, Wuchun Feng, Kirk W. Cameron
Performance and power are critical design constraints in today’s high-end computing systems. Reducing power consumption without impacting system performance is a challenge for the HPC community. We...
Synthesizing Parallel Programming Models for Asymmetric Multi-core Systems (2008)
Dimitrios S. Nikolopoulos, Kirk W. Cameron
Asymmetric multi-core processors, integrating conventional with customized accelerator cores, have exhibited the potential to provide unprecedented performance for data-intensive applications. Although...
Reservation Station Architecture for Mutable Functional Unit Usage in Superscalar Processors (2008)
Yan Solihin, Kirk W. Cameron, Yong Luo, Dominique Lavenier, Maya Gokhale
One major bottleneck of a superscalar processor is the mismatch of instruction stream mix with functional unit configuration. Depending on the type and number of functional units, the performance loss...
Xian-he Sun, Kirk W. Cameron, Yong Luo, Dongmei He
Memory latency is a substantial contributor to single processor performance loss. Latency hiding techniques such as out of order and speculative execution, outstanding loads to memory, and increases...
Memory-aware Communication – an Experimental Study with MPI (2008)
Surendra Byna, Kirk W. Cameron, Xian-he Sun
Assuming network transfer is the dominant factor of communication, current communication models estimate only network-related delays and are inadequate to address other performance factors such as...
Filip Blagojevic, Xizhou Feng, Kirk W. Cameron, Dimitrios S. Nikolopoulos
Abstract. Heterogeneous multi-core processors invest the most significant portion of their transistor budget in customized “accelerator” cores, while using a small number of conventional low-end...
Dynamically Mutable Functional Unit in Superscalar Processors (2007)
Yan Solihin, Kirk W. Cameron, Yong Luo, Dominique Lavenier, Maya Gokhale
One major bottleneck of a superscalar processor is the mismatch of instruction stream mix with functional unit configuration. Depending on the type and number of functional units, the performance...
Yan Solihin, Kirk W. Cameron, Yong Luo, Dominique Lavenier
Reservation Station Architecture for Mutable Functional Unit Usage in Superscalar Processors (2007)
Yan Solihin, Kirk W. Cameron, Yong Luo, Dominique Lavenier, Maya Gokhale
One major bottleneck of a superscalar processor is the mismatch of instruction stream mix with functional unit configuration. Depending on the type and number of functional units, the performance loss...
Publications Journal Papers: (2006)
Rong Ge, Kirk W. Cameron, Rong Ge, Xian-he Sun, Kirk W. Cameron, ...
PBPI: a High Performance Implementation of Bayesian Phylogenetic Inference (2006)
Xizhou Feng, Kirk W. Cameron, Duncan A. Buell
This paper describes the implementation and performance of PBPI, a parallel implementation of a Bayesian phylogenetic inference method for DNA sequence data. By combining the Markov Chain Monte Carlo...
Predicting and Evaluating Distributed Communication Performance (2004)
Abstract. Application of hardware-parameterized models to distributed systems can result in omission of key bottlenecks such as the full cost of inter- and intra-node communication in a cluster of...
A statistical–empirical hybrid approach to hierarchical memory analysis (2000)
Abstract. A hybrid approach that utilizes both statistical techniques and empirical methods seeks to provide more information about the performance of an application. In this paper, we present a...
Instruction-level microprocessor modeling of scientific applications (1999)
Kirk W. Cameron, Yong Luo, James Scharzmeier
Abstract. Superscalar microprocessor efficiency is generally not as high as anticipated. In fact, sustained utilization below thirty percent of peak is not uncommon, even for fully optimized,...
A Factorial Performance Evaluation for Hierarchical Memory Systems (1999)
Xian-He Sun, Dongmei He, Kirk W. Cameron, Yong Luo
In this study, we introduce an evaluation methodology for advanced memory systems. This methodology is based on statistical factorial analysis. It is twofold: it first determines the impact of...
Boosting the Speedup of Future Processor Architectures by Using Mutable Functional Units (1999)
Yan Solihin, Kirk W. Cameron, Yong Luo, Dominique Lavenier, Maya Gokhale
One major bottleneck of a superscalar processor is the mismatch of instruction stream mix with functional unit configuration. The resulting "unavailable functional unit" stalls can be a...