Versatile Tiled-Processor Architectures: The Raw Approach (2008)
Rodric M. Rabbah, Ian Bratt, Krste Asanovic, Anant Agarwal
Advances in VLSI technology have spurred an increasing interest within the computer architecture community to build a new kind of “all-purpose” processor that is able to run a broad class of...
John L. Hennessy, Krste Asanovic, Robert P. Colwell, Thomas M. Conte, ...
New York • Oxford • Paris • San Diego
Parallel Neural Network Training on Multi-Spert (2008)
Philipp Färber, Krste Asanovic
Multi-Spert is a scalable parallel system built from multiple Spert-II nodes which we have constructed to speed error backpropagation neural network training for speech recognition research. We...
Krste Asanovic, Ras Bodik, James Demmel, Tony Keaveny, Kurt Keutzer, John D. Kubiatowicz, ...
Copyright © 2008, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that...
Parallel Neural Network Training On Multi-Spert (2007)
Philipp Färber, Krste Asanovic
In this paper we present the parallelization and resulting performance of backprop on Multi-Spert. We concentrate on the two components which dominate the runtime of our training experiments, the...
A Fast Kohonen Net Implementation for Spert-II (2007)
We present an implementation of Kohonen Self-Organizing Feature Maps for the Spert-II vector microprocessor system. The implementation supports arbitrary neural map topologies and arbitrary...
The PHiPAC v1.0 Matrix-Multiply Distribution. (2007)
Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, Jim Demmel
Modern microprocessors can achieve high performance on linear algebra kernels but this currently requires extensive machine-specific hand tuning. We have developed a methodology whereby near-peak...
Scaling Processors to 1 Billion Transistors and Beyond: IRAM (2007)
Stylianos Perissakis, Christoforos Kozyrakis, Tom Anderson, Krste Asanovic, Neal Cardwell, Richard Fromm, ...
In this paper we introduce an alternative way of using the huge amount of real estate available on such a chip: integrating the processor and the main memory on the same die. We call this architecture...
A Double-Pulsed Set-Conditional-Reset Flip-Flop (2007)
A new flip-flop design using a double-pulsed static latch is presented. The flip-flop has only a single stage of logic in the critical path and as a result is up to three times faster than...
Low-Power Single-Precision IEEE Floating-Point (2007)
Sheetal A. Jain, Krste Asanovic
Floating point adders are area and power intensive, but essential in high performance systems. The Software-Controlled Architectures and Low Energy (SCALE) project requires a low-power...
SCALE control processor test-chip (2007)
Christopher Batten, Ronny Krashinsky, Krste Asanovic
We are investigating vector-thread architectures which provide competitive performance and efficiency across a broad class of application domains [1, 4]. Vector-thread architectures unify data-level,...
Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Kurt Keutzer, David A. Patterson, ...
Copyright © 2006, by the author(s).
RingScalar: A Complexity-Effective Out-of-Order Superscalar Microarchitecture (2006)
Jessica H. Tseng, Krste Asanovic
RingScalar is a complexity-effective microarchitecture for out-of-order superscalar processors that reduces the area, latency, and power of all major structures in the instruction flow. The design...
RAMP: Research Accelerator for Multiple Processors (2006)
John Wawrzynek, Mark Oskin, Christoforos Kozyrakis, Derek Chiou, David A. Patterson, Shih-lien Lu, ...
Copyright © 2006, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that...
A speculative control scheme for an energy-efficient banked register file (2005)
Jessica H. Tseng, Krste Asanovic
Multiported register files are critical components of modern superscalar and simultaneously multithreaded (SMT) processors, but conventional designs consume considerable die area and power...
Victim Migration: Dynamically Adapting Between Private and Shared CMP Caches (2005)
Michael Zhang, Krste Asanovic
Future CMPs will have more cores and greater on-chip cache capacity. The on-chip cache can either be divided into separate private L2 caches for each core, or treated as a large shared L2 cache....
Fast Fourier Transform on a 3D FPGA (2005)
Elizabeth Basha, Krste Asanovic, C. Smith
Fast Fourier Transforms perform a vital role in many applications from astronomy to cellphones. The complexity of these algorithms results from the many computational steps, including...
Cache Refill/Access Decoupling for Vector Machines (2004)
Christopher Batten, Ronny Krashinsky, Steve Gerding, Krste Asanovic
Vector processors often use a cache to exploit temporal locality and reduce memory bandwidth demands, but then require expensive logic to track large numbers of outstanding cache misses to sustain...
Versatility and VersaBench: A new metric and a benchmark suite for flexible architectures (2004)
Rodric M. Rabbah, Ian Bratt, Krste Asanovic, Anant Agarwal
For the last several decades, computer architecture research has largely benefited from, and continues to be driven by ad-hoc benchmarking. Often the benchmarks are selected to represent workloads...
Hardware Support for Unbounded Transactional Memory
Sean Lie, Krste Asanovic, Arthur C. Smith
... for a Handheld Device (2003)
Elina Kamenetskaya, Krste Asanovic, Arthur C. Smith
Way memoization to reduce fetch energy in instruction caches (2001)
Albert Ma, Michael Zhang, Krste Asanovic
Instruction caches consume a large fraction of the total power in modern low-power microprocessors. In particular, set-associative caches, which are preferred because of lower miss rates, require...
The Span Cache: Software Controlled Tag Checks and Cache Line Size (2001)
Emmett Witchel, Krste Asanovic
The span cache is a hardware-software design for a new kind of energy-efficient microprocessor data cache which has two key features. The first is direct addressing which allows software to access...
Direct addressed caches for reduced power consumption (2001)
Emmett Witchel, Sam Larsen, C. Scott Ananian, Krste Asanovic
A direct addressed cache is a hardware-software design for an energy-efficient microprocessor data cache. Direct addressing allows software to access cache data without a hardware cache tag check....
Exposing datapath elements to reduce microprocessor energy consumption (2001)
Krste Asanovic, Arthur C. Smith, Mark Jerome Hampton
Energy-Exposed Instruction Set Architectures (2000)
Power consumption is emerging as a key factor limiting computational performance in both mobile and tethered systems. Although there has been significant progress in low-power circuit...
SyCHOSys: Compiled Energy-Performance Cycle Simulation (2000)
Ronny Krashinsky, Seongmoo Heo, Michael Zhang, Krste Asanovic
SyCHOSys (Synchronous Circuit Hardware Orchestration System) generates high-speed energy-performance cycle simulators by compiling a processor description into efficient C++ code. This framework can...
Krste Asanovic
• Computer Science at crossroads from sequential to parallel computing • Computer Architecture>> ISAs and RTL – CS152 is about interaction of hardware and software, and design of...
Vector Microprocessors
Krste Asanovic
Doctor of Philosophy in Computer Science, University of California, Berkeley. Professor John Wawrzynek, Chair. Most previous research into vector architectures...
The PHiPAC v1.0 Matrix-Multiply Distribution. (1998)
Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, Jim Demmel
Modern microprocessors can achieve high performance on linear algebra kernels but this currently requires extensive machine-specific hand tuning. We have developed a methodology whereby near-peak...
Intelligent RAM (IRAM): The industrial setting, applications, and architectures (1997)
David Patterson, Krste Asanovic, Aaron Brown, Richard Fromm, Jason Golbus, Benjamin Gribstad, ...
The goal of Intelligent RAM (IRAM) is to design a cost-effective computer by designing a processor in a memory fabrication process, instead of in a conventional logic fabrication process, and include...
Using PHiPAC To Speed Error Back-Propagation Learning (1997)
Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, Jim Demmel
We introduce PHiPAC, a coding methodology for developing portable high-performance numerical libraries in ANSI C. Using this methodology, we have developed code for optimized matrix multiply...
Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, Jim Demmel
Modern microprocessors can achieve high performance on linear algebra kernels but this currently requires extensive machine-specific hand tuning. We have developed a methodology whereby near-peak...
Scalable Processors in the Billion-Transistor Era: IRAM (1997)
Christoforos E. Kozyrakis, Stylianos Perissakis, David Patterson, Thomas Anderson, Krste Asanovic, Neal Cardwell, ...
Other architecture alternatives, like wide superscalar and VLIW (very long instruction word), suffer from drawbacks: implementation complexity, low utilization of resources, and immature compiler...
Scaling Processors to 1 Billion Transistors and Beyond: IRAM (1997)
Stylianos Perissakis, Christoforos E. Kozyrakis, Thomas Anderson, Krste Asanovic, Neal Cardwell, Richard Fromm, ...
Conventional architectures have been developed with a transistor budget of a few hundred thousand and have evolved to designs of about 10 million transistors, achieving impressive performance....
T0: A Single-Chip Vector Microprocessor with Reconfigurable Pipelines (1996)
Krste Asanovic, James Beck, Bertrand Irissou, John Wawrzynek
A single-chip fixed-point vector microprocessor is described. The chip contains a MIPS-II RISC core with a 1 KB instruction cache, dual eight-way parallel vector arithmetic pipelines, a 128-bit...
Spert-II: A Vector Microprocessor System (1996)
John Wawrzynek, Krste Asanovic, Brian Kingsbury, David Johnson, James Beck, Nelson Morgan
this article. Primary support for our work came from ONR URI Grant N00014-92-J-1617, ARPA Contract N0001493-C0249, NSF Grant MIP-9311980, and NSF PYI Award MIP-8958568NSF. Additional support was...
Jeff Bilmes, Krste Asanovic, Jim Demmel, Dominic Lam, Chee-Whye Chin
BLAS3 operations have great potential for aggressive optimization. Unfortunately, they usually need to be hand-coded for a specific machine and compiler to achieve near-peak performance. We have...
Jeff Bilmes, Krste Asanovic, Chee-Whye Chin, Jim Demmel
Modern microprocessors can achieve high performance on linear algebra kernels but this currently requires extensive machine-specific hand tuning. We have developed a methodology whereby near-peak...
A supercomputer for neural computation (1994)
Krste Asanovic, James Beck, Jerome Feldman, Nelson Morgan, John Wawrzynek
The requirement to train large neural networks quickly has prompted the design of a new massively parallel supercomputer using custom VLSI. This design features 128 processing nodes,...
A Supercomputer for Neural Computation (1994)
Krste Asanovic, James Beck, Jerome Feldman, Nelson Morgan, John Wawrzynek
The requirement to train large neural networks quickly has prompted the design of a new massively parallel supercomputer using custom VLSI. This design features 128 processing nodes, communicating...
CNS-1 Architecture Specification - A Connectionist Network Supercomputer (1993)
Krste Asanovic, James Beck, Tim Callahan, Jerry Feldman, Brian Kingsbury, ...
This report proposes a massively parallel computer, the Connectionist Network Supercomputer (CNS-1), which leverages off these fields. By targeting the computer to connectionist networks and related...
Designing a Connectionist Network Supercomputer (1993)
Krste Asanovic, James Beck, Jerry Feldman, Nelson Morgan, John Wawrzynek
This paper describes an effort at UC Berkeley and the International Computer Science Institute to develop a super-computer for artificial neural network applications. Our perspective has been...
The design of a neuro-microprocessor (1993)
John Wawrzynek, Krste Asanovic, Nelson Morgan
This paper presents the architecture of a neuro-microprocessor. This processor was designed using the results of careful analysis of our set of applications and extensive simulation of...
SPERT: A VLIW/SIMD Microprocessor for Artificial Neural Network Computations (1992)
Krste Asanovic, James Beck, Phil Kohn, Nelson Morgan, John Wawrzynek
SPERT (Synthetic PERceptron Testbed) is a fully programmable single chip microprocessor designed for efficient execution of artificial neural network algorithms. The first implementation will be in a...
Development of a Connectionist Network Supercomputer (1992)
Krste Asanovic, James Beck, Jerry Feldman, Nelson Morgan, John Wawrzynek
This paper describes an effort at UC Berkeley and the International Computer Science Institute to develop a super-computer for artificial neural network applications. We describe our applications...
HiPNeT-1: A Highly Pipelined Architecture for Neural Network Training (1991)
Krste Asanovic, Nelson Morgan, John Wawrzynek
Current artificial neural network (ANN) algorithms require extensive computational resources. However, they exhibit massive fine-grained parallelism and require only moderate arithmetic precision....
The impact of reduced weight and output precision on the back-propagation training algorithm [Wer74, RHW86] is experimentally determined for a feed-forward multilayer perceptron. In contrast with...
Simulation of Reduced Precision Arithmetic for Digital Neural Networks Using the RAP Machine (1991)
Krste Asanovic, Nelson Morgan, John Wawrzynek
This paper describes some of our recent work in the development of computer architectures for efficient execution of artificial neural network algorithms. Our earlier system, the Ring Array Processor...