Sally A. McKee

Publication List Details

Period

1992 - 2009

Number

129

Are Cycle Accurate Simulations a Waste of Time? (2009)

Vincent M. Weaver, Sally A. McKee

Cycle-accurate simulation methods are necessarily slow. This slowness is only acceptable if the simulation results can be shown to have smaller error than other, faster, methods of generating the...

Improving Performance and Energy through (2009)

Major Bhadauria, Raymond Huang, Sally A. McKee

Multithreaded programs can deliver higher throughput than single-threaded programs on the chip multiprocessors that have become the industry standard. However, increasing numbers of threads may not...

A Characterization of the PARSEC Benchmark Suite for CMP Design (2009)

Major Bhadauria, Vince Weaver, Sally A. McKee

The shared-memory, multi-threaded PARSEC benchmark suite is intended to represent emerging software workloads for future systems. It is specifically intended for use by both industry and academia as...

Going Native: Faster Architectural Simulation Fast-Forwarding (2009)

Peter K. Szwed, Daniel Marques, Sally A. McKee

As system complexity grows, cycle-accurate simulation experiments become inordinately time consuming. Most approaches to accelerating architectural simulation model only portions of an application in...

Using Machine Learning to Explore Huge Parameter Spaces for High End Computing Applications: Tools and Examples (2009)

Karan Singh, Sally A. McKee

With constantly increasing software and architectural complexities and machine scales, creating accurate performance models for applications with large parameter spaces becomes extremely challenging....

Accommodating Diversity in CMPs with Heterogeneous Frequencies (2009)

Major Bhadauria, Vince Weaver, Sally A. McKee

With single core chips, CPU manufacturers detect under-performing processors at production time and sell them...

Identifying Energy-Efficient Concurrency Levels Using Machine Learning (2009)

Matthew Curtis-Maury, Karan Singh, Sally A. McKee, Filip Blagojevic, Dimitrios S. Nikolopoulos, ...

Multicore microprocessors have been largely motivated by the diminishing returns in performance and the increased power consumption of single-threaded ILP microprocessors. With the...

A Precisely Tunable Drowsy Cache Management Mechanism (2008)

Major Bhadauria, Sally A. McKee, Karan Singh, Gary Tyson

Minimizing power consumption continues to grow as a critical design issue for many platforms, from embedded systems to CMPs to ultrascale parallel systems. As growing cache sizes consume larger...

Efficient Architectural Design Space Exploration via Predictive Modeling (2008)

Sally A. McKee, Karan Singh, Rich Caruana, Martin Schulz

Efficiently exploring exponential-size architectural design spaces with many interacting parameters remains an open problem: the sheer number of experiments required renders detailed simulation...

Interactive Locality Optimization on NUMA Architectures (2008)

Tao Mu, Jie Tao, Martin Schulz, Sally A. McKee

Optimizing the performance of shared-memory NUMA programs remains something of a black art, requiring that application writers possess deep understanding of their programs' behaviors. This...

Specializing Cache Structures for High Performance and Energy Conservation in Embedded Systems (2008)

Michael J. Geiger, Sally A. McKee, Gary S. Tyson

Increasingly tight energy design goals require processor architects to rethink the organizational structure of microarchitectural resources. We examine a new multilateral cache...

Data Cache Techniques to Save Power and Deliver High Performance in Embedded Systems (2008)

Major Bhadauria, Sally A. McKee, Karan Singh, Gary S. Tyson

Minimizing power consumption continues to grow as a critical design issue for many platforms, from embedded systems to CMPs to ultrascale parallel systems. As growing cache sizes consume...

Formal Hardware Specification Languages for Protocol Compliance Verification (2008)

Annette Bunker, Ganesh Gopalakrishnan, Sally A. McKee

The advent of the system-on-chip and intellectual property hardware design paradigms makes protocol compliance verification increasingly important to the success of a project. One of the central...

Rethinking Processor Design: Parameter Correlations (2008)

Nana B. Sam, Sally A. McKee

Computer architects rely heavily on simulation to explore increasingly complex design spaces. Keeping simulations within tractable limits forces architects to evaluate only subsets of...

Identifying Energy-Efficient Concurrency Levels Using Machine Learning (2008)

Matthew Curtis-Maury, Karan Singh, Sally A. McKee, Filip Blagojevic, Dimitrios S. Nikolopoulos, ...

Multicore microprocessors have been largely motivated by the diminishing returns in performance and the increased power consumption of single-threaded ILP microprocessors. With the...

Message Passing Interface (MPI) [32]. Very little standard- (2008)

Martin Schulz, Sally A. McKee

Widespread adaptation of shared memory programming for High Performance Computing has been inhibited by a lack of standardization and the resulting portability problems between platforms and APIs. In...

The Impulse Memory Controller (2008)

Mike Parker, Binu K. Mathew, Lambert Schaelicke, John B. Carter, ...

Impulse is a memory system architecture that adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to remap their data...

Improving the Computational Intensity of Unstructured Mesh Applications (2008)

Brian S. White, Sally A. McKee

Although unstructured mesh algorithms are a popular means of solving problems across a broad range of disciplines—from texture mapping to computational fluid dynamics—they are often dominated not...

Perceptron-Based Branch Prediction: Performance of Some Design Options (2008)

Engin Ipek, Sally A. McKee, Martin Schulz, Shai Ben David

Exploiting the huge computing power of modern microprocessors requires fast, accurate branch predictors: as clock rates rise and pipeline lengths grow, so do branch misprediction penalties. These...

Leveraging High Performance Data Cache Techniques to Save Power in Embedded Systems (2008)

Major Bhadauria, Sally A. McKee, Karan Singh, Gary S. Tyson

Voltage scaling reduces leakage power for cache lines unlikely to be referenced soon. Partitioning reduces dynamic power via smaller, specialized structures. We combine approaches, adding a...

Professional Experience (2008)

Sally A. McKee

Research Interests: Processor and memory systems architecture, performance analysis and prediction techniques and tools, computer system modeling, high-performance computing, compilers, operating...

Smarter Memory = Better Performance: Improving Effective Bandwidth for Streams (2007)

Sally A. McKee, Assaji Aluwihare, Robert H. Klenke, Trevor C. Landon, Christopher W. Oliver, ...

Processor speeds are increasing so much faster than memory speeds that within a decade processors may spend most of their time waiting for data. The problem is already acute for computations that...

Approved for the Major Department (2007)

Lambert Schaelicke, Erik L. Brunvand, John B. Carter, Sally A. McKee, Ulrich Brüning, ...

This dissertation has been read by each member of the following supervisory committee and by majority vote has been found to be satisfactory. Chair:

2 (2007)

Jaydeep Marathe, Frank Mueller, Tushar Mohan, Sally A. McKee, Andy Yoo

In this paper, we present METRIC, an environment for determining memory inefficiencies by examining data traces. METRIC is designed to alter the performance behavior of applications that are mostly...

A Cost Framework for Evaluating Integrated Restructuring Optimizations (2007)

Bharat Chandramouli, John B. Carter, Wilson C. Hsieh, Sally A. McKee

Loop transformations and array restructuring optimizations usually improve performance by increasing the memory locality of applications, but not always. For instance, loop and array restructuring...

Impulse Adaptable Memory Controller system: whenever (2007)

Lixin Zhang, Sally A. McKee, Wilson C. Hsieh, John B. Carter

Prefetching has long been used to mask the latency of memory loads. This paper presents results for an initial implementation of pointer-based prefetching within the Impulse adaptable memory...

2 (2007)

Frank Mueller, Tushar Mohan, Sally A. McKee, Andy Yoo

Binary manipulation techniques are increasing in popularity. They support program transformations tailored toward certain program inputs, and these transformations have been shown to yield...

To Their Mothers (2007)

W. Miksad, Alan P. Batson, Sally A. McKee, Jack W. Davidson, ...

they have inspired and supported me more than can be imagined and

Local Relaxed Consistency Schemes on Shared-Memory Clusters (2007)

Martin Schulz, Jie Tao, Sally A. McKee

Shared Memory is an attractive and convenient programming abstraction, and Shared Memory Clusters are a straightforward and efficient way to provide it. Unfortunately, the overhead of enforcing...

Formal Hardware Specification Languages for Protocol Compliance Verification (2007)

Annette Bunker, Ganesh Gopalakrishnan, Sally A. McKee

State Machines [Börger et al. 2000] or Petri Nets [Baresi 2002]; and those that bootstrap a semantics via metamodeling [Clark et al. 2001]. Hussmann [2002] considers these approaches in more detail...

A Cost Model For Integrated Restructuring Optimizations (2007)

Bharat Chandramouli, Wilson C. Hsieh, John B. Carter, Sally A. McKee

Compilers must make choices between different optimizations; in this paper we present an analytic cost model that can be used to compare several compile-time optimizations for memory-intensive,...

S. Central Campus Dr. (2007)

Lixin Zhang, Zhen Fang, Mike Parker, ...

ting User-Level Networks with SMT," Mike Parker, Al Davis, Wilson Hsieh, Fifth Workshop on Multithreaded Execution, Architecture and Compilation, December 2001 "The Impulse Memory...

Metric: Memory tracing via dynamic binary rewriting to identify cache inefficiencies (2007)

Jaydeep Marathe, Frank Mueller, Tushar Mohan, Sally A. McKee, Andy Yoo

With the diverging improvements in CPU speeds and memory access latencies, detecting and removing memory access bottlenecks becomes increasingly important. In this work we present METRIC, a software...

An approach to performance prediction for parallel applications (2005)

Engin Ipek, Martin Schulz, Sally A. McKee

Accurately modeling and predicting performance for large-scale applications becomes increasingly difficult as system complexity scales dramatically. Analytic predictive models are useful,...

Beyond basic region caching: Specializing cache structures for high performance and energy conservation (2005)

Michael J. Geiger, Sally A. McKee, Gary S. Tyson

Increasingly tight energy design goals require processor architects to rethink the organizational structure of microarchitectural resources. In this paper, we examine a new multilateral...

Simsnap: Fast-forwarding via native execution and application-level checkpointing (2004)

Peter K. Szwed, Daniel Marques, Robert M. Buels, Sally A. McKee, Martin Schulz

As systems become more complex, conducting cycle-accurate simulation experiments becomes more time consuming. Most approaches to accelerating simulations attempt to choose simulation points such that...

Formal Hardware Specification Languages for Protocol Compliance Verification (2004)

Annette Bunker, Ganesh Gopalakrishnan, Sally A. McKee

The advent of the system-on-chip and intellectual property hardware design paradigms makes protocol compliance verification increasingly important to the success of a project. One of the central...

Restructuring computations for temporal data cache locality (2003)

Venkata K. Pingali, Sally A. McKee, Wilson C. Hsieh

with complex data structures. As the latency of memory accesses becomes high relative to processor cycle times, application performance is increasingly limited...

Identifying and Exploiting Spatial Regularity in Data Memory References (2003)

Tushar Mohan, Sally A. McKee, Frank Mueller, Andy Yoo, Martin Schulz

The growing processor/memory performance gap causes the performance of many codes to be limited by memory accesses. If known to exist in an application, strided memory accesses forming streams can be...

An MPEG-4 performance study for non-SIMD, general purpose architectures (2003)

Sally A. McKee

MPEG-4 is an important international standard with wide applicability. This paper focuses on MPEG-4’s main profile, video, whose approach allows more efficiency in coding and more flexibility in...

Restructuring computations for temporal data cache locality (2003)

Venkata K. Pingali, Sally A. McKee, Wilson C. Hsieh, John B. Carter

herein are those of the authors, and should not be interpreted as representing the official policies or endorsements, either express or implied, of NSF, DARPA, AFRL, or the US Government. This...

Interactive Locality Optimization on NUMA Architectures (2003)

Tao Mu, Jie Tao, Martin Schulz, Sally A. McKee

Optimizing the performance of shared-memory NUMA programs remains something of a black art, requiring that the application writer understand the behavior of their applications. This difficulty...

Identifying and exploiting spatial regularity in data memory references (2003)

Tushar Mohan, Sally A. McKee, Frank Mueller, Andy Yoo, Martin Schulz

The growing processor/memory performance gap causes the performance of many codes to be limited by memory accesses. If known to exist in an application, strided memory accesses forming streams can be...

An Overview of Formal Hardware Specification Languages (2002)

Annette Bunker, Sally A. McKee, Ganesh Gopalakrishnan

Verification is widely recognized as one of the most difficult aspects of computer hardware design. The gap between design and verification capabilities grows, as does the cost of missed flaws. Many...

Efficient Remapping Mechanisms for an Adaptable Memory System (2002)

Lixin Zhang, Al Davis, Wilson Hsieh, Sally A. McKee, Frederic T. Chong, ...

The speed gap between processors and memory continues to widen. This problem has led to an increased reliance on complex cache hierarchies. Caches are very effective for programs with near 100% cache...

Editors: (2001)

Martin Schulz, Bruce Childers, Sally A. McKee, ...

(PACT 2001) includes for the first time a Work-in-Progress session. We received many excellent short abstracts – some of which were wild ideas. From these we chose seven abstracts for presentation,...

Reevaluating online superpage promotion with hardware support (2001)

Zhen Fang, Lixin Zhang, John B. Carter, Wilson C. Hsieh, Sally A. McKee

Typical translation lookaside buffers (TLBs) can map a far smaller region of memory than application footprints demand, and the cost of handling TLB misses therefore limits the performance of an...

Cost-model driven integration of restructuring optimizations (2001)

Bharat Chandramouli, John B. Carter, Wilson C. Hsieh, Sally A. McKee

Loop transformation and array restructuring are important compiler optimizations that improve memory locality in complementary ways. Although previous researchers have proposed integrating the two...

The impulse memory controller (2001)

Lixin Zhang, Zhen Fang, Mike Parker, Binu K. Mathew, Lambert Schaelicke, John B. Carter, ...

Impulse is a memory system architecture that adds an optional level of address indirection at the memory controller. Applications can use this level of indirection to remap their data structures in...

Reevaluating online superpage promotion with hardware support (2001)

Zhen Fang, Lixin Zhang, John B. Carter, Wilson C. Hsieh, Sally A. McKee

Typical translation lookaside buffers (TLBs) can map a far smaller region of memory than application footprints demand, and the cost of handling TLB misses therefore limits the performance of an...

Editors: (2001)

Bruce Childers, Martin Schulz, Sally A. McKee, ...

This year's Parallel Architectures and Compilation Techniques (PACT 2001) includes for the first time a Work-In-Progress session. We received many excellent short abstracts – some of which were...

A cost framework for evaluating integrated restructuring optimizations (2001)

Bharat Chandramouli, John B. Carter, Wilson C. Hsieh, Sally A. McKee

Loop transformations and array restructuring optimizations usually improve performance by increasing the memory locality of applications, but not always. For instance, loop and array restructuring...

Design of a parallel vector access unit for SDRAM memory systems (2000)

Binu K. Mathew, Sally A. McKee, John B. Carter, Al Davis

We are attacking the memory bottleneck by building a “smart” memory controller that improves effective memory bandwidth, bus utilization, and cache efficiency by letting applications dictate how...

Profiling I/O Interrupts in Modern Architectures (2000)

Lambert Schaelicke, Al Davis, Sally A. McKee

As applications grow increasingly communication-oriented, interrupt performance quickly becomes a crucial component of high performance I/O system design. At the same time, accurately measuring...

TSpec: A Notation for Describing Memory Reference Traces (2000)

Kevin Skadron, Sally A. McKee, William A. Wulf

Interpreting reference patterns in the output of a processor is complicated by the lack of a succinct notation for humans to use when communicating about them. Since an actual trace is simply an...

Algorithmic Foundations for a Parallel Vector Access Memory System (2000)

Binu K. Mathew, Sally A. McKee, John B. Carter, Al Davis

This paper presents mathematical foundations for the design of a memory controller subcomponent that helps to bridge the processor/memory performance gap for applications with strided access...

Pointer-based prefetching within the impulse adaptable memory controller: Initial results (2000)

Lixin Zhang, Sally A. McKee, Wilson C. Hsieh, John B. Carter

Prefetching has long been used to mask the latency of memory loads. This paper presents results for an initial implementation of pointer-based prefetching within the Impulse adaptable memory...

Dynamic access ordering for streamed computations (2000)

Sally A. McKee, William A. Wulf, James H. Aylor, Robert H. Klenke, Maximo H. Salinas, ...

Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations such as scientific vector processing or multimedia...

Dynamic Access Ordering for Streamed Computations (2000)

Sally A. McKee, Wm. A. Wulf, James H. Aylor, Robert H. Klenke, Maximo H. Salinas, Sung I. Hong, ...

Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations such as scientific vector processing or multimedia (de)compression....

Hardware-Only Stream Prefetching and Dynamic Access Ordering (2000)

Chengqiang Zhang, Sally A. McKee

Memory system bottlenecks limit performance for many applications, and computations with strided access patterns are among the hardest hit. The streams used in such applications have extremely poor...

Pointer-Based Prefetching within the Impulse Adaptable Memory Controller: Initial Results (2000)

Lixin Zhang, Sally A. McKee, Wilson C. Hsieh, John B. Carter

Prefetching has long been used to mask the latency of memory loads. This paper presents results for an initial implementation of pointer-based prefetching within the Impulse adaptable...

Design of a Parallel Vector Access Unit for SDRAM Memory Systems (2000)

Binu K. Mathew, Sally A. McKee, John B. Carter, Al Davis

We are attacking the memory bottleneck by building a "smart" memory controller that improves effective memory bandwidth, bus utilization, and cache efficiency by letting applications...

Pointer-Based Prefetching within the Impulse Adaptable Memory Controller: Initial Results (2000)

Lixin Zhang, Sally A. McKee, Wilson C. Hsieh, John B. Carter

Prefetching has long been used to mask the latency of memory loads. This paper presents results for an initial implementation of pointer-based prefetching within the Impulse adaptable memory...

Parallel Vector Access: A Technique for Improving Memory System Performance (2000)

Binu K. Mathew, Sally A. McKee, John B. Carter, Robert Kessler, ...

Parallel Vector Access (PVA) is a technique that exploits the regularity of vector or stream accesses to perform them efficiently in parallel on a multibank memory system. The performance of vector...

Caches as Filters: A Unifying Model for Memory Hierarchy Analysis (2000)

Kevin Skadron, Sally A. McKee, William A. Wulf

This paper outlines the new caches-as-filters framework for the analysis of caching systems, describing the functional filter model in detail. This model is more general than those introduced...

Caches As Filters: A Framework for the Analysis of Caching Systems (2000)

Sally A. McKee, Kevin Skadron, Wm. A. Wulf

This paper introduces a new analytical framework for analyzing and designing caches. It consists of four major parts: TSpec notation, into which reference traces can be transformed; equivalence...

Access order and effective bandwidth for streams on a direct rambus memory (1999)

Sung I. Hong, Sally A. McKee, Maximo H. Salinas, Robert H. Klenke, James H. Aylor, Wm. A. Wulf

Processor speeds are increasing rapidly, and memory speeds are not keeping up. Streaming computations (such as multi-media or scientific applications) are among those whose performance is most...

Memory system support for image processing (1999)

Lixin Zhang, John B. Carter, Wilson C. Hsieh, Sally A. McKee

Image processing applications tend to access their data non-sequentially and reuse that data infrequently. As a result, they tend to perform poorly on conventional memory systems due to high cache...

Impulse: Memory System Support for Scientific Applications (1999)

John B. Carter, Wilson C. Hsieh, Leigh B. Stoller, Mark Swanson, Lixin Zhang, Sally A. McKee

Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable...

Impulse: Memory System Support for Scientific Applications (1999)

John B. Carter, Wilson C. Hsieh, Leigh B. Stoller, Mark Swanson, Lixin Zhang, ...

Impulse is a new memory system architecture that adds two important features to a traditional memory controller. First, Impulse supports application-specific optimizations through configurable...

Memory System Support for Image Processing (1999)

Lixin Zhang, John B. Carter, Wilson C. Hsieh, Sally A. McKee

Processor speeds are increasing rapidly, but memory speeds are not keeping pace. Image processing is an important application domain that is particularly impacted by this growing performance gap....

Memory System Support for Image Processing (1998)

Lixin Zhang, John B. Carter, Wilson C. Hsieh, Sally A. McKee

Processor speeds are increasing rapidly, but memory speeds are not keeping pace. Image processing is an important application domain that is particularly impacted by this growing performance gap....

Design and evaluation of dynamic access ordering hardware (1996)

Sally A. McKee, Assaji Aluwihare, Benjamin H. Clark, Robert H. Klenke, Trevor C. Landon, Christopher W. Oliver, ...

Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations such as scientific vector processing or multimedia (de)compression....

Design and Evaluation of Dynamic Access Ordering Hardware (1996)

Sally A. McKee

Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations such as scientific vector processing or multimedia (de)compression....

Compiling for Efficient Memory Utilization (1996)

Sally A. McKee

this paper is thus to try to call attention to this work.

APPROVAL SHEET (1995)

Sally A. McKee, Andrew S. Grimshaw, James M. Ortega, James H. Aylor, Dean Richard, ...

To the memories of my grandmother, Helen Viola (1914-1993), and my great aunt, Eileen Alward (1915-1994). Processor speeds are increasing much faster than memory speeds, and thus memory bandwidth is...

Hitting the Memory Wall: Implications of the Obvious (1995)

Wm. A. Wulf, Sally A. McKee

This brief note points out something obvious, something the authors "knew" without really understanding. With apologies to those who did understand, we offer it to those others who, like...

Maximizing Memory Bandwidth for Streamed Computations (1995)

W. Miksad, Andrew S. Grimshaw, James M. Ortega, James H. Aylor, ...

Processor speeds are increasing much faster than memory speeds, and thus memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly those whose inner...

Evaluation of Dynamic Access Ordering Hardware (1995)

Sally A. Mckee, C. W. Oliver, J. H. Aylor, K. L. Wright, ...

Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations such as scientific vector processing or multimedia...

A Systematic Approach to Optimizing and Verifying Synthesized High-Speed ASICs (1995)

Trevor C. Landon, Robert H. Klenke, Maximo H. Salinas, Kenneth L. Wright, ...

This paper describes the design process used in developing a Stream Memory Controller (SMC). The SMC can reorder processor-memory accesses dynamically to increase the effective memory bandwidth for...

Bounds on Memory Bandwidth in Streamed Computations (1995)

Sally A. Mckee, Wm. A. Wulf, Trevor C. L

The growing disparity between processor and memory speeds has caused memory bandwidth to become the performance bottleneck for many applications. In particular, this performance gap severely...

Access Order and Memory-Conscious Cache Utilization (1995)

Sally A. Mckee, Wm. A. Wulf

As processor speeds increase relative to memory speeds, memory bandwidth is rapidly becoming the limiting performance factor for many applications. Several approaches to bridging this performance gap...

Maximizing Memory Bandwidth for Streamed Computations (1995)

Sally A. Mckee, Wm. A. Wulf, Trevor C. L

The growing disparity between processor and memory speeds has caused memory bandwidth to become the performance bottleneck for many applications. In particular, this performance gap...

Evaluation of Dynamic Access Ordering Hardware (1995)

Sally A. Mckee, Christopher W. Oliver, Wm. A. Wulf, ...

Memory bandwidth is rapidly becoming the limiting performance factor for many applications, particularly for streaming computations — such as scientific vector processing or multimedia...

Experimental Implementation of Dynamic Access Ordering (1994)

Sally A. Mckee, Robert H. Klenke, Andrew J. Schwab, Wm. A. Wulf, Steven A. Moyer, James H. Aylor, ...

As microprocessor speeds increase, memory bandwidth is rapidly becoming the performance bottleneck in the execution of vector-like algorithms. Although caching provides adequate performance for many...

Dynamic Access Ordering: Bounds on Memory Bandwidth (1994)

Sally A. Mckee

Memory bandwidth is becoming the limiting performance factor for many applications, particularly scientific computations. Access ordering is one technique that can help bridge the processor-memory...

Uniprocessor SMC Performance on Vectors with Non-Unit Strides (1994)

Sally A. Mckee

Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the "grand challenge" scientific...

Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors (1994)

Sally A. Mckee

Dynamic Access Ordering for Symmetric Shared-Memory Multiprocessors. Sally A. McKee, Department of Computer Science, University of Virginia, Charlottesville, VA 22903, mckee@cs.virginia.edu. Memory...

Performance of Some Design Options 1. Increasing Vector Memory Bandwidth (1993)

Sally A. Mckee

Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the “grand challenge” scientific...

Hardware support for dynamic access ordering: Performance of some design options (1993)

Sally A. Mckee

Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the "grand challenge"...

Increasing Memory Bandwidth for Vector Computations (1993)

Charles Y. Hitchcock, Sally A. Mckee, Steven A. Moyer, Wm. A. Wulf, ...

Memory bandwidth is rapidly becoming the performance bottleneck in the application of high performance microprocessors to vector-like algorithms, including the "Grand Challenge"...

Toward a Steiner Engine: Enhanced Serial and Parallel Implementations of the Iterated 1-Steiner MRST Algorithm (1993)

Tim Barrera, Jeff Griffith, Sally A. Mckee, Gabriel Robins, Tongtong Zhang

The minimum rectilinear Steiner tree (MRST) problem arises in global routing and wiring estimation, as well as in many other areas. The MRST problem is known to be NP-hard, and the best performing...

Experimental Implementation of Dynamic Access Ordering (1993)

Sally A. Mckee, Robert H. Klenke, Andrew J. Schwab, Wm. A. Wulf, Steven A. Moyer, James H. Aylor, ...

As microprocessor speeds increase, memory bandwidth is rapidly becoming the performance bottleneck in the execution of vector-like algorithms. Although caching provides adequate performance for many...

Hardware Support for Dynamic Access Ordering: Performance of Some Design Options (1993)

Sally A. Mckee

Hardware Support for Dynamic Access Ordering: Performance of Some Design Options. Sally A. McKee, Department of Computer Science, University of Virginia, Charlottesville, VA 22903, mckee@virginia.edu...

An Analytic Model of SMC Performance (1993)

Sally A. Mckee

Memory bandwidth is becoming the limiting performance factor for many applications, particularly scientific computations. Access ordering is one technique that can help bridge the processor-memory...

Toward a Steiner Engine: Enhanced Serial and Parallel (1993)

Tim Barrera, Jeff Griffith, Sally A. Mckee, Gabriel Robins, Tongtong Zhang

The minimum rectilinear Steiner tree (MRST) problem arises in global routing and wiring estimation, as well as in many other areas. The MRST problem is known to be NP-hard, and the best performing...

Toward a Steiner Engine: Enhanced Serial and Parallel Implementations of the Iterated 1-Steiner MRST Heuristic (1992)

Tim Barrera, Jeff Griffith, Sally A. Mckee, ...

The minimum rectilinear Steiner tree (MRST) problem arises in global routing and wiring estimation, as well as in many other areas. The MRST problem is known to be NP-hard, and the best performing...