Analysis of System Overhead on Parallel Computers (2009)
Roberto Gioiosa, Fabrizio Petrini, Kei Davis, Fabien Lebaillif-Delamare
Ever-increasing demand for computing capability is driving the construction of ever-larger computer clusters, typically comprising commodity compute nodes, ranging in size up to thousands of...
Multicore Surprises: Lessons Learned from Optimizing Sweep3D on the Cell (2008)
Fabrizio Petrini, Gordon Fossum, Juan Fernández, Ana Lucia Varbanescu, Mike Kistler, Michael Perrone
The Cell Broadband Engine (BE) processor provides the potential to achieve an impressive level of performance for scientific applications. This level of performance can be...
Eitan Frachtenberg, Fabrizio Petrini, Salvador Coll, Wu-chun Feng
In this paper we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. On such a cluster, the scheduler can take advantage of the unique capabilities of...
Towards Fault Resilient Global Arrays (2008)
Manoj Krishnan, Bruce Palmer, Fabrizio Petrini, Jarek Nieplocha, F. Peters (eds, Manoj Krishnan, ...
Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg
The interconnection network and its associated software libraries are critical components for high-performance cluster computers and supercomputers, Web-server farms, and network-attached storage....
QsNet II: An Interconnect for Supercomputing Applications (2008)
Jon Beecroft, David Addison, Fabrizio Petrini, Moray McLaren
The QsNet II network has been designed to optimize the interprocessor communication performance in systems constructed from standard server building blocks. In order to achieve this, the network...
Scalable Resource Management in High-Performance Computers (2008)
Eitan Frachtenberg, Fabrizio Petrini, Juan Fernández, Salvador Coll
Clusters and other loosely-coupled systems are becoming ubiquitous and larger
Jon Beecroft, David Addison, David Hewson, Moray McLaren, Duncan Roweth, Fabrizio Petrini, ...
Cluster computers—parallel computers built from commodity processors—are becoming the predominant supercomputer architecture because of their combined scalable performance and attractive price....
Efficient Scheduling of Parallel Jobs on Massively Parallel Systems (2008)
Fabrizio Petrini, Wu-chun Feng
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation...
Michael Kistler, Michael Perrone, Fabrizio Petrini
Over the past decade, high-performance computing has ridden the wave of commodity computing, building cluster-based parallel computers that leverage the tremendous growth in processor performance...
Approved for Public Release Distribution is Unlimited (2008)
Informatics Group, Blue Gene, A Performance, Darren Kerbyson, Mike Lang, Scott Pakin, ...
Los Alamos National Laboratory, an affirmative action/equal opportunity employer, is operated by University of California for the U.S. Department of Energy under contract W-7405-ENG-36. Neither T...
Juan Fernández, Eitan Frachtenberg, Fabrizio Petrini, José-Carlos Sancho
Scalable management of distributed resources is one of the major challenges when building large-scale clusters for high-performance computing. This task includes transparent fault tolerance, efficient...
Fabrizio Petrini, Wu-chun Feng
Buffered coscheduling is a distributed scheduling methodology for time-sharing communicating processes in a distributed system, e.g., a PC cluster. The principal mechanisms involved in this methodology...
Fabrizio Petrini, Adolfy Hoisie, Wu-chun Feng, Richard Graham
We present an initial performance evaluation of the Quadrics interconnection network (QsNET). We describe the main hardware and software features of QsNET of relevance to the system designer and to...
Efficient Scheduling of Parallel Jobs on Massively Parallel Systems (2007)
Fabrizio Petrini, Wu-chun Feng
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation...
Efficient Scheduling of Parallel Jobs on Massively Parallel Systems (2007)
Fabrizio Petrini, Wu-chun Feng
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a...
Improved Resource Utilization with Buffered Coscheduling (2007)
Fabrizio Petrini, Wu-chun Feng
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a...
Darren J. Kerbyson, Adolfy Hoisie, Scott Pakin, Fabrizio Petrini, Harvey J. Wasserman
Energy under contract W-7405-ENG-36. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the...
Static Allocation of Multirail Networks (2007)
Salvador Coll, Eitan Frachtenberg, Fabrizio Petrini, Adolfy Hoisie, Leonid Gurvits
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault-tolerance of current high-performance clusters. This report...
Scalable Resource Management in High Performance Computers (2007)
Eitan Frachtenberg, Fabrizio Petrini, Juan Fernández, Salvador Coll
Clusters of workstations have emerged as an important platform for building cost-effective, scalable and highly-available computers. Although many hardware solutions are available today, the largest...
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernandez
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically accomplished by space slicing, wherein nodes...
Fabrizio Petrini, Eitan Frachtenberg, Adolfy Hoisie, Salvador Coll
In this paper we present an in-depth description of the Quadrics interconnection network (QsNET) and an experimental performance evaluation on a 64-node AlphaServer cluster. We explore...
Eitan Frachtenberg, Dror G. Feitelson, Juan Fernandez, Fabrizio Petrini
Jobs that run on parallel systems that use gang scheduling for multiprogramming may interact with each other in various ways. These interactions are affected by system parameters such as the level of...
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernández
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to make good progress. This is typically achieved by space slicing with variable...
Many theoretical models of parallel computation are based on overly simplistic assumptions on the performance of the interconnection network. For example they assume constant latency for any...
Communication Performance of Wormhole Interconnection Networks (2007)
Fabrizio Petrini
Fat trees and low dimensional toroidal cubes have raised a great interest in the scientific community in the last few years and are emerging standards in the design of interconnection networks for...
Fabrizio Petrini, Marco Vanneschi
The past few years have seen a rise in popularity of massively parallel architectures that use fat-trees as their interconnection networks. In this paper we formalize a parametric family of...
with the Red Rover Algorithm (2007)
The Red Rover algorithm previously presented for deadlock-free routing in rings is applied to bidirectional k-ary n-cube multicomputer networks in this work. This algorithm provides greater...
Sponsored by a Marie Curie Fellowship Contract No ERBFMBICT972076 (2007)
Running Head: Network Performance with Scientific Applications This paper describes a family of networks, the bi-directional k-ary n-butterflies, and presents a partially adaptive routing algorithm...
Efficient Total-Exchange in Wormhole-Routed Toroidal Cubes (2007)
The total-exchange is one of the most dense communication patterns and is at the heart of numerous applications and programming models in parallel computing. In this paper we present a simple...
Alexandros V. Gerbessiotis, Fabrizio Petrini
The BSP model by L.G. Valiant has been proposed as a unifying and bridging model for the design, analysis and programming of general purpose parallel computing systems. A number of libraries have...
Scalable Resource Management in High Performance Computers (2007)
Eitan Frachtenberg, Fabrizio Petrini, Juan Fernández, Salvador Coll
Clusters of workstations have emerged as an important platform for building cost-effective, scalable, and highly-available computers. Although many hardware solutions are available today, the largest...
Challenges in Mapping Graph Exploration Algorithms on Advanced Multi-core Processors (2007)
Oreste Villa, Daniele Paolo Scarpazza, Fabrizio Petrini, Juan Fernández Peinador
Multi-core processors are a shift of paradigm in computer architecture that promises a dramatic increase in performance. But multi-core processors also bring an unprecedented level of complexity in...
An Abstract Interface for System Software on Large-Scale Clusters (2006)
Juan Fernández, Eitan Frachtenberg, Fabrizio Petrini, José-Carlos Sancho
Scalable management of distributed resources is one of the major challenges when building large-scale clusters for high-performance computing. This task includes transparent fault tolerance,...
NIC-based Reduction Algorithms for Large-scale Clusters (2005)
Fabrizio Petrini, Adam Moody, Juan Fernandez, Eitan Frachtenberg, Dhabaleswar K. Panda
Efficient algorithms for reduction operations across a group of processes are crucial for good performance in many large-scale, parallel scientific applications. While previous...
● High Inter Process Communication Requires synchronization needs co- (2005)
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini
scheduling ● Two common Approaches: ➢ batch scheduling ➔ wherein nodes are dedicated for the duration of the run ➢ gang scheduling ➔ wherein time slicing is coordinated across processors...
Adaptive parallel job scheduling with flexible coscheduling (2005)
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernández
Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these...
Adaptive parallel job scheduling with flexible coscheduling (2005)
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernández
Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. Because of their synchronization needs, these...
Adaptive parallel job scheduling with flexible coscheduling (2005)
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernández
Many scientific and high-performance computing applications consist of multiple processes running on different processors that communicate frequently. These applications can suffer severe performance...
Designing Parallel Operating Systems via Parallel Programming (2004)
Eitan Frachtenberg, Kei Davis, Fabrizio Petrini, Juan Fernández, José Carlos Sancho
Ever-increasing demand for computing capability is driving the construction of ever-larger computer clusters, soon to be reaching tens of thousands of processors. Many functionalities of...
System-Level Fault-Tolerance in Large-Scale Parallel Machines with Buffered Coscheduling (2004)
Fabrizio Petrini, Kei Davis, José Carlos Sancho
As the number of processors for multi-teraflop systems grows to tens of thousands, with proposed petaflops systems likely to contain hundreds of thousands of processors, the assumption of fully...
A performance evaluation on an Alpha EV7 processing node, Int (2004)
Darren J. Kerbyson, Adolfy Hoisie, Scott Pakin, ...
In this paper we detail the performance of a new Alpha-Server node containing 16 Alpha EV7 CPUs. The EV7 processor is based on the EV68 processor core that is used in terascale systems at Los Alamos...
Architectural Support for System Software on Large-Scale Clusters (2004)
Juan Fernández, Eitan Frachtenberg, Fabrizio Petrini, Kei Davis, Jose Carlos Sancho
Scalable management of distributed resources is one of the major challenges in deployment of large-scale clusters. Management includes transparent fault tolerance, efficient allocation of resources,...
A Performance and Scalability Analysis of the BlueGene/L Architecture (2004)
Kei Davis, Adolfy Hoisie, Greg Johnson, Darren J. Kerbyson, Mike Lang, Scott Pakin, ...
Based on a set of measurements done on the 512-node 500MHz prototype and early results on a 2048 node 700MHz BlueGene/L machine at IBM Watson, we present a performance and scalability analysis of the...
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernández
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically achieved by space slicing with variable...
Eitan Frachtenberg, Dror G. Feitelson, Fabrizio Petrini, Juan Fernández
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically accomplished by space slicing, wherein nodes...
Salvador Coll, José Duato, Francisco J. Mora, Fabrizio Petrini, Adolfy Hoisie
The efficient implementation of collective communication is a key factor to provide good performance and scalability of communication patterns that involve global data movement and global...
Salvador Coll, José Duato, Francisco J. Mora, Fabrizio Petrini, Adolfy Hoisie
The efficient implementation of collective communication is a key factor to provide good performance and scalability of communication patterns that involve global data movement and global...
Scalable Hardware-Based Multicast Trees (2003)
Salvador Coll, José Duato, Fabrizio Petrini, Francisco J. Mora
The Case of the Missing Supercomputer Performance: Achieving (2003)
Fabrizio Petrini, Darren J. Kerbyson, Scott Pakin
Parallel job scheduling under dynamic workloads (2003)
Eitan Frachtenberg, Dror G. Feitelson, Juan Fernández, Fabrizio Petrini
Jobs that run on parallel systems that use gang scheduling for multiprogramming may interact with each other in various ways. These interactions are affected by system parameters such as the level of...
Parallel Job Scheduling Under Dynamic Workloads (2003)
Eitan Frachtenberg, Dror G. Feitelson, Juan Fernandez, Fabrizio Petrini
Jobs that run on parallel systems that use gang scheduling for multiprogramming may interact with each other in various ways. These interactions are affected by system parameters such as the level of...
Scalable collective communication on the ASCI Q machine (2003)
Fabrizio Petrini, Juan Fernandez, Eitan Frachtenberg, Salvador Coll
Scientific codes spend a considerable part of their run time executing collective communication operations. Such operations can also be critical for efficient resource management in large-scale...
STORM: Lightning-Fast Resource Management (2002)
Eitan Frachtenberg, Fabrizio Petrini, Juan Fernandez, Scott Pakin, Salvador Coll
Although workstation clusters are a common platform for high-performance computing (HPC), they remain more difficult to manage than sequential systems or even symmetric multiprocessors. Furthermore,...
Salvador Coll, Fabrizio Petrini, Eitan Frachtenberg, Adolfy Hoisie
A common trend in the design of large-scale clusters is to use a high-performance data network to integrate the processing nodes in a single parallel computer. In these systems the performance of the...
Scalable Resource Management in High-Performance Computers (2002)
Eitan Frachtenberg, Fabrizio Petrini, Juan Fernández, Salvador Coll
Clusters and other loosely-coupled systems are becoming ubiquitous and larger
Scaling to Thousands of Processors with Buffered Coscheduling (2002)
Fabrizio Petrini
In this paper we describe Buffered Coscheduling, a new approach to design the system software of large scale parallel computers. A buffered coscheduled system can tolerate inefficient programs,...
Scaling to Thousands of Processors with Buffered Coscheduling (2002)
In this paper we describe Buffered Coscheduling, a new approach to design the system software of large scale parallel computers. A buffered coscheduled system can tolerate inefficient programs,...
Performance Evaluation of the Quadrics Interconnection Network (2001)
Fabrizio Petrini, Adolfy Hoisie, Wu-chun Feng, Richard Graham
We present an initial performance evaluation of the Quadrics interconnection network (QsNET). We describe the main hardware and software features of QsNET of relevance to the system designer and to...
The Quadrics Network (QsNet): High-Performance Clustering Technology (2001)
Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg
The Quadrics interconnection network (QsNet) contributes two novel innovations to the field of high-performance interconnects: (1) integration of the virtual-address spaces of individual nodes into a...
Using multirail networks in high-performance clusters (2001)
Salvador Coll, Eitan Frachtenberg, Fabrizio Petrini, Adolfy Hoisie, Leonid Gurvits
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault tolerance of current high-performance parallel computers. In...
Using multirail networks in high-performance clusters (2001)
Salvador Coll, Eitan Frachtenberg, Fabrizio Petrini, Adolfy Hoisie, Leonid Gurvits
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault tolerance of current high-performance clusters. We present an...
Gang scheduling with lightweight user-level communication (2001)
Eitan Frachtenberg, Fabrizio Petrini, Salvador Coll, Wu-chun Feng
In this paper, we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster, the scheduler can take advantage of this network’s unique...
Predictive performance and scalability modeling of a large-scale application (2001)
Darren J. Kerbyson, Hank J. Alme, Adolfy Hoisie, Fabrizio Petrini, Harvey J. Wasserman, Michael Gittings
Performance Evaluation of the Quadrics Interconnection Network (2001)
Fabrizio Petrini, Salvador Coll, Eitan Frachtenberg, Adolfy Hoisie
In this paper we present an in-depth description of the Quadrics interconnection network (QsNET) and an experimental performance evaluation on a 64-node Alphaserver cluster. We expose the performance...
Gang scheduling with lightweight user-level communication (2001)
Eitan Frachtenberg, Fabrizio Petrini, Salvador Coll, Wu-chun Feng
In this paper, we explore the performance of gang scheduling on a cluster using the Quadrics interconnection network. In such a cluster, the scheduler can take advantage of this network's unique...
Using multirail networks in high-performance clusters (2001)
Salvador Coll, Eitan Frachtenberg, Fabrizio Petrini, Adolfy Hoisie, Leonid Gurvits
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault-tolerance of current high-performance clusters. We present and...
Performance Evaluation of the Quadrics Interconnection Network (2001)
Fabrizio Petrini, Salvador Coll, Eitan Frachtenberg, Adolfy Hoisie
In this paper we present an in-depth description of the Quadrics interconnection network (QsNET) and an experimental performance evaluation on a 64-node AlphaServer cluster. We explore several...
Scheduling with global information in distributed systems (2000)
Fabrizio Petrini, Wu-chun Feng
One of the major problems faced by the developers of parallel programs is the lack of a clear separation between the programming model and the operating system. In this paper, we present a new...
Fabrizio Petrini, Wu-chun Feng
Buffered coscheduling is a scheduling methodology for time-sharing communicating processes in parallel and distributed systems. The methodology has two primary features: communication buffering and...
Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements (2000)
Fabrizio Petrini, Wu-chun Feng
Buffered coscheduling is a new methodology that can substantially increase resource utilization, improve response time, and simplify the development of the run-time support in a parallel machine. In...
Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements (2000)
Fabrizio Petrini, Wu-chun Feng
Buffered coscheduling is a new methodology that can substantially increase resource utilization, improve response time, and simplify the development of the run-time support in a parallel machine. In...
Scheduling with Global Information in Distributed Systems (2000)
Fabrizio Petrini, Wu-chun Feng
One of the major problems faced by the developers of parallel programs is the lack of a clear separation between the programming model and the operating system. In this paper, we present a new...
Scheduling with Global Information in Distributed Systems (2000)
Fabrizio Petrini, Wu-chun Feng
Buffered coscheduling is a distributed scheduling methodology for time-sharing communicating processes in a distributed system, e.g., a PC cluster. The principal mechanisms involved in this methodology...
Efficient Total-Exchange In Wormhole-Routed Toroidal Cubes (2000)
The total-exchange is one of the most dense communication patterns and is at the heart of numerous applications and programming models in parallel computing. In this paper we present a simple...
Time-Sharing Parallel Jobs in the Presence of Multiple Resource Requirements (2000)
Fabrizio Petrini, Wu-chun Feng
Buffered coscheduling is a new methodology that can substantially increase resource utilization, improve response time, and simplify the development of the run-time support in a parallel...
Improved Resource Utilization with Buffered Coscheduling (2000)
Fabrizio Petrini, Wu-chun Feng
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation...
Fabrizio Petrini, Federico Bassetti, Alexandros Gerbessiotis
A typical way to increase the performance of a parallel program on a given parallel platform is to try to overlap computation and communication in order to decrease running time and...
Latency and Bandwidth Requirements of Massively Parallel Programs: FFT as a Case Study (1999)
Fabrizio Petrini, Marco Vanneschi
In this paper we compare three routing algorithms for massively parallel architectures, each offering an increasing degree of adaptivity: a deterministic algorithm, a minimal adaptive based on...
Latency and Bandwidth Requirements of Massively Parallel Programs: FFT as a Case Study (1999)
Fabrizio Petrini, Marco Vanneschi
Many theoretical models of parallel computation are based on overly simplistic assumptions on the performance of the interconnection network. For example they assume constant latency for any...
Latency and Bandwidth Requirements of Massively Parallel Programs: FFT as a Case Study (1999)
Many theoretical models of parallel computation are based on overly simplistic assumptions on the performance of the interconnection network. For example they assume constant latency for any...
Total-Exchange on Wormhole k-ary n-cubes with Adaptive Routing (1998)
The total-exchange is one of the most dense communication patterns and is at the heart of numerous applications and programming models in parallel computing. In this paper we present a simple...
Network Performance Assessment under the BSP Model (1998)
Alexandros V. Gerbessiotis, Fabrizio Petrini
A number of libraries have been implemented that allow programming following the BSP paradigm with one of them being the Oxford BSP Toolset. Algorithm designers and software engineers are...
The Quadrics Network: High-Performance Clustering Technology (1998)
Fabrizio Petrini, Wu-chun Feng, Adolfy Hoisie, Salvador Coll, Eitan Frachtenberg
The interconnection network and its associated software libraries are critical components for high-performance cluster computers and supercomputers, Web-server farms, and network-attached storage....
k-ary n-trees: High Performance Networks for Massively Parallel Architectures (1997)
Fabrizio Petrini, Marco Vanneschi
The past few years have seen a rise in popularity of massively parallel architectures that use fat-trees as their interconnection networks. In this paper we study the communication performance of a...
LIFE: A Limited Injection, Fully AdaptivE, Recovery-Based Routing Algorithm (1997)
Fabrizio Petrini, Jose Duato, Pedro Lopez, Juan-Miguel Martinez
Networks using wormhole switching have traditionally relied upon deadlock avoidance strategies for the design of deadlock-free algorithms. The past few years have seen a rise in popularity of...
Efficient Personalized Communication on Wormhole Networks (1997)
Fabrizio Petrini, Marco Vanneschi
Bridging models, as the BSP, tend to abstract the characteristics of the interconnection networks using a small set of parameters, by dividing the computation in supersteps and organizing the...
Efficient Total-Exchange in Wormhole-Routed Toroidal Cubes (1997)
Fabrizio Petrini, Marco Vanneschi
The total-exchange is one of the most dense communication patterns and is at the heart of numerous applications and programming models in parallel computing. In this paper we present a...
Network Performance under Physical Constraints. Submitted for publication to (1997)
Fabrizio Petrini, Marco Vanneschi
The performance of an interconnection network in a massively parallel architecture is subject to physical constraints whose impact needs to be re-evaluated from time to time. Fat-trees and low...
A Comparison of Wormhole-Routed Interconnection Networks (1997)
Fabrizio Petrini, Marco Vanneschi
Fat-trees and low dimensional cubes have raised a great interest in the scientific community in the last few years and are emerging standards in the design of interconnection networks for massively...
SMART: a Simulator of Massive ARchitectures and Topologies (1997)
Fabrizio Petrini, Marco Vanneschi
Many important results in the area of computer architecture have been achieved using simulators. In this paper we present SMART, a simulator of parallel architectures. SMART provides a flexible and...
Fabrizio Petrini, Marco Vanneschi
Deadlock recovery as a viable alternative to deadlock avoidance has recently gained an increasing consideration in the scientific community. In this paper we present a simple and efficient minimal...
LIFE: a Limited Injection, Fully adaptivE, Recovery-Based Routing Algorithm (1997)
Fabrizio Petrini, José Duato, Pedro López, Juan-Miguel Martínez
Networks using wormhole switching have traditionally relied upon deadlock avoidance strategies for the design of deadlock-free algorithms. The past few years have seen a rise in popularity of...
Minimal Adaptive Routing with Limited Injection on Toroidal k-ary n-cubes (1996)
Fabrizio Petrini, Marco Vanneschi
Virtual channels can be used to implement deadlock free adaptive routing algorithms and increase network throughput. Unfortunately, they introduce asymmetries in the use of buffers of symmetric...
Minimal vs. non Minimal Adaptive Routing on k-ary n-cubes (1996)
Fabrizio Petrini, Marco Vanneschi
There is a common agreement in the scientific community that adaptive routing algorithms will eventually replace the deterministic ones that are currently in use in multicomputer networks. An open...
K-ary N-trees: High Performance Networks for Massively Parallel Architectures (1995)
Fabrizio Petrini, Marco Vanneschi
The past few years have seen a rise in popularity of massively parallel architectures that use fat-trees as their interconnection networks. In this paper we formalize a parametric family of fat-trees, the...