Sensitivity of Parallel Applications to Large Differences
in Bandwidth and Latency in Two-Layer Interconnects
Aske Plaat, Henri E. Bal, Rutger F. H. Hofman, Thilo Kielmann
Department of Computer Science, Vrije Universiteit, Amsterdam, The Netherlands
Abstract
This paper studies application performance on systems with strongly non-uniform remote memory access. In current-generation NUMAs the speed difference between the slowest and fastest link in an interconnect (the “NUMA gap”) is typically less than an order of magnitude, and many conventional parallel programs achieve good performance. We study how different NUMA gaps influence application performance, up to and including typical wide-area latencies and bandwidths. We find that for gaps larger than those of current-generation NUMAs, performance suffers considerably (for applications that were designed for a uniform-access interconnect). For many applications, however, performance can be greatly improved with comparatively simple changes: traffic over slow links can be reduced by making communication patterns hierarchical, like the interconnect. We find that in four out of our six applications the size of the gap can be increased by an order of magnitude or more without severely impacting speedup. We analyze why the improvements are needed, why they work so well, and how much non-uniformity they can mask.
1 Introduction
As computer systems increase in size, their interconnects become more hierarchical, resulting in growing differences in bandwidth and latency between their links. This trend is visible in NUMA machines and clusters of SMPs, where local memory access is typically a factor of 2–10 faster than remote access [19]. The gap is larger in future large-scale NUMAs, and much larger in meta-computers and computational grids.
For NUMAs with a small gap, good performance has been reported with conventional numerical applications [19, 24]. On systems with a larger gap, such as clusters of SMPs and networks of workstations, it is harder to achieve good performance [22, 25, 31]. As gaps increase, performance is likely to continue to suffer.¹