Debunking, then Duplicating Ultracomputer Performance Claims by Debugging the Combining Switches
Eric Freudenthal and Allan Gottlieb
To appear at the Workshop on Duplicating, Deconstructing, and Debunking (WDDD04), Munich, Germany, June 19-20, 2004
Abstract
Memory system congestion due to serialization of hot spot accesses can adversely affect the performance of interprocess coordination algorithms. Hardware and software techniques have been proposed to reduce this congestion and thereby provide superior system performance. The combining networks of Gottlieb et al. automatically parallelize concurrent hot spot memory accesses, improving the performance of algorithms that poll a small number of shared variables.
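Combining switches of the kind used in the NYU Ultracomputer merge concurrent fetch-and-add requests (and loads) aimed at the same address. Two requests F&A(X, a) and F&A(X, b) meeting at a switch are forwarded as a single F&A(X, a+b). When the reply Y comes back, one requester receives Y and the other receives Y+a, exactly as if the two had executed serially. The following is a minimal Python sketch of that idea under simplifying assumptions. The names `Request`, `Memory`, `combine`, and `combine_all` are hypothetical. The sketch models only the arithmetic of combining and splitting replies over a binary tree of switches. It does not model switch queues, timing, or the design flaw discussed in this paper.

```python
from dataclasses import dataclass

@dataclass
class Request:
    op: str       # "load" or "faa" (fetch-and-add); hypothetical encoding
    addr: int
    inc: int = 0  # increment, used only by "faa"

class Memory:
    """Toy memory module that counts how many requests actually reach it."""
    def __init__(self):
        self.cells = {}
        self.accesses = 0

    def serve(self, req):
        self.accesses += 1
        old = self.cells.get(req.addr, 0)
        if req.op == "faa":
            self.cells[req.addr] = old + req.inc
        return old  # both loads and fetch-and-adds return the old value

def combine(a, b):
    """Merge two requests at one switch (sketch: same address and op only).
    Returns (forwarded_request, split), where split(y) gives the pair of
    replies owed to a and b when the forwarded request returns y."""
    if a.addr != b.addr or a.op != b.op:
        raise ValueError("this sketch only combines identical-kind requests")
    if a.op == "load":
        # One load goes to memory; its value is duplicated on the way back.
        return Request("load", a.addr), lambda y: (y, y)
    # Forward the summed increment and serialize as "a, then b".
    return Request("faa", a.addr, a.inc + b.inc), lambda y: (y, y + a.inc)

def combine_all(reqs):
    """Combine a non-empty list of requests pairwise, like a binary tree of
    switches. Returns (forwarded_request, split), where split(y) yields
    the replies owed to reqs, in order."""
    if len(reqs) == 1:
        return reqs[0], lambda y: [y]
    mid = len(reqs) // 2
    left_req, left_split = combine_all(reqs[:mid])
    right_req, right_split = combine_all(reqs[mid:])
    fwd, split_pair = combine(left_req, right_req)

    def split(y):
        y_left, y_right = split_pair(y)
        return left_split(y_left) + right_split(y_right)

    return fwd, split

# Four processors hit the same hot spot (address 7) concurrently.
mem = Memory()
fwd, split = combine_all([Request("faa", 7, inc) for inc in (1, 2, 3, 4)])
faa_replies = split(mem.serve(fwd))
fwd, split = combine_all([Request("load", 7) for _ in range(4)])
load_replies = split(mem.serve(fwd))
print(faa_replies, load_replies, mem.cells[7], mem.accesses)
# prints: [0, 1, 3, 6] [10, 10, 10, 10] 10 2
```

Each batch of four concurrent requests reaches memory as a single access, yet every processor receives the value it would have seen under some serial order. This is what lets combining relieve the serialization at a hot spot that the abstract describes.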
We begin by debunking one of the performance claims made for the NYU Ultracomputer. Specifically, a gap in its simulation coverage hid a design flaw in the combining switches that seriously impacts the performance of busy-wait polling in centralized coordination algorithms. We then debug the system by correcting the design and closing the simulation gap, after which we are able to duplicate the original claims of excellent performance on busy-wait polling. In particular, our simulations show that, with the revised design, the Ultracomputer readers-writers and barrier algorithms achieve performance comparable to the highly regarded MCS algorithms.