
Features:
INTERVIEW WITH BILL BLAKE, SVP, PRODUCT DVPT., NETEZZA by Alan
Beck, Editor-in-Chief, HPCwire
HPCwire: Thus far, the race between COTS-based cluster supercomputers
and those based upon proprietary processors has resembled that of Achilles
and the tortoise: the clusters approach ever nearer but never quite
succeed in surpassing -- or indeed drawing even with -- their elite
contenders. Will this situation ever change? Why or why not? And given the
speeds involved, will it still matter?
BLAKE: Lessons learned from the development of proprietary processors
have tended to flow down to the industry standard parts, much like the
lessons learned on Formula 1 racing cars have influenced "commodity cars."
And today's crop of 64-bit industry standard processors is certainly
catching up to where proprietary processors such as Alpha were just a few
years ago. But the key factor may well be the economics, since innovation
in areas such as new approaches to on-chip parallelism in microprocessors
can be a billion-dollar proposition if it is to significantly exceed the
industry standard parts.
Does it matter? Yes: if the industry standard parts do not support the
dramatically higher memory bandwidth requirements of supercomputing, then
it matters a lot. Architectural approaches such as HyperTransport are very
important to opening up the memory system of the processor to high
performance system interconnects that are the lifeblood of scalable
parallel systems. At the system level, clusters will clearly dominate as
all the hardware and software building blocks are there to deliver
significant application performance with very good price/performance
characteristics.
Linux clusters are the mainstream, and their elite contenders, as you
call them, are relegated to those specialized cases where the highest
capability is required. As for capacity computing, the important load
sharing software, be it from Platform Computing Inc. or one of the many
home-grown varieties, is in place to support excellent system utilization
and is in use everywhere. The key enabling technology for COTS-based cluster
supercomputers for parallel compute-intensive applications has been tools
like MPI for coarse-grained message passing, plus a lot of work by parallel
application developers to deal with explicit parallelism in their
codes.
HPCwire: What kinds of networking architectures will provide the
principal support for the simplified supercomputers of the future? Will
such networks ultimately prove as unwieldy, in their own way, as
traditional HPC vector processors? Why or why not?
BLAKE: Myricom and Quadrics are both setting the bar for all others to
meet in terms of bandwidth and latency. And I expect them to continue in
that mode for the foreseeable future, especially if they can continue to
exploit new high bandwidth memory interfaces such as HyperTransport. But
there are a number of new startups, such as Alacritech and Ammasso, that
are trying to improve bandwidth and lower the latency of the standard
Ethernet stack, and if they are successful then they will cannibalize the
proprietary schemes. By mid to late decade, the horse race at the high end
will be between proprietary all-optical switches in conventional
topologies like the fat tree and highly optimized mesh architectures built
into the microprocessor itself (but without the overhead of maintaining cache
coherence in large meshes).
HPCwire: Will the new strategy of simplified computing truly provide
HPC power for general use -- or will security concerns eventually eclipse
the enormous potential that appears to lie just ahead?
BLAKE: I expect the grid forces to solve the security issues needed to
make simplified computing truly available for general use. For example,
the success Avaki has had with secure data grids for the pharmaceutical
companies is very encouraging.
HPCwire: Any other surprises at the system architectural level?
BLAKE: Absolutely. As cluster supercomputers mature, we are seeing
significant innovation at the system architectural level, especially in
the specialization of node function. New systems such as the Cray/Sandia
Red Storm show an interesting approach to both scalability and
serviceability, with cluster nodes optimized for compute (very lightweight
kernel) versus file system (full Linux) versus service and maintenance. By
solving the software challenges of heterogeneous and asymmetric node
configurations, system performance and functionality will improve. At
Netezza we are pursuing that path for the hardware needed to support
analytic terascale databases: we couple a front-end SMP machine to a
highly parallel database engine whose nodes are optimized for database
operations, with processing as close as possible to the disk where the
data resides.