The IBM Supercomputers
One of the two IBM POWER6 Cluster 1600 systems installed at ECMWF

ECMWF's High Performance Computing Facility (HPCF) comprises two identical but independent IBM Cluster 1600 supercomputer systems. The computational basis of the HPCF is IBM's POWER6 microprocessor.

Since no single application will require more than half of the total computing resources, it was decided that the system would comprise two independent clusters. This has several advantages. Two independent clusters add significantly to the resiliency of the system. For example, if one cluster were to suffer a major failure, the other cluster could still provide a service while the fault is being rectified. Another advantage is increased availability: a system session that requires a whole cluster to be taken out of production for a period of time affects only half of the system, and the other cluster can continue to run production work. A further advantage is flexibility in maintaining and upgrading the operating system: a new software release can be installed and run in production on one cluster while the other cluster continues to run the earlier release until it, too, is upgraded.

The equipment is based on pSeries p6-575 servers interconnected by a low-latency, high-speed 8-plane IB4x-DDR InfiniBand network. Each separate compute cluster comprises 286 pSeries p6-575 symmetric multiprocessor (SMP) servers (or nodes).
Schematic configuration diagram
The high-memory nodes, which have 256GB of memory as opposed to 64GB in the "normal memory" nodes, allow for greater flexibility, especially for serial (non-parallel) programs that require large amounts of memory but cannot be converted to parallel programs that could use the aggregated memory of multiple nodes.

The pSeries p6-575 SMP server has 32 separate processors (or "cores") with a clock frequency of 4.7GHz, giving each core a theoretical peak performance of 18.8GFLOPS. Each processor can run two threads concurrently. This is called simultaneous multi-threading (SMT) and has the effect of making each node appear to have 64 (logical) CPUs instead of the 32 (physical) processors. Each water-cooled POWER6 microprocessor chip holds two cores and, within its 790 million transistors, also two memory controllers and two separate 4MB level-2 caches, one for each core.

The 8-plane IB4x-DDR InfiniBand network connects each of the p6-575 nodes within an individual compute cluster. The eight switch planes provide a considerable increase in performance over that of a single plane and also make the network more resilient to hardware errors.

Each cluster runs the AIX 5.3 operating system and Cluster Systems Management (CSM) software. LoadLeveler is used as the batch subsystem. The clusters use the multi-cluster version of IBM's General Parallel File System (MC-GPFS). This is a distributed journalled file system that uses token-based mechanisms to ensure file system consistency. It utilises the Network Shared Disk (NSD) feature to share data at a much higher level of performance than other file sharing mechanisms, such as NFS.

The whole system has about 1.2PB (1 petabyte = 1 million gigabytes) of SAS (Serial Attached SCSI) disks, connected to two separate storage I/O clusters.
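The processor figures quoted above can be checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, and the cluster-level number assumes that all 286 servers contribute to computation, which overstates the real compute peak if some nodes are reserved for I/O or other duties.

```python
# Figures quoted in the text.
CLOCK_GHZ = 4.7               # POWER6 clock frequency
PEAK_GFLOPS_PER_CORE = 18.8   # theoretical peak at that clock
CORES_PER_NODE = 32           # physical cores in one p6-575 server
SMT_THREADS_PER_CORE = 2      # simultaneous multi-threading
NODES_PER_CLUSTER = 286       # all p6-575 servers in one cluster


def flops_per_cycle(peak_gflops: float, clock_ghz: float) -> float:
    """Floating-point operations per clock cycle implied by the quoted figures."""
    return peak_gflops / clock_ghz


def node_peak_gflops() -> float:
    """Theoretical peak of one 32-core node."""
    return CORES_PER_NODE * PEAK_GFLOPS_PER_CORE


def logical_cpus_per_node() -> int:
    """CPUs the operating system sees on one node with SMT enabled."""
    return CORES_PER_NODE * SMT_THREADS_PER_CORE


def cluster_peak_tflops(nodes: int = NODES_PER_CLUSTER) -> float:
    """Upper bound in TFLOPS, assuming every one of `nodes` servers computes."""
    return nodes * node_peak_gflops() / 1000.0


if __name__ == "__main__":
    per_cycle = flops_per_cycle(PEAK_GFLOPS_PER_CORE, CLOCK_GHZ)
    print(f"Operations per cycle per core: {per_cycle:.1f}")
    print(f"Peak per node: {node_peak_gflops():.1f} GFLOPS")
    print(f"Logical CPUs per node: {logical_cpus_per_node()}")
    print(f"Upper bound per cluster: {cluster_peak_tflops():.1f} TFLOPS")
```

Under these assumptions the quoted 18.8GFLOPS corresponds to four floating-point operations per cycle per core, a node peaks at about 601.6GFLOPS, and SMT yields the 64 logical CPUs mentioned in the text.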
Each of these storage I/O clusters is made up of BIUs (Basic I/O Units), each comprising two pSeries p6-520 servers, each of which has four 4.2GHz POWER6 processors and 8GB of memory. The two servers are connected in a redundant fashion via RAID controllers to 14 drawers of disks, each drawer holding 12 (300GB) SAS disks in a 4D+P+Q RAID-6 configuration. The storage I/O clusters serve their data to both compute clusters over a 2-plane IB4x-DDR InfiniBand network, to which the 12 NIONs in each compute cluster are connected. The MC-GPFS architecture enables any node on any cluster to access any file on either of the storage I/O clusters. This enables users of ECMWF's high-performance computing facility to work more productively.

Feature comparison between the new Phase 1 (Power 6) and old Phase 4 (Power 5+) systems
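The disk figures given for a Basic I/O Unit (14 drawers, 12 disks of 300GB per drawer, 4D+P+Q RAID-6) can be turned into a rough capacity estimate. This is a sketch under stated assumptions, not a description of the actual disk layout: it assumes each 12-disk drawer is split into two 6-disk arrays and that parity is the only overhead (no hot spares or file system metadata). All names are hypothetical.

```python
# Figures quoted in the text for one Basic I/O Unit (BIU).
DRAWERS_PER_BIU = 14
DISKS_PER_DRAWER = 12
DISK_GB = 300

# 4D+P+Q RAID-6: four data disks plus two parity disks (P and Q) per array.
RAID_DATA_DISKS = 4
RAID_PARITY_DISKS = 2


def raid_group_size() -> int:
    """Number of disks in one 4D+P+Q array."""
    return RAID_DATA_DISKS + RAID_PARITY_DISKS


def arrays_per_drawer() -> int:
    """Assumption: a drawer is divided evenly into whole RAID arrays."""
    return DISKS_PER_DRAWER // raid_group_size()


def raw_gb_per_biu() -> int:
    """Total disk capacity behind one BIU, before RAID overhead."""
    return DRAWERS_PER_BIU * DISKS_PER_DRAWER * DISK_GB


def usable_gb_per_biu() -> int:
    """Capacity left for data once the two parity disks per array are deducted."""
    return raw_gb_per_biu() * RAID_DATA_DISKS // raid_group_size()


if __name__ == "__main__":
    print(f"Arrays per drawer: {arrays_per_drawer()}")
    print(f"Raw capacity per BIU: {raw_gb_per_biu()} GB")
    print(f"Usable capacity per BIU: {usable_gb_per_biu()} GB")
```

With these assumptions one BIU holds 50,400GB of raw disk, of which two thirds (33,600GB) is available for data. The text does not state how many BIUs each storage I/O cluster contains, so the sketch stops at the per-BIU level.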
A view of the IBM p6-575 servers