Linux Cluster
ECMWF has been running a Linux Cluster since 2004. It provides general purpose computing facilities to augment the desktop systems and to process workloads that are not suitable for the High Performance Computing Facility.
By providing facilities to allow balanced interactive login, together with a suitable shared filesystem, the cluster of small Linux servers can be made to resemble a large single system.
The Linux Networx Cluster
The cluster makes use of several separate disk storage subsystems:
IBM's GPFS filesystem is used to manage a number of filesystems which are stored on the FAStT disk subsystems; in particular, it is used for serving scratch filesystems which need to be quota-controlled. The filesystems are made available directly to the cluster nodes using GPFS and to other ECMWF systems via NFS from the I/O nodes. GPFS uses a private Ethernet network connected to the cluster nodes.
The cluster provides both interactive and batch access, using Sun Grid Engine (SGE), an open source batch subsystem; see http://gridengine.sunsource.net for further details. Within SGE, 4 of the nodes are configured as interactive nodes, 22 as batch (or compute) nodes and the remaining 6 as I/O nodes. When initiating interactive sessions or batch jobs, SGE will choose an appropriate node, taking into account the current load level of each node. This allows the workload to be distributed across the cluster.
The "Drone" cluster
In 2009 the Linux Networx cluster was augmented by 10 HP Proliant DL360 G5 nodes, each with two quad core Intel Xeon 5440 processors and 16 GB of memory. To distinguish these nodes from the Linux Networx nodes, this small cluster is called the "Drone" cluster. These nodes form part of the cluster as far as user work is concerned, since they are used by SGE to run batch jobs, but they are managed and administered separately via the "iLO" interface on each node.
New Linux Cluster - LXA/LXB
A replacement for the Linux Networx cluster is being installed. The current status (July 2010) is that the new hardware has been installed, including new disk subsystems, and the system is now being configured and provisioned. The new system is composed of
The cluster will run SLES 11.1, using SGE to run batch work and GPFS to provide shared filesystems to the cluster and other systems. The cluster has been installed in 2 separate racks, one rack in the main computer hall, the other in the computer hall extension.
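The load-aware placement that SGE performs when starting interactive sessions or batch jobs can be illustrated with a minimal sketch. This is not SGE's actual scheduling algorithm or API; the node names, the load figures and the `pick_node` helper are all hypothetical, and the sketch only assumes that the least-loaded node of the requested type is preferred.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str    # hypothetical host name
    role: str    # "interactive", "batch" or "io"
    load: float  # hypothetical current load figure


def pick_node(nodes, role):
    """Return the least-loaded node with the requested role."""
    candidates = [n for n in nodes if n.role == role]
    if not candidates:
        raise ValueError(f"no nodes with role {role!r}")
    return min(candidates, key=lambda n: n.load)


# Illustrative data only; these are not real cluster hosts or measurements.
nodes = [
    Node("node01", "interactive", 0.8),
    Node("node02", "interactive", 0.2),
    Node("node05", "batch", 1.5),
    Node("node06", "batch", 0.4),
    Node("node31", "io", 0.1),
]

print(pick_node(nodes, "interactive").name)
print(pick_node(nodes, "batch").name)
```

A real scheduler weighs more than a single load figure (queue limits, resource requests, user priorities), so this should be read only as an illustration of why work spreads across the nodes.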