
Linux Cluster

 
ECMWF has been running a Linux Cluster since 2004. It provides general purpose computing facilities to augment the desktop systems and to process workloads which are not suitable for the High Performance Computing Facility.

By providing facilities to allow balanced interactive login, together with a suitable shared filesystem, the cluster of small Linux servers can be made to resemble a large single system.

The Linux Networx Cluster

The Linux Cluster comprises 32 nodes, each with 2 AMD Opteron processors and 4 GB of memory. The cluster, which was bought from Linux Networx, currently runs SLES (SUSE Linux Enterprise Server) 9.2. It has a separate "master node" which is used to configure, boot and manage the cluster. Since the cluster nodes cannot be booted without a master node, the system also includes a backup master node.

The cluster makes use of several separate disk storage subsystems:

  • IBM FAStT500 and FAStT600 Turbo disk subsystems with about 14 TB of disk space; these are connected to the I/O nodes, which have Fibre Channel HBA interfaces.
  • A Panasas Storage Cluster (see www.panasas.com), with 2 shelves, each with 8 Storage Blades and 3 Director Blades, providing about 8 TB of disk space; this is connected to the cluster via Gigabit Ethernet.

IBM's GPFS filesystem is used to manage a number of filesystems which are stored on the FAStT disk subsystems; in particular, it serves scratch filesystems which need to be quota-controlled. The filesystems are made available directly to the cluster nodes using GPFS, and to other ECMWF systems via NFS from the I/O nodes. GPFS uses a private Ethernet network connected to the cluster nodes.

The cluster provides both interactive and batch access, using Sun Grid Engine (SGE), an open source batch subsystem - see http://gridengine.sunsource.net for further details. Within SGE, 4 of the nodes are configured as interactive nodes, 22 as batch (or compute) nodes and the remaining 6 as I/O nodes. When initiating interactive sessions or batch jobs, SGE chooses an appropriate node, taking into account the current load level of each node. This allows the workload to be distributed across the cluster.
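
The load-based placement described above can be illustrated with a small sketch. This is not SGE's actual scheduling algorithm or API; the `Node` class, the node names, the single numeric `load` figure and the functions `build_cluster`, `pick_node` and `dispatch` are all hypothetical and exist only for illustration. The sketch assumes that a session or job is simply sent to the least-loaded node of the matching type.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str    # hypothetical node name, not a real ECMWF hostname
    role: str    # "interactive", "batch" or "io"
    load: float  # hypothetical load figure; a real scheduler uses richer metrics


def build_cluster() -> List[Node]:
    """Build the 32-node layout from the text: 4 interactive, 22 batch, 6 I/O."""
    layout = [("interactive", 4), ("batch", 22), ("io", 6)]
    nodes = []
    for role, count in layout:
        for i in range(count):
            nodes.append(Node(name=f"{role}{i:02d}", role=role, load=0.0))
    return nodes


def pick_node(nodes: List[Node], role: str) -> Node:
    """Return the least-loaded node with the requested role."""
    candidates = [n for n in nodes if n.role == role]
    if not candidates:
        raise ValueError(f"no nodes with role {role!r}")
    return min(candidates, key=lambda n: n.load)


def dispatch(nodes: List[Node], role: str, cost: float = 1.0) -> Node:
    """Place one unit of work on the chosen node and record its extra load."""
    node = pick_node(nodes, role)
    node.load += cost
    return node
```

Calling `dispatch(cluster, "batch")` repeatedly on a cluster built with `build_cluster()` spreads the work over all 22 batch nodes before any node receives a second job, which is the balancing effect the paragraph describes.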

The "Drone" cluster

In 2009 the Linux Networx cluster was augmented by 10 HP Proliant DL360 G5 nodes, each with two quad core Intel Xeon 5440 processors and 16 GB of memory. To distinguish these nodes from the Linux Networx nodes, this small cluster is called the "Drone" cluster.

These nodes form part of the cluster as far as user work is concerned, since they are used by SGE to run batch jobs, but they are managed and administered separately via the "iLO" interface on each node.

New Linux Cluster - LXA/LXB

A replacement for the Linux Networx cluster is being installed. The current status (July 2010) is that the new hardware, including new disk subsystems, has been installed, and the system is now being configured and provisioned.

The new system is composed of

  • 32 HP Proliant DL360 G6 nodes each with
    • dual quad core Xeon 2.53 GHz 5540 processors
    • 24 GB of memory
    • 2 x 500 GB SATA disks
  • 4 of the nodes have Fibre Channel HBAs
  • 12 IBM DS3400 disk subsystems with dual Fibre Channel controllers; each subsystem has 12 x 450 GB 15K rpm SAS disks
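
The aggregate size of the new system follows from the figures in the list above. The short calculation below is a sketch: the constant and function names are illustrative, the disk figure is raw capacity in decimal GB, and the usable space after RAID and filesystem overhead is not stated on this page.

```python
# Figures taken from the hardware list above.
NODES = 32
SOCKETS_PER_NODE = 2           # "dual" processors
CORES_PER_SOCKET = 4           # "quad core"
MEMORY_GB_PER_NODE = 24
DISK_SUBSYSTEMS = 12           # IBM DS3400 units
DISKS_PER_SUBSYSTEM = 12
DISK_GB = 450                  # per SAS disk, raw capacity


def cluster_totals() -> dict:
    """Return aggregate core count, memory and raw shared-disk capacity."""
    return {
        "cores": NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET,
        "memory_gb": NODES * MEMORY_GB_PER_NODE,
        "raw_shared_disk_gb": DISK_SUBSYSTEMS * DISKS_PER_SUBSYSTEM * DISK_GB,
    }


if __name__ == "__main__":
    for key, value in cluster_totals().items():
        print(f"{key}: {value}")
```

This gives 256 cores, 768 GB of memory and 64,800 GB (about 65 TB) of raw disk across the DS3400 subsystems.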

The cluster will run SLES 11.1, using SGE to run batch work and GPFS to provide shared filesystems to the cluster and other systems. The cluster has been installed in 2 separate racks, one rack in the main computer hall, the other in the computer hall extension.

 


  

27.07.2010   © ECMWF