Highly Available Servers
Introduction

ECMWF operates two separate clusters of highly available servers. A highly available server is designed so that there is no single point of failure within the system. The two clusters, HAPP and HANFS, are described below.

[Figure: The HAPP cluster, including the EVA 5000 disk subsystem]
The HAPP cluster

The HAPP cluster consists of 5 HP rx4640 Integrity Servers, each with four 1.5 GHz Itanium 2 CPUs, 4 GB of memory and two 73 GB system disks. An HP EVA 5000 Fibre Channel disk subsystem is connected to each server via HP StorageWorks SAN Fibre Channel switches and provides about 3 TB of disk space. A separate server with 2 Itanium 2 CPUs is used for software development and testing. The HAPP cluster is used to acquire and process observation data.

The HANFS cluster

The HANFS cluster is very similar to the HAPP cluster. It consists of 2 rx4640 Integrity Servers, each with four 1.5 GHz Itanium 2 CPUs, 4 GB of memory and two 73 GB system disks, together with an HP EVA 3000 Fibre Channel disk subsystem that each server can access via Fibre Channel switches. The HANFS cluster is primarily a file server: all ECMWF users' HOME filesystems are served from this system, and it also serves a number of other filesystems used on desktops and other systems.

Disk Subsystems used by the clusters

As described above, each cluster has its own disk subsystem. In addition, to improve resiliency, both clusters are connected to an HP EVA 4100 disk subsystem, which has enough disk capacity to mirror the contents of both the EVA 3000 and the EVA 5000. This also allows any one of the 3 disk subsystems to be taken out of service for maintenance, for example to upgrade the firmware that runs on the disk subsystems.

System Software

All the systems in both clusters run HP-UX 11.23. To provide high availability, both clusters use ServiceGuard, a software package from HP that allows services to be configured as packages. If a system fails, the packages that were running on it are restarted automatically on another system, so users notice only a short interruption to the services, typically no more than a few minutes.
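The package failover behaviour described above can be illustrated with a small toy model. This is a hypothetical sketch, not the ServiceGuard API or its actual placement algorithm: the class names, the node names `nfs1`/`nfs2`, the package names and the rule "restart the package on the first node in its own preference list that is still up" are all assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Package:
    """A service bundled for failover (hypothetical model, not ServiceGuard's API)."""
    name: str
    node_order: list[str]              # preferred nodes; first entry is the primary
    running_on: Optional[str] = None


@dataclass
class Cluster:
    nodes_up: set[str] = field(default_factory=set)
    packages: list[Package] = field(default_factory=list)

    def _first_available(self, pkg: Package) -> Optional[str]:
        # Assumed policy: the first node in the package's list that is still up.
        for node in pkg.node_order:
            if node in self.nodes_up:
                return node
        return None  # no eligible node left, so the package stays down

    def start_all(self) -> None:
        for pkg in self.packages:
            pkg.running_on = self._first_available(pkg)

    def node_failed(self, node: str) -> list[str]:
        """Mark a node as down and restart its packages elsewhere; return the moved names."""
        self.nodes_up.discard(node)
        moved = []
        for pkg in self.packages:
            if pkg.running_on == node:
                pkg.running_on = self._first_available(pkg)
                moved.append(pkg.name)
        return moved


# Usage: two hypothetical file-server nodes, each normally running one package.
cluster = Cluster(
    nodes_up={"nfs1", "nfs2"},
    packages=[Package("home_fs", ["nfs1", "nfs2"]),
              Package("desktop_fs", ["nfs2", "nfs1"])],
)
cluster.start_all()
print(cluster.node_failed("nfs1"))     # ['home_fs']
print(cluster.packages[0].running_on)  # nfs2
```

Only the packages on the failed node move; packages already running on the surviving node are left alone, which is why users of those services see no interruption at all.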
The mirroring of the data from the EVA 3000 and EVA 5000 to the EVA 4100 is achieved using LVM logical volume mirroring, which is part of HP-UX.
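The claim that any one of the 3 disk subsystems can be taken out of service follows from every logical volume having one copy on its cluster's own array and a mirror copy on the EVA 4100. The sketch below checks that reasoning on a hypothetical layout; the volume names are invented, and it models data placement only, not the HP-UX LVM commands used to create the mirrors.

```python
from __future__ import annotations

# Hypothetical layout: each logical volume has a copy on its cluster's own
# array plus a mirror copy on the shared EVA 4100 (volume names are made up).
MIRROR_LAYOUT: dict[str, set[str]] = {
    "hanfs_home": {"EVA3000", "EVA4100"},
    "hanfs_desktop": {"EVA3000", "EVA4100"},
    "happ_obs": {"EVA5000", "EVA4100"},
}

ARRAYS: set[str] = {"EVA3000", "EVA5000", "EVA4100"}


def volumes_lost(layout: dict[str, set[str]], offline: set[str]) -> list[str]:
    """Return the volumes whose copies are all on arrays that are offline."""
    return sorted(lv for lv, copies in layout.items() if copies <= offline)


def survives_any_single_outage(layout: dict[str, set[str]], arrays: set[str]) -> bool:
    """True if taking any single array out of service leaves every volume readable."""
    return all(not volumes_lost(layout, {array}) for array in arrays)


print(survives_any_single_outage(MIRROR_LAYOUT, ARRAYS))     # True
print(volumes_lost(MIRROR_LAYOUT, {"EVA3000", "EVA4100"}))  # ['hanfs_desktop', 'hanfs_home']
```

A volume that was not mirrored, for example one held only on the EVA 5000, would make the single-outage check fail, which is why the EVA 4100 needs enough capacity for the contents of both other arrays.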