IBM Platform HPC Version 4.2

IBM Platform HPC, Version 4.2 Installation Guide

SC27-6107-02
Note: Before using this information and the product it supports, read the information in "Notices" on page 71.
First edition
This edition applies to version 4, release 2, of IBM Platform HPC (product number 5725-K71) and to all subsequent releases and modifications until otherwise indicated in new editions.

© Copyright IBM Corporation 1994, 2014.

US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents

Chapter 1. Installation planning . . . . . . . 1
Preinstallation roadmap . . . . . . . . . . 2
Installation roadmap . . . . . . . . . . . 3

Chapter 2. Planning . . . . . . . . . . . 5
Planning your system configuration . . . . . . 5
Planning a high availability environment . . . . 7

Chapter 3. Preparing to install PHPC . . . . . 9
PHPC requirements . . . . . . . . . . . 9
High availability requirements . . . . . . . 10
Prepare a shared file system . . . . . . . . 12
Configure and test switches . . . . . . . . 12
Plan your network configuration . . . . . . 13
Installing and configuring the operating system on the management node . . . 13
  Red Hat Enterprise Linux prerequisites . . . . 15
  SUSE Linux Enterprise Server (SLES) 11.x prerequisites . . . 16

Chapter 4. Performing an installation . . . . 17
Comparing installation methods . . . . . . . 17
Quick installation roadmap . . . . . . . . 19
Quick installation . . . . . . . . . . . 20
Custom installation roadmap . . . . . . . . 23
Custom installation . . . . . . . . . . . 24

Chapter 5. Performing a silent installation . . 29
Response file for silent installation . . . . . 29

Chapter 6. Verifying the installation . . . . . 35

Chapter 7. Taking the first steps after installation . . . 37

Chapter 8. Troubleshooting installation problems . . . 39
Configuring your browser . . . . . . . . . 40

Chapter 9. Setting up a high availability environment . . . 41
Preparing high availability . . . . . . . . 41
Enable a high availability environment . . . . 43
Completing the high availability enablement . . 44
  Configure IPMI as a fencing device . . . . . 44
  Create a failover notification . . . . . . . 45
  Setting up SMTP mail settings . . . . . . 45
Verifying a high availability environment . . . 46
Troubleshooting a high availability environment enablement . . . 47

Chapter 10. Upgrading IBM Platform HPC . . . 49
Upgrading to Platform HPC Version 4.2 . . . . 49
  Upgrade planning . . . . . . . . . . . 49
    Upgrading checklist . . . . . . . . . 49
    Upgrading roadmap . . . . . . . . . 50
  Upgrading to Platform HPC 4.2 without OS reinstall . . . 50
    Preparing to upgrade . . . . . . . . 50
    Backing up Platform HPC . . . . . . . 51
    Performing the Platform HPC upgrade . . . 52
    Completing the upgrade . . . . . . . 53
    Verifying the upgrade . . . . . . . . 55
  Upgrading to Platform HPC 4.2 with OS reinstall . . . 55
    Preparing to upgrade . . . . . . . . 55
    Backing up Platform HPC . . . . . . . 57
    Performing the Platform HPC upgrade . . . 57
    Completing the upgrade . . . . . . . 58
    Verifying the upgrade . . . . . . . . 60
  Troubleshooting upgrade problems . . . . . 60
  Rollback to Platform HPC 4.1.1.1 . . . . . 61
Upgrading entitlement . . . . . . . . . . 63
  Upgrading LSF entitlement . . . . . . . 63
  Upgrading PAC entitlement . . . . . . . 63

Chapter 11. Applying fixes . . . . . . . . 65

Chapter 12. References . . . . . . . . . 67
Configuration files . . . . . . . . . . . 67
  High availability definition file . . . . . . 67
Commands . . . . . . . . . . . . . . 68
  pcmhatool . . . . . . . . . . . . . 68

Notices . . . . . . . . . . . . . . . 71
Trademarks . . . . . . . . . . . . . . 73
Privacy policy considerations . . . . . . . 73
Chapter 1. Installation planning

Installing and configuring IBM® Platform HPC involves several steps that you must complete in the appropriate sequence. Review the preinstallation and installation roadmaps before you begin the installation process.

The Installation Guide contains information to help you prepare for your Platform HPC installation, and includes steps for installing Platform HPC.

As part of the IBM Platform HPC installation, the following components are installed:
v IBM Platform LSF®
v IBM Platform MPI
Workload management with IBM Platform LSF

IBM Platform HPC includes a workload management component for load balancing and resource allocation. Platform HPC includes the IBM Platform LSF workload management component.

IBM Platform LSF is enterprise-class software that distributes work across existing heterogeneous IT resources, creating a shared, scalable, and fault-tolerant infrastructure that delivers faster, more reliable workload performance. LSF balances load and allocates resources, while providing access to those resources. LSF provides a resource management framework that takes your job requirements, finds the best resources to run the job, and monitors its progress. Jobs always run according to host load and site policies.

The LSF workload management component is installed as part of the Platform HPC installation, and the workload management master daemon is configured to run on the same node as the Platform HPC management node.

For more information on IBM Platform LSF, refer to the IBM Platform LSF Administration guide. You can find the IBM Platform LSF documentation here: http://public-IP-address/install/kits/kit-phpc-4.2/docs/lsf/, where public-IP-address is the public IP address of your Platform HPC management node. To upgrade your product entitlement for LSF, refer to "Upgrading LSF entitlement" on page 63.
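As an illustration of the workload management function described above (not an installation step), a user can submit and monitor a job with standard LSF commands once the LSF environment is loaded. The application name and options below are placeholders:

bsub -n 4 -o %J.out ./my_app    # submit a job that requests 4 slots; output goes to <jobID>.out
bjobs                           # list the status of your jobs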
IBM Platform MPI

By default, IBM Platform MPI is installed with IBM Platform HPC. For building MPI applications, you must have one of the supported compilers installed. Refer to the IBM Platform MPI release notes for a list of supported compilers. The IBM Platform MPI release notes are in the /opt/ibm/platform_mpi/doc/ directory. For more information on submitting and compiling MPI jobs, see the IBM Platform MPI User's Guide 9.1 (SC27-5319-00).
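For illustration only, an MPI program can be compiled and run with the Platform MPI wrapper scripts; the paths below assume the default installation directory mentioned above, and hello.c is a placeholder source file:

/opt/ibm/platform_mpi/bin/mpicc -o hello hello.c    # compile with one of the supported compilers installed
/opt/ibm/platform_mpi/bin/mpirun -np 4 ./hello      # run the program with 4 ranks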
Preinstallation roadmap

Before you begin your installation, ensure that the preinstallation tasks are completed.

There are two cases to consider before installing Platform HPC:
v Installing Platform HPC on a bare metal management node.
v Installing Platform HPC on a management node that already has an operating system installed.

If you are installing Platform HPC on a management node that already has an operating system installed, you can omit preinstallation actions 4 and 5.

Table 1. Preinstallation roadmap
Plan your cluster
    Review and plan your cluster setup. Refer to "Planning your system configuration" on page 5.

1. Review Platform HPC requirements
    Make sure that the minimum hardware requirements are met, including:
    v Hardware requirements
    v Software requirements
    Refer to "PHPC requirements" on page 9.

2. Configure and test switches
    Ensure that the necessary switches are configured to work with Platform HPC. Refer to "Configure and test switches" on page 12.

3. Plan your network configuration
    Before proceeding with the installation, plan your network configuration, including:
    v Provision network information
    v Public network information
    v BMC network information
    Refer to "Plan your network configuration" on page 13.

4. Obtain a copy of your operating system
    If the operating system is not installed, you must obtain a copy of your operating system and install it.

5. Install and configure your operating system
    Ensure that you configure your operating system:
    v Decide on a partitioning layout
    v Meet the Red Hat Enterprise Linux 6.x prerequisites
    Refer to "Installing and configuring the operating system on the management node" on page 13.

6. Obtain a copy of IBM Platform HPC
    If you do not have a copy of IBM Platform HPC, you can download it from IBM Passport Advantage®.
Installation roadmap

This roadmap helps you navigate your way through the PHPC installation.

Table 2. Installation roadmap
1. Select an installation method
    Choose an installation method from the following:
    v Installing PHPC using the installer. Using the installer, you have the following choices:
      – Quick installation
      – Custom installation
    v Installing PHPC using silent mode
    Refer to Chapter 1, "Installation planning," on page 1.

2. Perform the installation
    Follow your installation method to complete the PHPC installation.

3. Verify your installation
    Ensure that PHPC is successfully installed. Refer to Chapter 6, "Verifying the installation," on page 35.

4. Troubleshooting problems that occurred during installation
    If an error occurs during installation, you can troubleshoot the error. Refer to Chapter 8, "Troubleshooting installation problems," on page 39.

5. (Optional) Upgrading product entitlement
    Optionally, you can update your product entitlement for LSF. Refer to "Upgrading LSF entitlement" on page 63.

6. (Optional) Apply PHPC fixes
    After you install PHPC, you can check if there are any fixes available through IBM Fix Central. Refer to Chapter 11, "Applying fixes," on page 65.
Chapter 2. Planning

Before you install IBM Platform HPC and deploy your system, you must decide on your network topology and system configuration.
Planning your system configuration

Understand the role of the management node and plan your system settings and configurations accordingly.

IBM Platform HPC software is installed on the management node after the management node meets all requirements. The management node is responsible for the following functions:
v Administration, management, and monitoring of the system
v Installation of compute nodes
v Operating system distribution management and updates
v System configuration management
v Kit management
v Provisioning templates
v Stateless and stateful management
v User logon, compilation, and submission of jobs to the system
v Acting as a firewall to shield the system from external nodes and networks
v Acting as a server for many important services, such as DHCP, NFS, DNS, NTP, HTTP

The management node connects to both a provision and public network. Below, the management node connects to the provision network through the Ethernet interface that is mapped to eth1. It connects to the public network through the Ethernet interface that is mapped to eth0. The public network refers to the main network in your company or organization. A network switch connects the installation and compute nodes together to form a provision network.
Figure 1. System with a BMC network
Each compute node can be connected to the provision network and the BMC network. Multiple compute nodes are responsible for calculations. They are also responsible for running batch or parallel jobs. For networks where compute nodes have the same port for an Ethernet and BMC connection, the provision and BMC network can be the same. Below is an example of a system where compute nodes share a provisioning port.
Figure 2. System with a combined provision and BMC network
Note: For IPMI using a BMC network, you must use eth0 in order for the BMC network to use the provision network.
Although other system configurations are possible, the two-Ethernet-interface configuration is the most common. By default, eth0 is connected to the provision interface and eth1 is connected to the public interface. Alternatively, eth0 can be the public interface and eth1 the provision interface.

Note: You can also connect compute nodes to an InfiniBand network after the installation.
The provision network, which connects the management node and compute nodes, is typically a Gigabit or 100-Mbps Ethernet network. In this simple setup, the provision network serves three purposes:
v System administration
v System monitoring
v Message passing

It is common practice, however, to perform message passing over a much faster network by using a high-speed interconnect such as InfiniBand. A fast interconnect provides benefits such as higher throughput and lower latency. For more information about a particular interconnect, contact the appropriate interconnect vendor.
Planning a high availability environment

A high availability environment includes two locally installed PHPC management nodes with the same software and network configuration (except for the hostname and IP address). High availability is configured on both management nodes to control key services.
Chapter 3. Preparing to install PHPC

Before installing PHPC, steps must be taken to ensure that all prerequisites are met.

Before installing PHPC, you must complete the following steps:
v Check the PHPC requirements. You must make sure that the minimum hardware and software requirements are met.
v Configure and test switches.
v Plan your network configuration.
v Obtain a copy of the operating system. Refer to the PHPC requirements for a list of supported operating systems.
v Install an operating system for the management node.
v Obtain a copy of the product.
PHPC requirements

You must make sure that the minimum hardware and software requirements are met.

Hardware requirements

Before you install PHPC, you must make sure that the minimum hardware requirements are met.

Minimum hardware requirements for the management node:
v 100 GB free disk space
v 4 GB of physical memory (RAM)
v At least one configured static Ethernet interface

Note: For IBM PureFlex™ systems, the management node must be a node that is not in the IBM Flex Chassis.

Minimum requirements for compute nodes for stateful package-based installations:
v 1 GB of physical memory (RAM) for compute nodes
v 40 GB of free disk space
v One static Ethernet interface

Minimum requirements for compute nodes for stateless image-based installations:
v 4 GB of physical memory (RAM)
v One static Ethernet interface

Optional hardware can be configured before the installation:
v Additional Ethernet interfaces for connecting to other networks
v Additional BMC interfaces
v Additional interconnects for high-performance message passing, such as InfiniBand

Note: Platform HPC installation on an NFS server is not supported.
Software requirements

One of the following operating systems is required:
v Red Hat Enterprise Linux (RHEL) 6.5 x86 (64-bit)
v SUSE Linux Enterprise Server (SLES) 11.3 x86 (64-bit)
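A quick, informal way to confirm several of these minimums on the intended management node is sketched below; the commands are common Linux utilities and are not part of the product installer (on SLES, check /etc/SuSE-release instead of /etc/redhat-release):

cat /etc/redhat-release    # confirm a supported operating system level
free -g                    # confirm at least 4 GB of physical memory
df -h                      # confirm at least 100 GB of free disk space
ip -o link show            # confirm at least one Ethernet interface is present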
High availability requirements

You must make sure that these requirements are met before you set up high availability.
Management node requirements

Requirements for the primary management node and the secondary management node in a high availability environment:
v The management nodes must have the same or similar hardware requirements.
v The management nodes must have the same partition layout. After you prepare the secondary management node, ensure that the secondary node uses the same partition schema as the primary management node. Use df -h and fdisk -l to check the partition layout (a combined check sketch follows this list). If the secondary node has a different partition layout, reinstall the operating system with the same partition layout.
v The management nodes must use the same network settings.
v The management nodes must use the same network interface to connect to the provision and public networks. Ensure that the same network interfaces are defined for the primary and secondary management nodes. On each management node, issue the ifconfig command to check that the network settings are the same. Additionally, ensure that the IP address of the same network interface is in the same subnet. If not, reconfigure the network interfaces on the secondary management node according to your network plan.
v The management nodes must be configured with the same time, time zone, and current date.
v
v
v
v
Note: In a high availability environment, all IP addresses (management nodes IP addresses and virtual IP address) are in the IP address range of your network. To ensure that all IP addresses are in the IP address range of your network, you can use sequential IP addresses. Sequential IP addresses can help avoid any issues. For example:
Table 3. Example: Sequential IP addresses

Network      IP address range            Primary management node    Secondary management node    Virtual IP address
public       192.168.0.3-192.168.0.200   192.168.0.3                192.168.0.4                  192.168.0.5
provision    172.20.7.3-172.20.7.200     172.20.7.3                 172.20.7.4                   172.20.7.5
Shared file system requirements

Shared file systems are required to set up a high availability environment in Platform HPC. By default, two shared directories are required in a high availability environment; one to store user data and one to store system work data. In a high availability environment, all shared file systems must be accessible by the provision network for both the management nodes and compute nodes.

The following shared file systems must already be created on your shared storage server before you set up and enable a high availability environment:
v Shared directory for system work data
  – The minimum available shared disk space that is required is 40 GB. Required disk space varies based on the cluster usage.
  – The read, write, and execute permissions must be enabled for the operating system root user and the Platform HPC administrator. By default, the Platform HPC administrator is phpcadmin.
v Shared directory for user data (/home)
  – Ensure that there is enough disk space for your data in your /home directory. The minimum available shared disk space that is required is 4 GB, and it varies based on the disk space requirements for each user and the total number of users. If not provided, the user data is stored together with the system work data.
  – The read and write permissions must be enabled for all users.
Additionally, the following shared file system requirements must be met:
v The shared file systems cannot be hosted on one of the management nodes.
v The shared file systems should be specific to, and used only for, the high availability environment. This ensures that no single point of failure (SPOF) errors occur.
v If the IP address of the shared storage server is in the network IP address range that is managed by Platform HPC, it must be added as an unmanaged device to the cluster to avoid any IP address errors. Refer to Unmanaged devices.
v If you are using an external NAS or NFS server to host the shared directories that are needed for high availability, the following parameters must be specified in the exports entries:
  rw,sync,no_root_squash,fsid=num
  where num is an integer and should be different for each shared directory.

For example, to create a shared data directory and a shared home directory on an external NFS server, use the following commands:
mkdir -p /export/data
mkdir -p /export/home

Next, modify the /etc/exports file on the external NFS server:
/export/ 172.20.7.0/24(rw,sync,no_root_squash,fsid=0)

Note: If you are using two different file systems to create the directories, ensure that the fsid parameter is set for each export entry. For example:
/export/data 172.20.7.0/24(rw,sync,no_root_squash,fsid=3)
/export/home 172.20.7.0/24(rw,sync,no_root_squash,fsid=4)
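The guide does not show the step of activating the new export entries; on a typical Linux NFS server the following commands (an assumption, not taken from this document) re-export the directories and verify the result:

exportfs -ra               # re-export everything listed in /etc/exports
showmount -e localhost     # confirm that /export/data and /export/home are exported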
Prepare a shared file system

Before you enable high availability, prepare a shared file system. A shared file system is used in high availability to store shared work and user settings.
Procedure
1. Confirm that the NFS server can be used for the high availability configuration and that it is accessible from the Platform HPC management nodes. Run the following command on both management nodes to ping the NFS server from the provision network.
   # ping -c 2 -I eth1 192.168.1.1
   PING 192.168.1.1 (192.168.1.1) from 192.168.1.3 eth1: 56(84) bytes of data.
   64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.051 ms
   64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=0.036 ms
2. View the list of all NFS shared directories available on the NFS server.
   # showmount -e 192.168.1.1
   Export list for 192.168.1.1:
   /export/data 192.168.1.0/255.255.255.0
   /export/home 192.168.1.0/255.255.255.0
3. Add the NFS server as an IP pool to the Platform HPC system. This prevents the IP address of the NFS server from being allocated to a compute node and ensures that the NFS server name can be resolved consistently across the cluster. On the primary management node, run the following commands.
   # nodeaddunmged hostname=nfsserver ip=192.168.1.1
   Created unmanaged node.
   # plcclient.sh -p pcmnodeloader
   Loaders startup successfully.
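As an optional follow-on check (an assumption, not a documented step), you can verify from a management node that a shared directory is writable over NFS before enabling high availability. The IP address and export path are taken from the example output above:

mkdir -p /mnt/hatest
mount -t nfs 192.168.1.1:/export/data /mnt/hatest
touch /mnt/hatest/.write_test && echo "write OK"
rm -f /mnt/hatest/.write_test
umount /mnt/hatest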
Configure and test switches

Before installing IBM Platform HPC, ensure that your Ethernet switches are configured properly. Some installation issues can be caused by misconfigured network switches. These issues include: nodes that cannot PXE boot, nodes that cannot download a kickstart file, and nodes that cannot go into interactive startup.

To ensure that the Ethernet switches are configured correctly, complete the following steps:
1. Disable Spanning Tree on switched networks.
2. If currently disabled, enable PortFast on the switch. Different switch manufacturers may use different names for PortFast. It is the forwarding scheme that the switch uses. For best installation performance, the switch begins forwarding packets as it begins receiving them, which speeds the PXE booting process. Enabling PortFast, if it is supported by the switch, is recommended.
3. If currently disabled, enable multicasting on the switch. Certain switches might need to be configured to allow multicast traffic on the private network.
4. Run diagnostics on the switch to ensure that the switch is connected properly and that there are no bad ports or cables in the configuration.
Plan your network configuration

Before installing Platform HPC, ensure that you know the details of your network configuration, including whether you are setting up a high availability environment. Information about your network is required during installation, including information about the management nodes and network details.

Note: If you are setting up a high availability environment, collect the information for both management nodes: the primary management node and the secondary management node.

The following information is needed to set up and configure your network. Plan your network details, including:
v Provision network information:
  – Network subnet
  – Network domain name
  – Static IP address range
v Public network information:
  – Network subnet
  – Network domain name
  – Static IP address range
v BMC network information:
  – Network subnet
  – Network domain name
  – Static IP address range
v Management node information:
  – Node name (use a fully qualified domain name with a public domain suffix, for example: management.domain.com)
  – Static IP address and subnet mask for the public network
  – Static IP address and subnet mask for the provision network
  – Default gateway address
  – External DNS server IP address

Note: For a high availability environment, the management node information is required for both the primary management node and the secondary management node.
Installing and configuring the operating system on the management node

Before you can create the PHPC management node, you must install an operating system on the management node.
Complete the following steps to install the operating system on the management node:
1. Obtain a copy of the operating system.
2. Install and configure the operating system.

Before you install the operating system on the management node, ensure that the following conditions are met:
v Decide on a partitioning layout. The suggested partitioning layout is as follows:
  – Ensure that the /opt partition has at least 4 GB
  – Ensure that the /var partition has at least 40 GB
  – Ensure that the /install partition has at least 40 GB
  Note: After you install Platform HPC, you can customize the disk partitioning on compute nodes by creating a custom script to configure Logical Volume Manager (LVM) partitioning.
v Configure at least one static network interface.
v Use a fully qualified domain name (FQDN) for the management node.
v The /home directory must be writable. If the /home directory is mounted by autofs, you must first disable the autofs configuration:
  # chkconfig autofs off
  # service autofs stop
  To make the /home directory writable, run the following commands as root:
  # chmod u+w /home
  # ls -al / | grep home
v The package openais-devel must be removed manually if it is already installed.
v Before you install PHPC on the management node, make sure that shadow password authentication is enabled. Run setup and make sure that Use Shadow Passwords is checked.
v Ensure that IPv6 is enabled for remote power and console management. Do not disable IPv6 during the operating system installation. To enable IPv6, do the following:
  – For RHEL: If the disable-ipv6.conf file exists in the /etc/modprobe.d directory, comment out the following line, which disables IPv6: install ipv6 /bin/true
  – For SLES: If the 50-ipv6.conf file exists in the /etc/modprobe.d directory, comment out the following line, which disables IPv6: install ipv6 /bin/true
An example of checking several of these conditions is shown after the Important note below.

Note: After you install the operating system, ensure that the operating system time is set to the current real time. Use the date command to check the date on the operating system, and the date -s command to set the date. For example: date -s "20131017 04:57:00"

Important:
The management node does not support installing on an operating system that is upgraded through yum or zypper update. Do not run a yum update (RHEL) or zypper update (SLES) before installing PHPC. You can update the management node's operating system after installation. If you do upgrade your operating system through yum or zypper then you must roll back your changes before proceeding with the PHPC installation.
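The following commands illustrate how a few of the conditions in the list above can be checked on a RHEL 6.x management node; they are an informal sketch, not part of the installer:

rpm -q openais-devel && rpm -e openais-devel          # remove openais-devel if it is installed
hostname -f                                           # the management node must report an FQDN
chkconfig --list autofs                               # autofs should be off if /home is local
grep -rs "^install ipv6 /bin/true" /etc/modprobe.d/   # this line must be commented out so IPv6 stays enabled
date                                                  # the time must be set to the current real time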
If you are installing the Red Hat Enterprise Linux (RHEL) 6.x operating system, see the additional RHEL prerequisites. If you are installing the SUSE Linux Enterprise Server (SLES) 11.x operating system, see the additional SLES prerequisites. After all the conditions and prerequisites are met, install the operating system. Refer to the operating system documentation for how to install the operating system.
Red Hat Enterprise Linux prerequisites

Before you install Platform HPC on Red Hat Enterprise Linux (RHEL) 6.x, you must ensure the following:
1. The 70-persistent-net.rules file is created under /etc/udev/rules.d/ to make the interface names persistent across reboots.
2. Before installing PHPC, you must stop the NetworkManager service. To stop the NetworkManager service, run the following command:
   /etc/init.d/NetworkManager stop
3. Disable SELinux.
   a. On the management node, edit the /etc/selinux/config file to set SELINUX=disabled.
   b. Reboot the management node.
4. Ensure that the traditional naming scheme ethN is used. If you have a system that does not use the traditional naming scheme ethN, you must revert to it:
   a. Rename all ifcfg-emN and ifcfg-p* configuration files and modify the contents of the files accordingly. The content of these files is distribution-specific (see /usr/share/doc/initscripts-version for details). For example, ifcfg-ethN files in RHEL 6.x contain a DEVICE= field which is assigned the emN name. Modify it to suit the new naming scheme, such as DEVICE=eth0 (an example file is sketched after these steps).
   b. Comment out the HWADDR variable in the ifcfg-eth* files, if present, because it is not possible to predict here which of the network devices is named eth0, eth1, and so on.
   c. Reboot the system.
   d. Log in to see the ethN names.
5. Check whether the package net-snmp-perl is installed on the management node. If not, you must install it manually from the second RHEL 7 on POWER ISO.
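For illustration, a management node interface configuration file that follows the traditional naming scheme might look like the following sketch; the addressing values are assumptions and must be replaced with your own:

# /etc/sysconfig/network-scripts/ifcfg-eth0 (illustrative sketch, assumed values)
DEVICE=eth0            # renamed from the emN device name, as described in step 4
# HWADDR=...           # commented out, as described in step 4
BOOTPROTO=static
IPADDR=172.20.7.3
NETMASK=255.255.255.0
ONBOOT=yes
NM_CONTROLLED=no       # NetworkManager is stopped before installing PHPC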
SUSE Linux Enterprise Server (SLES) 11.x prerequisites

Before you install Platform HPC on SUSE Linux Enterprise Server (SLES), you must complete the following steps.
1. You must disable AppArmor. To disable AppArmor, complete the following steps:
   a. Start the YaST configuration and setup tool.
   b. From the System menu, select the System Services (Runlevel) option.
   c. Select the Expert Mode option.
   d. Select the boot.apparmor service, go to the Set/Reset menu, and select Disable the service.
   e. To save the options, click OK.
   f. Exit the YaST configuration and setup tool by clicking OK.
2. If the createrepo and perl-DBD-Pg packages are not installed, complete the following steps:
   a. To install the packages, prepare the following ISO images:
      v Installation ISO image: SLES-11-SP3-DVD-x86_64-GM-DVD1.iso
      v SDK ISO image: SLE-11-SP3-SDK-DVD-x86_64-GM-DVD1.iso
   b. Create a software repository for each ISO image by using the YaST configuration and setup tool. You must create a software repository for both the installation ISO image and the SDK ISO image. To create a software repository, complete the following steps:
      1) Start the YaST configuration and setup tool in a terminal.
      2) From the Software menu, select the Software Repositories option and click Add.
      3) Select the Local ISO Image option and click Next.
      4) Enter the Repository Name and select a Path to ISO Image. Click Next.
      5) Click OK to save the options and exit the YaST configuration and setup tool.
   c. To install the createrepo and perl-DBD-Pg packages, run the following command (a command-line alternative to the YaST repository steps is sketched after these steps):
      zypper install createrepo perl-DBD-Pg
3. Reboot the management node.
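If you prefer the command line over YaST, zypper can register an ISO image directly as a repository. The repository aliases and ISO paths below are assumptions; adjust them to where you copied the ISO images:

zypper addrepo "iso:/?iso=/root/SLES-11-SP3-DVD-x86_64-GM-DVD1.iso" sles11sp3-dvd1
zypper addrepo "iso:/?iso=/root/SLE-11-SP3-SDK-DVD-x86_64-GM-DVD1.iso" sles11sp3-sdk
zypper refresh
zypper install createrepo perl-DBD-Pg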
Chapter 4. Performing an installation

Install PHPC using the installer. The installer enables you to specify your installation options.

After the installation starts, the installer automatically checks the hardware and software configurations. The installer displays the following based on the results:
v OK - if no problems are found for the checked item
v WARNING - if the configuration of an item does not match the requirements; installation continues despite the warnings
v FAILED - if the installer cannot recover from an error, the installation quits

The installer (phpc-installer) displays the corresponding error message for problems that are detected and automatically ends the installation. If there are errors, you must resolve the identified problems and then rerun the phpc-installer until all installation requirements are met.

Usage notes
v Do not use an NFS partition or a local /home partition for the depot (/install) mount point.
v In the quick installation, the default values are used for values that are not specified during installation.
v A valid installation path for the installer must be used. The installation path cannot include special characters such as a colon (:), exclamation point (!), or space, and the installation cannot begin until a valid path is used.
Comparing installation methods

IBM Platform HPC can be installed using an interactive installer with one of two methods: the quick installation method and the custom installation method. The quick installation method quickly sets up basic options with default values. The custom installation method provides added installation options and enables the administrator to specify additional system configurations.

Below is a complete comparison of the two installation methods and the default values provided by the installer.

Table 4. Installer option comparison
Select a mount point for the depot (/install) directory.
    Default value: /    Quick installation: Yes    Custom installation: Yes
Select the location that you want to install the operating system from.
    Default value: CD/DVD drive    Quick installation: Yes    Custom installation: Yes
Specify a provision network interface.
    Default value: eth0    Quick installation: Yes    Custom installation: Yes
Specify a public network interface.
    Default value: eth1    Quick installation: Yes    Custom installation: Yes
Do you want to enable a public network connection?
    Default value: Yes    Quick installation: Yes    Custom installation: Yes
Do you want to enable the public interface firewall?
    Default value: Yes    Quick installation: No    Custom installation: Yes
Do you want to enable NAT forwarding on the management node?
    Default value: Yes    Quick installation: No    Custom installation: Yes
Enable a BMC network that uses the default provisioning template?
    Default value: No    Quick installation: Yes    Custom installation: Yes
Select a BMC network. Options include: Create a new network, Public network, Provision network.
    Default value: Create a new network    Quick installation: Yes    Custom installation: Yes
If creating a new BMC network, specify a subnet for the BMC network.
    Default value: N/A    Quick installation: Yes    Custom installation: Yes
If creating a new BMC network, specify a subnet mask for the BMC network.
    Default value: 255.255.255.0    Quick installation: Yes    Custom installation: Yes
If creating a new BMC network, specify a gateway IP address for the BMC network.
    Default value: N/A    Quick installation: No    Custom installation: Yes
If creating a new BMC network, specify an IP address range for the BMC network.
    Default value: 192.168.1.3-192.168.1.254    Quick installation: Yes    Custom installation: Yes
Specify the hardware profile used by your BMC network. Options include: IPMI, IBM_Flex_System_x, IBM_System_x_M4, IBM_iDataPlex_M4, IBM_NeXtScale_M4.
    Default value: IBM_System_x_M4    Quick installation: Yes    Custom installation: Yes
Set the domain name for the provision network.
    Default value: private.dns.zone    Quick installation: Yes    Custom installation: Yes
Set the domain name for the public network.
    Default value: public.com    Quick installation: Yes    Custom installation: Yes
Specify the provisioning compute node IP address range. This is generated based on the management node interface.
    Default value: 10.10.0.3-10.10.0.200    Quick installation: No    Custom installation: Yes
Do you want to provision compute nodes with the node discovery method?
    Default value: Yes    Quick installation: No    Custom installation: Yes
Specify the node discovery IP address range. This is generated based on the management node interface.
    Default value: 10.10.0.201-10.10.0.254    Quick installation: No    Custom installation: Yes
Set the IP addresses of the name servers.
    Default value: 192.168.1.40,192.168.1.50    Quick installation: No    Custom installation: Yes
Specify the NTP server.
    Default value: pool.ntp.org    Quick installation: No    Custom installation: Yes
Do you want to export the /home directory?
    Default value: Yes    Quick installation: No    Custom installation: Yes
Set the database administrator password.
    Default value: pcmdbpass    Quick installation: No    Custom installation: Yes
Set the default root password for compute nodes.
    Default value: PASSW0RD    Quick installation: No    Custom installation: Yes
Quick installation roadmap

Before you begin your quick installation, use the following roadmap to prepare your values for each installation option. You can choose to use the default example values for some or all of the options.

Table 5. Preparing for PHPC quick installation
1. Select a mount point for the depot (/install) directory.
    Example value: /    Your value:
2. Select the location that you want to install the operating system from.
    Example value: CD/DVD drive    Your value:
3. Specify a provision network interface.
    Example value: eth0    Your value:
4. Specify a public network interface.
    Example value: eth1    Your value:
5. Enable a BMC network that uses the default provisioning template?
    Example value: Yes    Your value:
6. Select a BMC network. Options include: Create a new network, Public network, Provision network.
    Example value: Create a new network    Your value:
7. If creating a new BMC network, specify a subnet for the BMC network.
    Example value: 192.168.1.0    Your value:
8. If creating a new BMC network, specify a subnet mask for the BMC network.
    Example value: 255.255.255.0    Your value:
9. Specify the hardware profile used by your BMC network. Options include: IPMI, IBM_Flex_System_x, IBM_System_x_M4, IBM_iDataPlex_M4, IBM_NeXtScale_M4.
    Example value: IBM_System_x_M4    Your value:
10. Set the provision network domain name.
    Example value: private.dns.zone    Your value:
11. Set a domain name for the public network? (Yes/No)
    Example value: Yes    Your value:
12. Set the public domain name.
    Example value: public.com or FQDN    Your value:
Quick installation

You can configure the management node by using the quick installation option.

Before you begin

PHPC installation supports the Bash shell only.
v Before you start the PHPC installation, you must boot into the base kernel. The Xen kernel is not supported.
v User accounts that are created before PHPC is installed are automatically synchronized across compute nodes during node provisioning. User accounts that are created after PHPC is installed are automatically synchronized across compute nodes when the compute nodes are updated.
v You must be a root user to install.
v Installing PHPC requires you to provide the OS media. If you want to use the DVD drive, ensure that no applications are actively using the drive (including any command shell). If you started the PHPC installation in the DVD directory, you can suspend the installation (Ctrl-z), change to another directory (cd ~), and then resume the installation (fg). Alternately, you can start the installation from another directory (for example: cd ~; python mount_point/phpc-installer).
v The /home mount point must have writable permission. Ensure that you have the correct permissions to add new users to the /home mount point.
About this task

The installer completes pre-checking processes and prompts you to answer questions to complete the management node configuration. The following steps summarize the installation of PHPC on your management node:
1. License Agreement
2. Management node pre-check
3. Specify installation settings
4. Installation

Complete the following installation steps:
Procedure

1. Choose one of the following installation methods:
   v Download the PHPC ISO to the management node.
   v Insert the PHPC DVD into the management node.
2. Mount the PHPC installation media:
   v If you install PHPC from an ISO file, mount the ISO into a directory such as /mnt. For example:
     # mount -o loop phpc-4.2.x64.iso /mnt
   v If you install PHPC from DVD media, mount to a directory such as /mnt.
   Tip: Normally, the DVD media is automatically mounted to /media/PHPC-program_number. To start the installer, run: /media/PHPC-program_number/phpc-installer. If the DVD is mounted without execute permission, you must add python in front of the command (python /media/PHPC-program_number/phpc-installer).
3. Start the PHPC installer by issuing the following command:
   # /mnt/phpc-installer
4. Accept the license agreement and continue.
5. Management node pre-checking automatically starts.
6. Choose the Quick Installation option as your installation method.
7. Select a mount point for the depot (/install) directory. The depot (/install) directory stores installation files for PHPC. The PHPC management node checks for the required disk space.
8. Select the location that you want to install the operating system from. The operating system version that you select must be the same as the operating system version on the management node.
   v OS Distribution installation from the DVD drive: Insert the correct OS DVD disk into the DVD drive. The disk is verified and added to the depot (/install) directory after you confirm the installation. If the PHPC disk is already inserted, make sure to insert the OS disk after you copy the PHPC core packages.
   v OS Distribution installation from an ISO image or mount point: Enter the path for the OS Distribution or mount point, for example: /iso/rhel/6.x/x86_64/rhel-server-6.x-x86_64-dvd.iso. The PHPC management node verifies that the operating system is a supported distribution, architecture, and version.
   Note: If the OS distribution is found on more than one ISO image, use the first ISO image during the installation. After the PHPC installation is completed, you can add the next ISO image from the Web Portal.
9. If you choose to install from an ISO image or mount point, you must enter the ISO image or mount point path.
10. Select a network interface for the provisioning network.
11. Select how the management node is connected to the public network. If the management node is not connected to the public network, select: It is not connected to the public network.
12. Enable a BMC network that uses the default provisioning template. If you choose to enable a BMC network, you must specify the following options:
    a. Select a BMC network. Options include:
       v Public network
       v Provision network
       v Create a new network. If you create a new BMC network, specify the following options:
         – A subnet for the BMC network.
         – A subnet mask for the BMC network.
    b. Select a hardware profile for the BMC network.
13. Enter a domain name for the provisioning network.
14. Set a domain name for the public network.
15. Enter a domain name for the public network.

A summary of your selected installation settings is displayed. To change any of these settings, press '99' to reselect the settings, or press '1' to begin the installation.
Results

You successfully completed the PHPC installation. You can find the installation log here: /opt/pcm/log/phpc-installer.log.

To configure PHPC environment variables, run the following command: source /opt/pcm/bin/pcmenv.sh. Configuration is not required for new login sessions.
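For example, in a new shell on the management node you can load the environment and review the log by using the paths given above:

source /opt/pcm/bin/pcmenv.sh           # configure the PHPC environment variables in this shell
less /opt/pcm/log/phpc-installer.log    # review the installation log for warnings or errors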
What to do next

After you complete the installation, verify that your PHPC environment is set up correctly.

To get started with PHPC, use your web browser to access the Web Portal at http://hostname:8080 or http://IPaddress:8080. Log in with the user account root and the default password Cluster on the management node.
Custom installation roadmap

Before you begin your custom installation, use the following roadmap to prepare your values for each installation option. You can choose to use the default example values for some or all of the options.

Table 6. Preparing for PHPC custom installation
1. Select a mount point for the depot (/install) directory.
    Example value: /    Your value:
2. Select the location that you want to install the operating system from.
    Example value: CD/DVD drive    Your value:
3. Specify a provision network interface.
    Example value: eth0    Your value:
4. Specify a public network interface.
    Example value: eth1    Your value:
5. Do you want to enable the public interface firewall? (Yes/No)
    Example value: Yes    Your value:
6. Do you want to enable NAT forwarding on the management node? (Yes/No)
    Example value: Yes    Your value:
7. Enable a BMC network that uses the default provisioning template?
    Example value: Yes    Your value:
8. Select one of the following options for creating your BMC network:
    a. Create a new network and specify the following options:
        Example value: Yes    Your value:
        i. Subnet
            Example value: 192.168.1.0    Your value:
        ii. Subnet mask
            Example value: 255.255.255.0    Your value:
        iii. Gateway IP address
            Example value: 192.168.1.1    Your value:
        iv. IP address range
            Example value: 192.168.1.3-192.168.1.254    Your value:
    b. Use the public network
        Example value: N/A    Your value:
    c. Use the provision network
        Example value: N/A    Your value:
9. Specify the hardware profile used by your BMC network. Options include: IPMI, IBM_Flex_System_x, IBM_System_x_M4, IBM_iDataPlex_M4, IBM_NeXtScale_M4.
    Example value: IBM_System_x_M4    Your value:
10. Set the provision network domain name.
    Example value: private.dns.zone    Your value:
11. Set a domain name for the public network? (Yes/No)
    Example value: Yes    Your value:
12. Set the public domain name.
    Example value: public.com or FQDN    Your value:
13. Specify the provisioning compute node IP address range. This is generated based on the management node interface.
    Example value: 10.10.0.3-10.10.0.200    Your value:
14. Do you want to provision compute nodes with the node discovery method? (Yes/No)
    Example value: Yes    Your value:
15. Specify the node discovery IP address range. This is generated based on the management node interface.
    Example value: 10.10.0.201-10.10.0.254    Your value:
16. Set the IP addresses of the name servers.
    Example value: 192.168.1.40,192.168.1.50    Your value:
17. Specify the NTP server.
    Example value: pool.ntp.org    Your value:
18. Do you want to export the /home directory? (Yes/No)
    Example value: Yes    Your value:
19. Set the database administrator password.
    Example value: pcmdbadm    Your value:
20. Set the default root password for compute nodes.
    Example value: Cluster    Your value:
Custom installation

You can configure the management node by using the custom installation option.

Before you begin

Note: PHPC installation supports the Bash shell only.
v Before you start the PHPC installation, you must boot into the base kernel. The Xen kernel is not supported.
v User accounts that are created before PHPC is installed are automatically synchronized across compute nodes during node provisioning. User accounts that are created after PHPC is installed are automatically synchronized across compute nodes when the compute nodes are updated.
v You must be a root user to install.
v Installing PHPC requires you to provide the OS media. If you want to use the DVD drive, ensure that no applications are actively using the drive (including any command shell). If you started the PHPC installation in the DVD directory, you can suspend the installation (Ctrl-z), change to another directory (cd ~), and then resume the installation (fg). Alternately, you can start the installation from another directory (for example: cd ~; python mount_point/phpc-installer).
v The /home mount point must have writable permission. Ensure that you have the correct permissions to add new users to the /home mount point.
About this task

The installer completes pre-checking processes and prompts you to answer questions to complete the management node configuration. The following steps summarize the installation of PHPC on your management node:
1. License Agreement
2. Management node pre-check
3. Specify installation settings
4. Installation

Complete the following installation steps:
Procedure

1. Choose one of the following installation methods:
   v Download the PHPC ISO to the management node.
   v Insert the PHPC DVD into the management node.
2. Mount the PHPC installation media:
   v If you install PHPC from an ISO file, mount the ISO into a directory such as /mnt. For example:
     # mount -o loop phpc-4.2.x64.iso /mnt
   v If you install PHPC from DVD media, mount to a directory such as /mnt.
   Tip: Normally, the DVD media is automatically mounted to /media/PHPC-program_number. To start the installer, run: /media/PHPC-program_number/phpc-installer. If the DVD is mounted without execute permission, you must add python in front of the command (python /media/PHPC-program_number/phpc-installer).
3. Start the PHPC installer by issuing the following command:
   # /mnt/phpc-installer
4. Accept the license agreement and continue.
5. Management node pre-checking automatically starts.
6. Select the Custom Installation option.
7. Select a mount point for the depot (/install) directory. The depot (/install) directory stores installation files for PHPC. The PHPC management node checks for the required disk space.
8. Select the location that you want to install the operating system from. The operating system version that you select must be the same as the operating system version on the management node.
   v OS Distribution installation from the DVD drive: Insert the correct OS DVD disk into the DVD drive. The disk is verified and added to the depot (/install) directory after you confirm the installation. If the PHPC disk is already inserted, make sure to insert the OS disk after you copy the PHPC core packages.
   v OS Distribution installation from an ISO image or mount point: Enter the path for the OS Distribution or mount point, for example: /iso/rhel/6.x/x86_64/rhel-server-6.x-x86_64-dvd.iso. The PHPC management node verifies that the operating system is a supported distribution, architecture, and version.
   Note: If the OS distribution is found on more than one ISO image, use the first ISO image during the installation. After the PHPC installation is completed, you can add the next ISO image from the Web Portal.
9. If you choose to install from an ISO image or mount point, you must enter the ISO image or mount point path.
10. Select a network interface for the provisioning network.
11. Enter the IP address range that is used for provisioning compute nodes.
12. Choose whether to provision compute nodes automatically with the node discovery method. Enter a node discovery IP address range to be used for provisioning compute nodes by node discovery. The node discovery IP address range is a temporary IP address range that is used to automatically provision nodes by using the auto node discovery method. This range cannot overlap the range that is specified for the provisioning compute nodes.
13. Select how the management node is connected to the public network. If the management node is not connected to the public network, select: It is not connected to the public network. If your management node is connected to a public network, optionally, you can enable the following settings:
    a. Enable PHPC-specific rules for the management node firewall that is connected to the public interface.
    b. Enable NAT forwarding on the management node for all compute nodes.
14. Enable a BMC network that uses the default provisioning template. If you choose to enable a BMC network, you must specify the following options:
    a. Select a BMC network:
       v Public network
       v Provision network
       v Create a new network. If you create a new BMC network, specify the following options:
         – A subnet for the BMC network.
         – A subnet mask for the BMC network.
         – A gateway IP address for the BMC network.
         – An IP address range for the BMC network.
    b. Specify a hardware profile for the BMC network.
Table 7. Available hardware profiles based on hardware type
Hardware                                      Hardware profile
Any IPMI-based hardware                       IPMI
IBM Flex System® x220, x240, and x440         IBM_Flex_System_x
IBM System x3550 M4, x3650 M4, x3750 M4       IBM_System_x_M4
IBM System dx360 M4                           IBM_iDataPlex_M4
IBM NeXtScale nx360 M4                        IBM_NeXtScale_M4
15. Enter a domain name for the provisioning network.
16. Set a domain name for the public network.
17. Enter a domain name for the public network.
18. Enter the IP addresses of your name servers, separated by commas.
19. Set the NTP server.
20. Export the home directory on the management node and use it for all compute nodes.
21. Enter the PHPC database administrator password.
22. Enter the root account password for all compute nodes.
23. A summary of your selected installation settings is displayed. To change any of these settings, press '99' to reselect the settings, or press '1' to begin the installation.
What to do next

After you complete the installation, verify that your PHPC environment is set up correctly.
To get started with PHPC, use your web browser to access the Web Portal at http://hostname:8080 or http://IPaddress:8080. Log in with the root user account and password on the management node.
Chapter 5. Performing a silent installation

Silent installation installs the IBM Platform HPC software by using a silent response file. You can specify all of your installation options in the silent installation file before installation.

Before you complete the installation using silent mode, complete the following actions:
v Install the operating system on the management node.
v Ensure that you have the correct permissions to add new users to the /home mount point.

To complete the silent installation, complete the following steps:
1. Mount the PHPC installation media:
   v If you install PHPC from an ISO file, mount the ISO into a directory such as /mnt. For example:
     # mount -o loop phpc-4.2.x64.iso /mnt
   v If you install PHPC from DVD media, mount to a directory such as /mnt.
   Tip: Normally, the DVD media is automatically mounted to /media/PHPC-program_number. To start the installer, run: /media/PHPC-program_number/phpc-installer. If the DVD is mounted without execute permission, you must add python in front of the command (python /media/PHPC-program_number/phpc-installer).
2. Prepare the response file with installation options. The silent response file phpc-autoinstall.conf.example is located in the /docs directory in the Platform HPC ISO.
   Note: If the OS distribution is found on more than one ISO image, use the first ISO image during the installation. After the PHPC installation is completed, you can add the next ISO image from the Web Portal.
3. Run the silent installation:
   mnt/phpc-installer -f path_to_phpc-autoinstall.conf
   where mnt is your mount point and path_to_phpc-autoinstall.conf is the location of your silent install file.
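A typical silent installation run might look like the following sketch; the file and installer paths come from the steps above, while the working copy /root/phpc-autoinstall.conf and the edits are assumptions:

mount -o loop phpc-4.2.x64.iso /mnt
cp /mnt/docs/phpc-autoinstall.conf.example /root/phpc-autoinstall.conf
vi /root/phpc-autoinstall.conf                  # set depot_path, os_path, network options, and so on
/mnt/phpc-installer -f /root/phpc-autoinstall.conf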
Usage notes
v A valid installation path must be used. The installation path cannot include special characters such as a colon (:), exclamation point (!), or space, and the installation cannot begin until a valid path is used.
Response file for silent installation

Response file for IBM Platform HPC silent installation.

# IBM Platform HPC 4.2 Silent Installation Response File
# The silent installation response file includes all of the options that can
# be set during a Platform HPC silent installation.
# ********************************************************************
# NOTE: For any duplicated options, only the last value is used
#       by the silent installation.
#
# NOTE: Configuration options cannot start with a space or tab.
# ********************************************************************

[General]

# depot_path
#
# The depot_path option sets the path of the Platform HPC depot (/install) directory.
#
# Usage notes:
#
# 1. The Platform HPC installation requires a minimum available disk space of 40 GB.
#
# 2. If you specify depot_path = /usr/local/pcm/, the installer places all Platform
#    HPC installation contents in the /usr/local/pcm/install directory and creates
#    a symbolic link named /install that points to the /usr/local/pcm/install directory.
#
# 3. If you specify depot_path = /install or depot_path = /, the installer places
#    all Platform HPC installation content into the /install directory.
#
# 4. If you have an existing /install mount point, by default, the installation
#    program places all installation contents into the /install directory regardless
#    of the depot_path value.

depot_path = /
# private_cluster_domain
#
# The private_cluster_domain option sets the provisioning network's domain name
# for the cluster. The domain must be a fully qualified domain name.
# This is a mandatory option.

private_cluster_domain = private.dns.zone

# provisioning_network_interface
#
# The provisioning_network_interface option sets one network device on the Platform HPC
# management node to be used for provisioning compute nodes. An accepted value for this
# option is a valid NIC name that exists on the management node. Values must use
# alphanumeric characters and cannot use quotations ("). The value 'lo' is not supported.
# This is a mandatory option.

provisioning_network_interface = eth0

# public_network_interface
#
# The public_network_interface option sets a network device on the Platform HPC
# management node that is used for accessing networks outside of the cluster. The value
# must be a valid NIC name that exists on the management node. The value cannot be
# the same as the value specified for the provisioning_network_interface option.
# The value cannot be 'lo' and cannot include quotations (").
# If this option is not defined, no public network interface is defined.

#public_network_interface = eth1

[Media]

# os_path
#
# The os_path option specifies the disc, ISO, or path of the first OS distribution used to
30
Installing IBM Platform HPC Version 4.2
used to # install the Platform HPC node. The os_path is a mandatory option. # # The os_path option must use one of the following options: # - full path to CD-ROM device, for example: /dev/cdrom # - full path to an ISO file, for example: /root/rhel-server-6.4-x86_64-dvd.iso # - full path to a directory where an ISO is mounted, for example: /mnt/basekit # os_path = /root/rhel-server--x86_64-dvd.iso [Advanced] # NOTE: By default, advanced options use a default value if no value is specified. # # excluded_kits # # The excluded_kits option lists specific kits that do not get installed. # This is a comma-separated list. The kit name should be same with the name # defined in the kit configuration file. If this option is not defined, # by default, all kits are installed. #excluded_kits = kit1,kit2 # # static_ip_range # # The static_ip_range options sets the IP address range used for provisioning # compute nodes. If this option is not defined, by default, the value is # automatically based on the provision network. #static_ip_range = 10.10.0.3-10.10.0.200 # # # # # # # #
discovery_ip_range The discovery_ip_range option sets the IP address range that is used for provisioning compute nodes by node discovery. This IP address range cannot overlap with the IP range used for provisioning compute nodes as specified by the static_ip_range option. You can set the discovery_ip_range value to ’none’ if you do not want to use node discovery. If this option is not defined, the default value is set to none.
#discovery_ip_range = 10.10.0.201-10.10.0.254 # # enable_firewall # # The enable_firewall option enables Platform HPC specific rules for the management # node firewall to the public interface. This option is only available if the # public_network_interface is set to yes. If this option is not defined, by default, # the value is set to yes. #enable_firewall = yes # # enable_nat_forward # # The enable_nat_forward option enables NAT forwarding on the management node # for all compute nodes. This option is only available if the enable_firewall # option is set to yes. If this option is not defined, by default, # the value is set to yes. #enable_nat_forward = yes # # enable_bmcfsp # Chapter 5. Performing a silent installation
31
# The enable_bmcfsp option enables a BMC or FSP network with the default provisioning template. # This option indicates which network is associated with BMC or FSP network.This is a # mandatory option. If this option is not defined, by default, a BMC or FSP network is # not enabled. # Options include: new_network, public, provision # new_network option: Creates a new BMC or FSP network by specifyingi the following options for the # the new network # [bmcfsp_subnet] # [bmcfsp_subnet_mask] # [bmcfsp_gateway] # [bmcfsp_iprange] # will be applied to create a new network # public option: Creates a BMC or FSP network that uses the public network. # provision option: Creates a BMC or FSP network that uses the provision network. #enable_bmcfsp = new_network # # bmcfsp_subnet # # Specify the subnet for the BMC or FSP network. This value must be different than the value used by # the public and provision networks. Otherwise, the BMC or FSP network set up fails. This option is # required if enable_bmcfsp = new_network. #bmcfsp_subnet = 192.168.1.0 # # bmcfsp_subnet_mask # # Specify the subnet mask for the BMC netwrok. This option is required if enable_bmcfsp = new_network. #bmcfsp_subnet_mask = 255.255.255.0 # # bmcfsp_gateway # # Specify the gateway IP address for the BMC or FSP network.This option is available if enable_bmcfsp = new_network. #bmcfsp_gateway = 192.168.1.1 # # bmcfsp_iprange # # Specify the IP address range for the BMC or FSP network. This option is required if enable_bmcfsp = new_network #bmcfsp_iprange = 192.168.1.3-192.168.1.254 # # bmcfsp_hwprofile # # Specify a hardware profile to associate with the BMC or FSP network. This option is required if enable_bmcfsp = new_network. # # bmcfsp_hwprofile options: # For x86-based systems, the following are supported hardware profile options: # # IBM_System_x_M4: IBM System x3550 M4, x3650 M4, x3750 M4 # IBM_Flex_System_x: IBM System x220, x240, x440 # IBM_iDataPlex_M4: IBM System dx360 M4 # IPMI: Any IPMI-based hardware #
32
Installing IBM Platform HPC Version 4.2
# For POWER systems, the following are supported hardware profile options: # IBM_Flex_System_p: IBM System p260, p460 #bmcfsp_hwprofile = IBM_System_x_M4 # # # # # #
nameservers The nameservers option lists the IP addresses of your external name servers using a comma-separated list.If this option is not define, by default, the value is set to none.
#nameservers = 192.168.1.40,192.168.1.50 # # ntp_server # # The ntp_server option sets the NTP server.If this option is not defined, # by default, this value is set to pool.htp.org. #ntp_server = pool.ntp.org # # # # # #
enable_export_home The enable_export_home option specifies if the /home mount point exports to the management node. The export home directory is used on all all compute nodes. If this option is not defined, by default, this value is set to yes.
#enable_export_home = yes # # db_admin_password # # The db_admin_password option sets the Platform HPC database administrator password. # If this option is not defined, by default, this value is set to pcmdbadm. #db_admin_password = pcmdbadm # # compute_root_password # # The compute_root_password option sets the root account password for all compute nodes. # If this option is not defined, by default, this value is set to Cluster. #compute_root_password = Cluster # # # # # # #
cluster_name The cluster_name option sets the cluster name for the Platform HPC workload manager. The cluster name must be a string containing any of the following characters: a-z, A-Z, 0-9 or underscore (_). The string length cannot exceed 39 characters. If this option is not defined, by default, this value is set to phpc_cluster.
#cluster_name = phpc_cluster # # cluster_admin # # The cluster_admin specifies the Platform HPC workload manager administrator. This # can be a single user account name, or a comma-separated list of several user account # list. The first user account name in the list is the primary LSF administrator and # it cannot be the root user account. For example: cluster_admin=user_name1,user_name2... # If this option is not defined, by default, this value is set to phpcadmin. #cluster_admin = phpcadmin Chapter 5. Performing a silent installation
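As a starting point, a minimal response file that sets only the mandatory options plus the depot path might look like the following sketch; the domain, interface name, depot path, and ISO path are examples that must match your environment:
[General]
depot_path = /install
private_cluster_domain = private.dns.zone
provisioning_network_interface = eth0

[Media]
os_path = /root/rhel-server-6.4-x86_64-dvd.iso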
Chapter 6. Verifying the installation
Ensure that you have successfully installed PHPC.
Note: You can find the installation log file phpc-installer.log in the /opt/pcm/log directory. This log file includes details and results about your PHPC installation.
To verify that your installation is working correctly, log in to the management node as a root user and complete the following tasks:
1. Source PHPC environment variables.
# . /opt/pcm/bin/pcmenv.sh
2. Check that the PostgreSQL database server is running.
# service postgresql status
(pid 13269) is running...
3. Check that the Platform HPC services are running.
# service phpc status
Show status of the LSF subsystem
lim (pid 31774) is running...
res (pid 27663) is running...
sbatchd (pid 27667) is running...
SERVICE     STATUS    WSM_PID   PORT   HOST_NAME
WEBGUI      STARTED   16550     8080   hjc-ip200
SERVICE     STATUS    WSM_PID   HOST_NAME
jobdt       STARTED   5836      hjc-ip200
plc         STARTED   5877      hjc-ip200
plc_group2  STARTED   5917      hjc-ip200
purger      STARTED   5962      hjc-ip200
vdatam      STARTED   6018      hjc-ip200
4. Log in to the Web Portal.
a. Open a supported web browser. Refer to the Release Notes for a list of supported web browsers.
b. Go to http://mgtnode-IP:8080, where mgtnode-IP is the real management node IP address. If you are connected to a public network, you can also navigate to http://mgtnode-hostname:8080, where mgtnode-hostname is the real management node hostname.
c. Log in as an administrator or a user. An administrator has administrative privileges that include managing cluster resources. A user account is not able to manage cluster resources but can manage jobs. By default, PHPC creates a default administrative account where the username and password are both phpcadmin. This default phpcadmin administrator account has all administrative privileges.
d. After you log in, the Resource Dashboard is displayed in the Web Portal.
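As a quick spot-check, the same verification can be scripted from a shell on the management node. This is only a sketch; it assumes curl is available and that mgtnode-IP is replaced with your management node IP address (an HTTP response code of 200 indicates the Web Portal is answering):
# . /opt/pcm/bin/pcmenv.sh
# service postgresql status
# service phpc status
# curl -s -o /dev/null -w "%{http_code}\n" http://mgtnode-IP:8080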
Chapter 7. Taking the first steps after installation
After your installation is complete, as an administrator you can get started with managing your clusters. The following tasks can be completed to get started with Platform HPC:
• Enabling LDAP support for user authentication
• Provision your nodes by adding the nodes to your cluster
• Modify your provisioning template settings
  – Manage image profiles
  – Manage network profiles
• Set up the HTTPS connection
• Submit jobs
• Create resource reports
• Create application templates
For more information about IBM Platform HPC, see the Administering IBM Platform HPC guide. For the latest release information about Platform HPC 4.2, see Platform HPC on IBM Knowledge Center at http://www.ibm.com/support/knowledgecenter/SSDV85_4.2.0.
Chapter 8. Troubleshooting installation problems
Troubleshooting problems that occurred during the IBM Platform HPC installation.
To help troubleshoot your installation, you can view the phpc-installer.log file that is found in the /opt/pcm/log directory. This file logs the installation steps, and any warnings and errors that occurred during the installation.
Note: During the installation, the installation progress is logged in a temporary directory that is found here: /tmp/phpc-installer.
To view detailed error messages, run the installer in DEBUG mode when troubleshooting the installation. To run the installer in DEBUG mode, set the PCM_INSTALLER_DEBUG environment variable. When running in DEBUG mode, the installer does not clean up all the files when an error occurs. The DEBUG mode also generates extra log messages that can be used to trace the installer's execution. Set the PCM_INSTALLER_DEBUG environment variable to run the installer in DEBUG mode:
# PCM_INSTALLER_DEBUG=1 hpc-ISO-mount/phpc-installer
where hpc-ISO-mount is the mount point.
Note: Only use the PCM_INSTALLER_DEBUG environment variable to troubleshoot a PHPC installation that uses the interactive installer. Do not use it when installing PHPC with the silent installation.
Common installation issues include the following issues:
• The Platform HPC installer fails with the error message "Cannot reinstall Platform HPC. Platform HPC is already installed." To install a new Platform HPC product, you must first uninstall the installed product.
• During management node pre-checking, one of the checks fails. Ensure that all Platform HPC requirements are met and rerun the installer. For more information about Platform HPC, see the Release Notes.
• Setting up the shared NFS export fails during installation. To resolve this issue, complete the following steps:
  1. Check the rpcbind status.
     # service rpcbind status
  2. If rpcbind is stopped, restart it and run the S03_base_nfs.rc.py script.
     # service rpcbind start
     # cd /opt/pcm/rc.pcm.d/
     # pcmconfig -i ./S03_base_nfs.rc.py
• Cannot log in to the Web Portal, or view the Resource Dashboard in the Web Portal.
  – Configure your web browser. Your web browser must be configured to accept first-party and third-party cookies. In some cases, your browser default settings can block these cookies. In this case, you need to manually change this setting.
  – Restart the Web Portal. In most cases, the services that are required to run the Web Portal start automatically. However, if the Web Portal goes down, you can restart services and daemons manually. From the command line, issue the following command:
     # pmcadmin stop ; pmcadmin start
Configuring your browser
To properly configure your browser, you must have the necessary plug-ins installed.
About this task
If you are using Firefox as your browser, you are required to have the Flash and JRE plug-ins installed. To install the Flash and JRE plug-ins, complete the following steps:
Procedure
1. Install the appropriate Adobe Flash Player plug-in from the Adobe website (http://get.adobe.com/flashplayer).
2. Check that the Flash plug-in is installed. Enter about:plugins into the Firefox address field. Shockwave Flash appears in the list.
3. Check that the Flash plug-in is enabled. Enter about:config into the Firefox address field. Find dom.ipc.plugins.enabled in the list and ensure that it has a value of true. If it is set to false, double-click it to enable it.
4. Restart Firefox.
5. Download the appropriate JRE plug-in installer from the Oracle website (http://www.oracle.com/technetwork/java/javase/downloads/index.html). The 64-bit rpm installer (jre-7u2-linux-x64.rpm) is recommended.
6. Exit Firefox. To run Java™ applets within the browser, you must install the JRE plug-in manually. For more information about installing the JRE plug-in manually, go to http://docs.oracle.com/javase/7/docs/webnotes/install/linux/linux-plugininstall.html.
7. In the package folder, run the command:
rpm -ivh jre-7u2-linux-x64.rpm
8. When the installation is finished, enter the following commands:
cd /usr/lib64/mozilla/plugins
ln -s /usr/java/jre1.7.0_02/lib/amd64/libnpjp2.so
9. Check that the JRE plug-in was installed correctly. Start Firefox and enter about:plugins into the Firefox address field. Java(TM) Plug-in 1.7.0_02 is displayed in the list.
Chapter 9. Setting up a high availability environment
Set up an IBM Platform HPC high availability environment.
To set up a high availability (HA) environment in Platform HPC, complete the following steps.
Table 8. High availability environment roadmap
• Ensure that the high availability requirements are met: Requirements for setting up a shared storage device and a secondary management node must be met.
• Preparing high availability: Set up the secondary management node with an operating system and Platform HPC installation.
• Enable a Platform HPC high availability environment: Set up Platform HPC high availability on the primary and secondary management nodes.
• Complete the high availability enablement: After high availability is enabled, set up the compute nodes.
• Verify Platform HPC high availability: Ensure that Platform HPC high availability is running correctly on the primary and secondary management nodes.
• Troubleshooting enablement problems: Troubleshooting problems that occurred during a Platform HPC high availability environment setup.
Preparing high availability Preparing an IBM Platform HPC high availability environment.
Before you begin Ensure that all high availability requirements are met and a shared file system is created on a shared storage server.
About this task
To prepare a high availability environment, set up the secondary management node with the same operating system and PHPC version as on the primary management node. After the secondary management node is set up, the necessary SSH connections and configuration must be made between the primary management node and the secondary management node.
Procedure
1. Install the operating system on the secondary node. The secondary management node must use the same operating system and version as used on the primary management node. Both management nodes must use the same network and must be connected to the same network interface. Refer to “Installing and configuring the operating system on the management node” on page 13.
2. Ensure that the time and time zone are the same on the primary and secondary management nodes.
a. To verify the current time zone, run the cat /etc/sysconfig/clock command. To determine the correct time zone, refer to the information found in the /usr/share/zoneinfo directory.
b. If the time zone is incorrect, update the time zone. To update the time zone, set the correct time zone in the /etc/sysconfig/clock file. For example:
For RHEL: ZONE="US/Eastern"
For SLES: TIMEZONE="America/New_York"
c. Set the local time in the /etc/localtime file, for example:
ln -s /usr/share/zoneinfo/US/Eastern /etc/localtime
d. Set the date on both management nodes. Issue the following command on both management nodes:
date -s current_time
e. If the management nodes already have PHPC installed, run the following command on both management nodes to get the system time zone:
lsdef -t site -o clustersite -i timezone
If the system time zones are different, update the system time zone on the secondary node by running the following command:
chdef -t site -o clustersite timezone=US/Eastern
3. Install PHPC on the secondary node. You must use the same PHPC ISO file as you used for the primary management node. You can complete the installation by using the installer or the silent installation. The installer includes an interactive display where you can specify your installation options; make sure to use the same installation options as the primary management node. Installation options for the primary management node are found in the installation log file (/opt/pcm/log/phpc-installer.log) on the primary management node. Refer to Chapter 4, “Performing an installation,” on page 17. If you use the silent installation to install PHPC, you can use the same response file for both management nodes. Refer to Chapter 5, “Performing a silent installation,” on page 29.
4. Verify that the management nodes can access the shared file systems. Issue the showmount -e nfs-server-ip command, where nfs-server-ip is the IP address of the NFS server that connects to the provision network.
5. Add the secondary management node entry to the /etc/hosts file on the primary management node. Ensure that the failover node name can be resolved to the secondary management node provision IP address. Run the following commands on the primary management node:
echo "secondary-node-provision-ip secondary-node-name" >> /etc/hosts
ping secondary-node-name
where secondary-node-provision-ip is the provision IP address of the secondary node and secondary-node-name is the name of the secondary node. For example:
# echo "192.168.1.4 backupmn" >> /etc/hosts
6. Back up and configure a passwordless SSH connection between the primary management node and the secondary node.
# Back up the SSH key on the secondary node.
ssh secondary-node-name cp -rf /root/.ssh /root/.ssh.PCMHA
# Configure passwordless SSH between the management node and the secondary node.
cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
scp -r /root/.ssh/* secondary-node-name:/root/.ssh
where secondary-node-name is the name of the secondary node.
7. Prepare the compute nodes. These steps are used for provisioned compute nodes that you do not want to reprovision.
a. Shut down the LSF services on the compute nodes.
# xdsh __Managed 'service lsf stop'
b. Unmount and remove the /home and /shared mount points on the compute nodes.
# updatenode __Managed 'mountnfs del'
# xdsh __Managed 'umount /home'
# xdsh __Managed 'umount /shared'
Enable a high availability environment Enable an IBM Platform HPC high availability environment.
Before you begin Ensure that the secondary management node is installed and setup correctly. Ensure that SSH connections are configured and network settings are correct between the primary management node and the secondary management node.
About this task You can set up the high availability environment using the high availability management tool (pcmhatool). The tool defines and sets up a high availability environment between the management nodes using a predefined high availability definition file. Note: The high availability management tool ( pcmhatool) supports Bash shell only.
Procedure
1. Define a high availability definition file according to your high availability settings, including the virtual name, virtual IP address, and shared storage. The high availability definition file example ha.info.example is in the /opt/pcm/share/examples/HA directory. Refer to “High availability definition file” on page 67.
2. Set up a high availability environment. Setup can take several minutes to synchronize data to shared storage. Ensure that the shared storage server is always available. Issue the following command on the primary management node.
pcmhatool config -i ha-definition-file -s secondary-management-node
where ha-definition-file is the high availability definition file that you created in step 1, and secondary-management-node is the name of the secondary management node.
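For example, assuming the secondary management node is named backupmn (as in the earlier /etc/hosts example) and the definition file is copied to /root/ha.info, the enablement run might look like the following sketch; both the file name and its location are only illustrative:
cp /opt/pcm/share/examples/HA/ha.info.example /root/ha.info
vi /root/ha.info     # set the virtual node name, virtual IP addresses, and shared storage locations
pcmhatool config -i /root/ha.info -s backupmn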
Usage notes
1. During a high availability enablement, some of the services start on the standby management node instead of the active management node. After a few minutes, they switch to the active management node.
2. If the management node crashes during the high availability environment setup, rerun the pcmhatool command and specify the same options. Running this command again cleans up the incomplete environment and starts the high availability enablement again.
3. You can find the enablement log file (pcmhatool.log) in the /opt/pcm/log directory. This log file includes details and results about the high availability environment setup.
4. If you enable high availability, the pcmadmin command cannot be used to restart the PERF loader. In a high availability environment, use the following commands to restart the PERF loader:
pcm-ha-support start --service PLC
pcm-ha-support start --service PLC2
pcm-ha-support start --service JOBDT
pcm-ha-support start --service PTC
pcm-ha-support start --service PURGER
What to do next After the high availability enablement is complete, verify that the Platform HPC high availability environment is set up correctly.
Completing the high availability enablement After high availability is enabled, you can set up and configure additional options, such as configuring an IPMI device as a fencing device to protect your high availability cluster from malfunctioning nodes and services. You can also set up email notification when a failover is triggered.
Configure IPMI as a fencing device In a high availability cluster that has only two management nodes, it is important to configure fencing on an IPMI device. Fencing is the process of isolating a node or protecting shared resources from a malfunctioning node within a high availability environment. The fencing process locates the malfunctioning node and disables it. Use remote hardware control to configure fencing on an IPMI device.
Before you begin This fencing method requires both management nodes to be controlled remotely using IPMI. If your management nodes are on a power system or using a different remote power control method, you must create the corresponding fencing script accordingly.
Procedure
1. Create an executable fencing script on the shared file system. For example, you can use the example fencing script (fencing_ipmi.sh) that is found in the /opt/pcm/share/examples/HA directory. Run the following commands to create the script on a shared file system. Ensure that you modify fencing_ipmi.sh to your real environment settings.
mkdir -p /install/failover
cp /opt/pcm/share/examples/HA/fencing_ipmi.sh /install/failover
2. Edit the HA controller service agent configuration file (ha_wsm.cfg) in the /opt/pcm/etc/failover directory on the active management node. In the [__Failover__] section, set the value of the fencing_action parameter to the absolute path of your custom script. For example:
fencing_action=/install/failover/fencing_ipmi.sh
3. Restart the PCMHA service agent.
pcm-ha-support start --service PCMHA
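If you need to write your own fencing script, for example for hardware that the sample does not cover, a minimal sketch using ipmitool is shown below. The BMC address, user, and password are placeholders, and the shipped fencing_ipmi.sh remains the reference for the exact arguments and behavior that Platform HPC expects.
#!/bin/sh
# Minimal fencing sketch: power off the peer management node over IPMI.
# All values below are examples; adapt them to your environment.
PEER_BMC_IP=192.168.1.20     # BMC address of the other management node
BMC_USER=USERID              # IPMI user name
BMC_PASS=PASSW0RD            # IPMI password
ipmitool -I lanplus -H "$PEER_BMC_IP" -U "$BMC_USER" -P "$BMC_PASS" chassis power off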
Create a failover notification Create a notification, such as an email notification, for a triggered failover.
Before you begin Note: Before you can send email for a triggered failover, you must configure your mail parameters. Refer to “Setting up SMTP mail settings.”
Procedure
1. Create an executable script on the shared file system. For example, you can use an executable script that sends an email when a failover is triggered. An example send email script (send_mail.sh) is in the /opt/pcm/share/examples/HA directory. Run the following commands to create the script on a shared file system. Ensure that you modify send_mail.sh to your real environment settings.
mkdir -p /install/failover
cp /opt/pcm/share/examples/HA/send_mail.sh /install/failover
2. Edit the high availability controller configuration file (ha_wsm.cfg) on the management node in the /opt/pcm/etc/failover directory. In the [__Failover__] section, set the failover_action parameter to the absolute path of your custom script. For example:
failover_action=/install/failover/send_mail.sh
3. Restart the high availability environment.
pcm-ha-support start --service PCMHA
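If you prefer to write your own notification script instead of adapting send_mail.sh, a minimal sketch is shown below; it assumes a working local mail command and a configured SMTP relay, and the recipient address and subject are placeholders.
#!/bin/sh
# Minimal failover notification sketch; the address and subject are examples.
ADMIN_EMAIL=admin@example.com
echo "Platform HPC failover was triggered on $(hostname) at $(date)" | \
  mail -s "Platform HPC failover notification" "$ADMIN_EMAIL"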
Setting up SMTP mail settings Specify SMTP mail settings in IBM Platform HPC.
Before you begin To send email from Platform HPC, an SMTP server must already be installed and configured.
Procedure
1. Log in to the Web Portal as the system administrator.
2. In the System & Settings tab, click General Settings.
3. Expand the Mail Settings heading.
a. Enter the mail server (SMTP) host.
b. Enter the mail server (SMTP) port.
c. Enter the user account. This field is only required by some servers.
d. Enter the user account password. This field is only required by some servers.
4. Click Apply.
Results
SMTP server settings are configured. Platform HPC uses the configured SMTP server to send email. The account from which the mail is sent is the user email account. However, if the user email account is not specified, the management node name is used as the sender email address.
Verifying a high availability environment Verify an IBM Platform HPC high availability environment.
Before you begin You can find the enablement log file ( pcmhatool.log) in the /opt/pcm/log directory. This log file includes details and results about your PHPC enablement.
Procedure
1. Log on to the management node as a root user.
2. Source Platform HPC environment variables.
# . /opt/pcm/bin/pcmenv.sh
3. Check that Platform HPC high availability is configured.
# pcmhatool info
Configuring status: OK
================================================================
HA group members: master, failover
Virtual node name: virtualmn
Virtual IP for : 192.168.0.100
Virtual IP for : 172.20.7.100
Shared work directory on: 172.20.7.200:/export/data
Shared home directory on: 172.20.7.200:/export/home
4. Check that Platform HPC services are running. All services must be in state STARTED, for example:
# service phpc status
Show status of the LSF subsystem
lim (pid 29003) is running...
res (pid 29006) is running...
sbatchd (pid 29008) is running...
SERVICE   STATE     ALLOC  CONSUMER  RGROUP  RESOURCE  SLOTS  SEQ_NO  INST_STATE  ACTI
PLC       STARTED   32     /Manage*  Manag*  master    1      1       RUN         9
PTC       STARTED   34     /Manage*  Manag*  master    1      1       RUN         8
PURGER    STARTED   35     /Manage*  Manag*  master    1      1       RUN         7
WEBGUI    STARTED   31     /Manage*  Manag*  master    1      1       RUN         4
JOBDT     STARTED   36     /Manage*  Manag*  master    1      1       RUN         6
PLC2      STARTED   33     /Manage*  Manag*  master    1      1       RUN         5
PCMHA     STARTED   28     /Manage*  Manag*  master    1      1       RUN         1
PCMDB     STARTED   29     /Manage*  Manag*  master    1      1       RUN         2
XCAT      STARTED   30     /Manage*  Manag*  master    1      1       RUN         3
5. Log in to the Web Portal.
a. Open a supported web browser. Refer to the Release Notes for a list of supported web browsers.
b. Go to http://mgtnode-virtual-IP:8080, where mgtnode-virtual-IP is the management node virtual IP address. If you are connected to a public network, you can also navigate to http://mgtnode-virtual-hostname:8080, where mgtnode-virtual-hostname is the virtual management node hostname.
If HTTPS is enabled, go to https://mgtnode-virtual-IP:8443 or https://mgtnode-virtual-hostname:8443 to log in to the Web Portal.
c. Log in as an administrator or user. An administrator has administrative privileges that include managing cluster resources. A user account is not able to manage cluster resources but can manage jobs.
d. After you log in, the Resource Dashboard is displayed in the Web Portal. Under the Cluster Health option, both management nodes are listed.
Troubleshooting a high availability environment enablement
Troubleshooting an IBM Platform HPC high availability environment.
To help troubleshoot your high availability enablement, you can view the log file that is found at /opt/pcm/log/pcmhatool.log. This file logs the high availability enablement steps, and any warnings and errors that occurred during the high availability enablement.
Common high availability enablement issues include the following issues:
• When you run a command on the management node, the command stops responding. To resolve this issue, log in to the management node with a new session. Ensure that the external NFS server is available and check that the network connection to the NFS server is available. If you cannot log in to the management node, try to reboot it.
• When you check the Platform HPC service status, one of the service agent statuses is set to ERROR. When the monitored service daemon is down, the service agent attempts to restart it several times. If it continually fails, the service agent is set to ERROR. To resolve this issue, check the service daemon log for more detail on how to resolve this problem. If the service daemon can be started manually, restart the service agent again by issuing the following command:
  pcm-ha-support start --service service_name
  where service_name is the name of the service that is experiencing the problem.
• Services are running on the standby management node after an automatic failover occurs due to a provision network failure. The Platform HPC high availability environment uses the provision network for heartbeat communication. A provision network failure causes the management nodes to lose communication, and fencing to stop working. To resolve this issue, stop the service agents manually by issuing the following command:
  pcm-ha-support stop --service all
• Parsing high availability settings fails. To resolve this issue, ensure that the high availability definition file does not have any formatting errors, that the virtual name is correct, and that the virtual IP address does not conflict with an existing managed node. Also, ensure that the xCAT daemon is running by issuing the command tabdump site.
• During the pre-checking, one of the checks fails. To resolve this issue, ensure that all Platform HPC high availability requirements are met and rerun the high availability enablement tool.
• Syncing data to the shared directory fails. To resolve this issue, ensure that the network connection to the external shared storage is stable during the high availability enablement. If a timeout occurs during data synchronization, rerun the tool with the PCMHA_NO_CLEAN environment variable set. This environment variable ensures that existing data on the NFS server is unchanged.
  # PCMHA_NO_CLEAN=1 pcmhatool config -i ha-definition-file -s secondary-management-node
  where ha-definition-file is the high availability definition file and secondary-management-node is the name of the secondary management node.
• Cannot log in to the Web Portal, or view the Resource Dashboard in the Web Portal. All Platform HPC services are started a few minutes after the high availability enablement. Wait a few minutes and try again. If the issue persists, run the high availability diagnostic tool to check the running status:
  # pcmhatool check
Chapter 10. Upgrading IBM Platform HPC
Upgrade IBM Platform HPC from Version 4.1.1.1 to Version 4.2. Additionally, you can upgrade the product entitlement files for Platform Application Center or LSF.
Upgrading to Platform HPC Version 4.2
Upgrade from Platform HPC Version 4.1.1.1 to Version 4.2. The upgrade procedure ensures that the necessary files are backed up and necessary files are restored.
The following upgrade paths are available:
• Upgrading from Platform HPC 4.1.1.1 to 4.2 without OS reinstall
• Upgrading from Platform HPC 4.1.1.1 to 4.2 with OS reinstall
If any errors occur during the upgrade process, you can roll back to an earlier version of Platform HPC. For a list of all supported upgrade procedures, refer to the Release notes for Platform HPC 4.2 guide.
Upgrade planning Upgrading IBM Platform HPC involves several steps that you must complete in the appropriate sequence. Review the upgrade checklist and upgrade roadmap before you begin the upgrade process.
Upgrading checklist
Use the following checklist to review the necessary requirements before upgrading. In order to upgrade to the newest release of IBM Platform HPC, ensure that you meet the following criteria before proceeding with the upgrade.
Table 9. Upgrading checklist
• Hardware requirements: Ensure that you meet the hardware requirements for Platform HPC. Refer to “PHPC requirements” on page 9.
• Software requirements: Ensure that you meet the software requirements for Platform HPC. Refer to “PHPC requirements” on page 9.
• External storage device: Obtain an external storage device to store the necessary backup files. Make sure that the external storage is larger than the size of your backup files.
• Obtain a copy of the Platform HPC 4.2 ISO: Get a copy of Platform HPC 4.2.
• (Optional) Obtain a copy of the latest supported version of the operating system: Optionally, you can upgrade your operating system to the latest supported version.
Upgrading roadmap
Overview of the upgrade procedure.
Table 10. Upgrading Platform HPC
1. Upgrading checklist: Ensure that you meet all of the requirements before upgrading Platform HPC.
2. Preparing to upgrade: Before you can upgrade to the newest release of Platform HPC, you must complete specific tasks.
3. Creating a Platform HPC 4.1.1.1 backup: Create a backup of your current Platform HPC 4.1.1.1 settings and database. This backup is used to restore your existing settings to the newer version of Platform HPC.
4. Perform the Platform HPC upgrade: Perform the upgrade using your chosen path:
   • Upgrading to Platform HPC 4.2 without OS reinstall
   • Upgrading to Platform HPC 4.2 with OS reinstall
5. Completing the upgrade: Ensure that data is restored and services are restarted.
6. Verifying the upgrade: Ensure that PHPC is successfully upgraded.
7. (Optional) Applying fixes: After you upgrade PHPC, you can check if there are any fixes available through IBM Fix Central.
Upgrading to Platform HPC 4.2 without OS reinstall
Upgrade your existing installation of IBM Platform HPC to the most recent version without reinstalling the operating system on the management node. Note that if you are upgrading Platform HPC to Version 4.2 without reinstalling the operating system, the PMPI kit version is not upgraded.
Preparing to upgrade Before upgrading your IBM Platform HPC installation, there are some steps you should follow to ensure your upgrade is successful.
Before you begin
To prepare for your upgrade, ensure that you have the following items:
• You must have an external backup to store the contents of your 4.1.1.1 backup.
• The Platform HPC 4.2 ISO file.
• If you are upgrading the operating system, make sure that you have the RHEL ISO file, and that you have a corresponding OS distribution created.
For additional requirements, refer to “Upgrading checklist” on page 49.
About this task Before you upgrade to the next release of Platform HPC, you must complete the following steps:
Procedure
1. Mount the Platform HPC installation media:
mount -o loop phpc-4.2.x64.iso /mnt
2. Upgrade the pcm-upgrade-tool package.
For RHEL:
rpm -Uvh /mnt/packages/repos/kit-phpc-4.2-rhels-6-x86_64/pcm-upgrade-tool-*.rpm
For SLES:
rpm -Uvh /mnt/packages/repos/kit-phpc-4.2-sles-11-x86_64/pcm-upgrade-tool-*.rpm
3. Set up the upgrade environment.
export PATH=${PATH}:/opt/pcm/libexec/
4. Prepare an external storage device.
a. Ensure that the external storage has enough space for the backup files. To check how much space you require for the backup, run the following commands:
# du -sh /var/lib/pgsql/data
# du -sh /install/
Note: It is recommended that the size of your external storage is greater than the combined size of the database and the /install directory.
b. On the external storage, create a directory for the database backup.
mkdir /external-storage-mnt/db-backup
where external-storage-mnt is the backup location on your external storage.
c. Create a directory for the configuration file backup.
mkdir /external-storage-mnt/config-backup
where external-storage-mnt is the backup location on your external storage.
5. Determine which custom metrics you are using, if any. The custom metrics are lost in the upgrade process, and can manually be re-created after the upgrade is completed.
6. If you created any new users after Platform HPC was installed, you must include these new users in your backup.
/opt/xcat/bin/updatenode mn-host-name -F
where mn-host-name is the name of your management node.
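For example, a quick way to compare the required and available space before creating the directories is sketched below; /external-storage-mnt is the same placeholder mount point used in this procedure:
# Size of the data that will be backed up
du -sh /var/lib/pgsql/data /install/
# Free space on the external storage (placeholder mount point)
df -h /external-storage-mnt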
Backing up Platform HPC
Create a backup of your current Platform HPC installation that includes a backup of the database and settings before you upgrade to a newer version of Platform HPC.
Note: The backup procedure does not back up any custom configurations. After the upgrade procedure is completed, the following custom configurations can be manually re-created:
• Customization to the PERF loader, including internal data collection and the purger configuration files
• Customization to the Web Portal Help menu navigation
• Addition of custom metrics
• Alert policies
• LDAP packages and configurations
Before you begin Platform HPC does not back up or restore LSF configuration files or data. Before you upgrade, make sure to back up your LSF configuration files and data. After the upgrade is complete, you can apply your backed up configuration files and data.
Procedure
1. Stop Platform HPC services:
pcm-upgrade-tool.py services --stop
2. Create a database backup on the external storage. The database backup backs up the database data and schema.
pcm-upgrade-tool.py backup --database -d /external-storage-mnt/db-backup/
where external-storage-mnt is the backup location on your external storage. The backup includes database files and the backup configuration file pcm.conf.
3. Create a configuration file backup on the external storage.
pcm-upgrade-tool.py backup --files -d /external-storage-mnt/config-backup/
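Putting the preparation and backup steps together, an end-to-end backup run might look like the following sketch; /external-storage-mnt is a placeholder for wherever your external storage is mounted:
mkdir -p /external-storage-mnt/db-backup /external-storage-mnt/config-backup
pcm-upgrade-tool.py services --stop
pcm-upgrade-tool.py backup --database -d /external-storage-mnt/db-backup/
pcm-upgrade-tool.py backup --files -d /external-storage-mnt/config-backup/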
Performing the Platform HPC upgrade
Perform the upgrade without reinstalling the operating system and restore your settings.
Before you begin
Ensure that a backup of your previous settings was created before you proceed with the upgrade.
Procedure
1. Upgrade Platform HPC from 4.1.1.1 to 4.2 by completing the following steps:
a. Upgrade the database schema.
pcm-upgrade-tool.py upgrade --schema
b. If you created custom metrics in Platform HPC 4.1.1.1, you can manually re-create them. See more about Defining metrics in Platform HPC.
c. Start the HTTP daemon (HTTPd).
For RHEL: # service httpd start
For SLES: # service apache2 start
d. Start the xCAT daemon.
# service xcatd start
e. Upgrade Platform HPC.
pcm-upgrade-tool.py upgrade --packages -p /root/phpc-4.2.x64.iso
f. Copy the Platform HPC entitlement file to the /opt/pcm/entitlement directory.
2. Restore settings and database data by completing the following steps:
a. Stop the xCAT daemon.
/etc/init.d/xcatd stop
b. Restore database data from a previous backup.
pcm-upgrade-tool.py restore --database -d /external-storage-mnt/db-backup/
where external-storage-mnt is the backup location on your external storage and db-backup is the location of the database backup.
c. Restore configuration files from a previous backup.
pcm-upgrade-tool.py restore --files -f /external-storage-mnt/config-backup/20130708-134535.tar.gz
where config-backup is the location of the configuration file backup.
3. Upgrade the LSF component from Version 9.1.1 to LSF 9.1.3.
a. Create an LSF installer configuration file (lsf.install.config) and add it to the /install/kits/kit-phpc-4.2/other_files directory. Refer to the lsf.install.config file in the /install/kits/kit-phpc-4.1.1.1/other_files directory and modify the parameters as needed.
b. Replace the LSF postscripts in the /install/postscripts/ directory.
cp /install/kits/kit-phpc-4.2/other_files/KIT_phpc_lsf_setup /install/postscripts/
cp /install/kits/kit-phpc-4.2/other_files/KIT_phpc_lsf_config /install/postscripts/
cp /install/kits/kit-phpc-4.2/other_files/lsf.install.config /install/postscripts/phpc
c. Extract the LSF installer package to a temporary directory. The LSF installer package is placed at /install/kits/kit-phpc-4.2/other_files/. For example:
tar xvzf /install/kits/kit-phpc-4.2/other_files/lsf9.1.3_lsfinstall_linux_x86_64.tar.Z -C /tmp/lsf
d. Run the LSF installation.
1) Navigate to the LSF installer directory.
cd /tmp/lsf
2) Copy the lsf.install.config configuration file from /install/kits/kit-phpc-4.2/other_files.
cp /install/kits/kit-phpc-4.2/other_files/lsf.install.config ./
3) Run the LSF installer.
./lsfinstall -f lsf.install.config
Completing the upgrade
To complete the upgrade to the next release of IBM Platform HPC, you must restore your system settings, restore your database settings, and update the compute nodes.
Procedure
1. Restart Platform HPC services.
pcm-upgrade-tool.py services --reconfig
2. Refresh the database and configurations:
pcm-upgrade-tool.py upgrade --postupdate
3. If you previously installed GMF and the related monitoring packages with Platform HPC, you must manually reinstall these packages. To check which monitoring packages are installed, run the following commands:
rpm -qa | grep chassis-monitoring
rpm -qa | grep switch-monitoring
rpm -qa | grep gpfs-monitoring
rpm -qa | grep gmf
a. Uninstall the GMF package and the monitoring packages.
rpm -e --nodeps pcm-chassis-monitoring-1.2.1-1.x86_64
rpm -e --nodeps pcm-switch-monitoring-1.2.1-1.x86_64
rpm -e --nodeps pcm-gpfs-monitoring-1.2.1-1.x86_64
rpm -e --nodeps pcm-gmf-1.2-1.x86_64
b. Install the GMF package that is found in the /install/kits/kit-pcm-4.2/repos/kit-phpc-4.2-rhels-6-x86_64 directory.
rpm -ivh pcm-gmf-1.2-1.x86_64.rpm
c. Install the switch monitoring package that is found in the /install/kits/kit-pcm-4.2/repos/kit-phpc-4.2-rhels-6-x86_64 directory.
rpm -ivh pcm-switch-monitoring-1.2.1-1.x86_64.rpm
d. Install the chassis monitoring package that is found in the /install/kits/kit-pcm-4.2/repos/kit-phpc-4.2-rhels-6-x86_64 directory.
rpm -ivh pcm-chassis-monitoring-1.2.1-1.x86_64.rpm
e. If you have GPFS installed, run the following command to install the GPFS monitoring package. The GPFS monitoring package is available in the /install/kits/kit-pcm-4.2/repos/kit-phpc-4.2-rhels-6-x86_64 directory.
rpm -ivh pcm-gpfs-monitoring-1.2.1-1.x86_64.rpm
f. Restart Platform HPC services.
# pcmadmin service restart --group ALL
4. Upgrade compute nodes.
a. Check if the compute nodes are reachable. Compute node connections can get lost during the upgrade process; ping the compute nodes to ensure that they are connected to the management node:
xdsh noderange "/bin/ls"
For any compute nodes that have lost connection and cannot be reached, use the rpower command to reboot the node:
rpower noderange reset
where noderange is a comma-separated list of nodes or node groups.
b. Update compute nodes to include the Platform HPC package.
updatenode noderange -S
where noderange is a comma-separated list of nodes or node groups.
c. Restart monitoring services.
xdsh noderange "source /shared/ibm/platform_lsf/conf/ego/phpc_cluster/kernel/profile.ego; egosh ego shutdown -f; egosh ego start -f"
where noderange is a comma-separated list of nodes or node groups.
5. Restart the LSF cluster. Run the following command on the management node.
lsfrestart -f
6. An SSL V3 security issue exists within the Tomcat server when HTTPS is enabled. If you have not previously taken steps to fix this issue, you can skip this step. Otherwise, if you have HTTPS enabled, complete the following steps to fix this issue.
a. Edit the $GUI_CONFDIR/server.xml file. In the connector XML tag, set the sslProtocol value from SSL to TLS, and save the file (see the example connector entry after this procedure).
b. Restart the Web Portal service.
pcmadmin service stop --service WEBGUI
pcmadmin service start --service WEBGUI
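The connector entry referred to in step 6a is sketched below; only the sslProtocol attribute is the value this step changes, and the port, keystore path, and password are examples that depend on your HTTPS configuration:
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="150" scheme="https" secure="true"
           clientAuth="false" sslProtocol="TLS"
           keystoreFile="/path/to/keystore" keystorePass="changeit" />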
Verifying the upgrade
Ensure that the upgrade procedure is successful and that Platform HPC is working correctly.
Note: A detailed log of the upgrade process can be found in the upgrade.log file in the /opt/pcm/log directory.
Procedure
1. Log in to the management node as a root user.
2. Source Platform HPC environment variables.
# . /opt/pcm/bin/pcmenv.sh
3. Check that the PostgreSQL database server is running.
# service postgresql status
(pid 13269) is running...
4. Check that the Platform HPC services are running.
# service xcatd status
xCAT service is running
# service phpc status
Show status of the LSF subsystem
lim (pid 15858) is running...
res (pid 15873) is running...
sbatchd (pid 15881) is running...
[ OK ]
SERVICE   STATE     ALLOC  CONSUMER  RGROUP  RESOURCE  SLOTS  SEQ_NO  INST_STATE  ACTI
RULE-EN*  STARTED   18     /Manage*  Manag*  *         1      1       RUN         17
PCMD      STARTED   17     /Manage*  Manag*  *         1      1       RUN         16
JOBDT     STARTED   12     /Manage*  Manag*  *         1      1       RUN         11
PLC       STARTED   13     /Manage*  Manag*  *         1      1       RUN         12
PURGER    STARTED   11     /Manage*  Manag*  *         1      1       RUN         10
PTC       STARTED   14     /Manage*  Manag*  *         1      1       RUN         13
PLC2      STARTED   15     /Manage*  Manag*  *         1      1       RUN         14
WEBGUI    STARTED   19     /Manage*  Manag*  *         1      1       RUN         18
ACTIVEMQ  STARTED   16     /Manage*  Manag*  *         1      1       RUN         15
5. Check that the correct version of Platform HPC is running.
# cat /etc/phpc-release
6. Log in to the Web Portal.
a. Open a supported web browser. Refer to the Release Notes for a list of supported web browsers.
b. Go to http://mgtnode-IP:8080, where mgtnode-IP is the real management node IP address. If you are connected to a public network, you can also navigate to http://mgtnode-hostname:8080, where mgtnode-hostname is the real management node hostname.
c. Log in as a root user. The root user has administrative privileges and maps to the operating system root user.
d. After you log in, the Resource Dashboard is displayed in the Web Portal.
Upgrading to Platform HPC 4.2 with OS reinstall
Upgrade your existing installation of IBM Platform HPC to the most recent version, and reinstall or upgrade the operating system on the management node.
Preparing to upgrade
Before upgrading your IBM Platform HPC installation, there are some steps you should follow to ensure your upgrade is successful.
Before you begin
To prepare for your upgrade, ensure that you have the following items:
• You must have an external backup to store the contents of your 4.1.1.1 backup.
• The Platform HPC 4.2 ISO file.
• If you are upgrading the operating system, make sure that you have the RHEL ISO file, and that you have a corresponding OS distribution created.
For additional requirements, refer to “Upgrading checklist” on page 49.
About this task Before you upgrade to the next release of Platform HPC, you must complete the following steps:
Procedure
1. Mount the Platform HPC installation media:
mount -o loop phpc-4.2.x64.iso /mnt
2. Upgrade the pcm-upgrade-tool package.
For RHEL:
rpm -Uvh /mnt/packages/repos/kit-phpc-4.2-rhels-6-x86_64/pcm-upgrade-tool-*.rpm
For SLES:
rpm -Uvh /mnt/packages/repos/kit-phpc-4.2-sles-11-x86_64/pcm-upgrade-tool-*.rpm
3. Set up the upgrade environment.
export PATH=${PATH}:/opt/pcm/libexec/
4. Prepare an external storage device.
a. Ensure that the external storage has enough space for the backup files. To check how much space you require for the backup, run the following commands:
# du -sh /var/lib/pgsql/data
# du -sh /install/
Note: It is recommended that the size of your external storage is greater than the combined size of the database and the /install directory.
b. On the external storage, create a directory for the database backup.
mkdir /external-storage-mnt/db-backup
where external-storage-mnt is the backup location on your external storage.
c. Create a directory for the configuration file backup.
mkdir /external-storage-mnt/config-backup
where external-storage-mnt is the backup location on your external storage.
5. Determine which custom metrics you are using, if any. The custom metrics are lost in the upgrade process, and can manually be re-created after the upgrade is completed.
6. If you created any new users after Platform HPC was installed, you must include these new users in your backup.
/opt/xcat/bin/updatenode mn-host-name -F
where mn-host-name is the name of your management node.
Backing up Platform HPC
Create a backup of your current Platform HPC installation that includes a backup of the database and settings before you upgrade to a newer version of Platform HPC.
Note: The backup procedure does not back up any custom configurations. After the upgrade procedure is completed, the following custom configurations can be manually re-created:
• Customization to the PERF loader, including internal data collection and the purger configuration files
• Customization to the Web Portal Help menu navigation
• Addition of custom metrics
• Alert policies
• LDAP packages and configurations
Before you begin Platform HPC does not back up or restore LSF configuration files or data. Before you upgrade, make sure to back up your LSF configuration files and data. After the upgrade is complete, you can apply your backed up configuration files and data.
Procedure
1. Stop Platform HPC services:
pcm-upgrade-tool.py services --stop
2. Create a database backup on the external storage. The database backup backs up the database data and schema.
pcm-upgrade-tool.py backup --database -d /external-storage-mnt/db-backup/
where external-storage-mnt is the backup location on your external storage. The backup includes database files and the backup configuration file pcm.conf.
3. Create a configuration file backup on the external storage.
pcm-upgrade-tool.py backup --files -d /external-storage-mnt/config-backup/
Performing the Platform HPC upgrade
Perform the upgrade, reinstall the operating system, and restore your settings.
Before you begin
Ensure that you have prepared for the upgrade and have an existing backup of your previous settings.
Procedure
1. Reinstall the management node by completing the following steps:
a. Record the following management node network settings: hostname, IP address, netmask, and default gateway.
b. If you are upgrading to a new machine, you must power off the old management node before you power on the new management node.
c. Reinstall the RHEL 6.5 operating system on the management node. Ensure that you use the same network settings as the old management node, including: hostname, IP address, netmask, and default gateway. Refer to “Installing and configuring the operating system on the management node” on page 13 for more information about installing an RHEL operating system.
2. Install Platform HPC 4.2. In this step, the RHEL operating system is specified. If you are using a different operating system, specify the operating system accordingly.
a. Locate the default silent installation template phpc-autoinstall.conf.example in the docs directory in the installation ISO.
mount -o loop phpc-4.2.x64.rhel.iso /mnt
cp /mnt/docs/phpc-autoinstall.conf.example ./phpc-autoinstall.conf
b. Edit the silent installation template and set the os_kit parameter to the absolute path of the operating system ISO.
vi ./phpc-autoinstall.conf
c. Start the installation by running the silent installation.
/mnt/phpc-installer -f ./phpc-autoinstall.conf
3. Set up your environment.
export PATH=${PATH}:/opt/pcm/libexec/
4. Restore settings and database data by completing the following steps:
a. Stop Platform HPC services.
pcm-upgrade-tool.py services --stop
b. If you created custom metrics in Platform HPC 4.1.1.1, you can manually re-create them. Refer to the "Defining metrics in Platform HPC" section in the Administering Platform HPC guide for more information.
c. Restore database data from a previous backup.
pcm-upgrade-tool.py restore --database -d /external-storage-mnt/db-backup/
where external-storage-mnt is the backup location on your external storage and db-backup is the location of the database backup.
d. Restore configuration files from a previous backup.
pcm-upgrade-tool.py restore --files -f /external-storage-mnt/config-backup/20130708-134535.tar.gz
where config-backup is the location of the configuration file backup.
Related information:
“Installing and configuring the operating system on the management node” on page 13
Completing the upgrade
To complete the upgrade to the next release of IBM Platform HPC and complete the operating system reinstallation, you must restore your system settings, restore your database settings, and update the compute nodes.
Procedure
1. Restart Platform HPC services.
pcm-upgrade-tool.py services --reconfig
2. By default, the OS distribution files are not backed up or restored. The OS distribution files can be manually created after the management node upgrade is complete and before upgrading the compute nodes. To re-create an OS distribution, run the following commands:
a. Mount the operating system ISO.
# mount -o loop rhel-6.4-x86_64.iso /mnt
where rhel-6.4-x86_64.iso is the name of the OS distribution.
b. Create a new backup directory. The backup directory must be the same as the OS distribution path. To determine the OS distribution path, use the lsdef -t osdistro rhels6.4-x86_64 command.
# mkdir /install/rhels6.4/x86_64
c. Synchronize the new directory.
# rsync -a /mnt/* /install/rhels6.4/x86_64
3. Refresh the database and configurations:
pcm-upgrade-tool.py upgrade --postupdate
4. Update compute nodes. If you want to upgrade the compute nodes to a higher OS version, you must reprovision them. Otherwise, complete this step.
a. Check if the compute nodes are reachable. Compute node connections can get lost during the upgrade process; ping the compute nodes to ensure that they are connected to the management node:
xdsh noderange "/bin/ls"
For any compute nodes that have lost connection and cannot be reached, use the rpower command to reboot the node:
rpower noderange reset
where noderange is a comma-separated list of nodes or node groups.
b. Recover the SSH connection to the compute nodes.
xdsh noderange -K
where noderange is a comma-separated list of nodes or node groups.
c. Update compute nodes to include the Platform HPC 4.2 package.
updatenode noderange -S
where noderange is a comma-separated list of nodes or node groups.
d. Restart monitoring services.
xdsh noderange "source /opt/pcm/ego/profile.platform; egosh ego shutdown -f; egosh ego start -f"
where noderange is a comma-separated list of nodes or node groups.
5. By default, the LDAP configurations are not backed up or restored. If you want to enable LDAP, refer to the "LDAP user authentication" section in the Administering Platform HPC guide.
6. An SSL V3 security issue exists within the Tomcat server when HTTPS is enabled. If you have not previously taken steps to fix this issue, you can skip this step. Otherwise, if you have HTTPS enabled, complete the following steps to fix this issue.
a. Edit the $GUI_CONFDIR/server.xml file. In the connector XML tag, set the sslProtocol value from SSL to TLS, and save the file. For example:
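A hedged sketch of the resulting connector entry is shown below; as in the procedure for upgrading without OS reinstall, only the sslProtocol attribute matters for this step, and the other attribute values are examples that depend on your HTTPS configuration:
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="150" scheme="https" secure="true"
           clientAuth="false" sslProtocol="TLS"
           keystoreFile="/path/to/keystore" keystorePass="changeit" />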
b. Restart the Web Portal service.
pcmadmin service stop --service WEBGUI
pcmadmin service start --service WEBGUI
Verifying the upgrade Ensure that the upgrade procedure is successful and that Platform HPC is working correctly. Note: A detailed log of the upgrade process can be found in the upgrade.log file in the /opt/pcm/log directory.
Procedure
1. Log in to the management node as a root user.
2. Source Platform HPC environment variables:
   # . /opt/pcm/bin/pcmenv.sh
3. Check that the PostgreSQL database server is running:
   # service postgresql status
   (pid 13269) is running...
4. Check that the Platform HPC services are running:
   # service xcatd status
   xCAT service is running
   # service phpc status
   Show status of the LSF subsystem
   lim (pid 15858) is running...
   res (pid 15873) is running...
   sbatchd (pid 15881) is running...
   [ OK ]
   SERVICE   STATE    ALLOC  CONSUMER  RGROUP  RESOURCE  SLOTS  SEQ_NO  INST_STATE  ACTI
   RULE-EN*  STARTED  18     /Manage*  Manag*  *         1      1       RUN         17
   PCMD      STARTED  17     /Manage*  Manag*  *         1      1       RUN         16
   JOBDT     STARTED  12     /Manage*  Manag*  *         1      1       RUN         11
   PLC       STARTED  13     /Manage*  Manag*  *         1      1       RUN         12
   PURGER    STARTED  11     /Manage*  Manag*  *         1      1       RUN         10
   PTC       STARTED  14     /Manage*  Manag*  *         1      1       RUN         13
   PLC2      STARTED  15     /Manage*  Manag*  *         1      1       RUN         14
   WEBGUI    STARTED  19     /Manage*  Manag*  *         1      1       RUN         18
   ACTIVEMQ  STARTED  16     /Manage*  Manag*  *         1      1       RUN         15
5. Check that the correct version of Platform HPC is running:
   # cat /etc/phpc-release
6. Log in to the Web Portal.
   a. Open a supported web browser. Refer to the Release Notes for a list of supported web browsers.
   b. Go to http://mgtnode-IP:8080, where mgtnode-IP is the real management node IP address. If you are connected to a public network, you can also navigate to http://mgtnode-hostname:8080, where mgtnode-hostname is the real management node hostname.
   c. Log in as a root user. The root user has administrative privileges and maps to the operating system root user.
   d. After you log in, the Resource Dashboard is displayed in the Web Portal.
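If you need to repeat these checks later (for example, after a reboot), the service checks in steps 3 and 4 can be scripted. The following is a minimal sketch, not a product tool, using only the service names listed above; adjust the list if your installation differs:
# A minimal verification sketch; run as root on the management node.
for svc in postgresql xcatd phpc; do
    service $svc status || echo "WARNING: service $svc is not running"
done
cat /etc/phpc-release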
Troubleshooting upgrade problems
Troubleshoot problems that occur when upgrading to the new release of IBM Platform HPC.
To help troubleshoot your upgrade process, you can view the upgrade.log file that is found in the /opt/pcm/log directory. This file logs informational messages about the upgrade procedure, and logs any warnings or errors that occur during the upgrade process.
Common upgrade problems include the following issues:
v Cannot log in to the Web Portal after upgrading to Platform HPC Version 4.2. To resolve this issue, try the following resolutions:
  – Restart the Web Portal. In most cases, the services that are required to run the Web Portal start automatically. However, if the Web Portal goes down, you can restart services and daemons manually. From the command line, issue the following command:
    # pcmadmin service restart --service WEBGUI
  – Then run the following command from the management node to resolve this issue:
    /opt/pcm/libexec/pcmmkcert.sh /root/.xcat/keystore_pcm
v After upgrading to Platform HPC Version 4.2, some pages in the Web Portal do not display or display old data. To resolve this issue, clear your web browser cache and log in to the Web Portal again.
v After upgrading to Platform HPC Version 4.2, some pages in the Web Portal do not display. Run the following command from the management node to resolve this issue:
    /opt/pcm/libexec/pcmmkcert.sh /root/.xcat/keystore_pcm
v If any of the following errors are found in the upgrade.log file in the /opt/pcm/log directory, they can be ignored and no further action needs to be taken:
    psql:/external-storage-mnt/db-backup/pmc_group_role.data.sql:25: ERROR: permission denied: "RI_ConstraintTrigger_17314" is a system trigger
    psql:/external-storage-mnt/db-backup/pmc_group_role.data.sql:29: ERROR: permission denied: "RI_ConstraintTrigger_17314" is a system trigger
    psql:/opt/pcm/etc/upgrade/postupdate/4.2/update-pcmgui-records.sql:7: ERROR: duplicate key value violates unique constraint "ci_purge_register_pkey"
    DETAIL: Key (table_name)=(pcm_node_status_history) already exists.
    psql:/opt/pcm/etc/upgrade/postupdate/4.2/update-pcmgui-records.sql:11: ERROR: duplicate key value violates unique constraint "ci_purge_register_pkey"
    DETAIL: Key (table_name)=(lim_host_config_history) already exists.
    psql:/opt/pcm/etc/upgrade/postupdate/4.2/update-pcmgui-records.sql:13: ERROR: duplicate key value violates unique constraint "pmc_role_pkey"
    DETAIL: Key (role_id)=(10005) already exists.
    psql:/opt/pcm/etc/upgrade/postupdate/4.2/update-pcmgui-records.sql:15: ERROR: duplicate key value violates unique constraint "pmc_resource_permission_pkey"
    DETAIL: Key (resperm_id)=(11001-5) already exists.
    psql:/opt/pcm/etc/upgrade/postupdate/4.2/update-pcmgui-records.sql:18: ERROR: duplicate key value violates unique constraint "pmc_role_permission_pkey"
    DETAIL: Key (role_permission_id)=(10009) already exists.
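To check whether upgrade.log contains errors other than the ignorable ones listed above, you can filter the log from the command line. This is a minimal sketch, not a product command; the filter patterns are assumptions based on the messages shown above and may need adjusting:
# grep -i error /opt/pcm/log/upgrade.log | grep -v "RI_ConstraintTrigger" | grep -v "duplicate key value"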
Rollback to Platform HPC 4.1.1.1
Revert to the earlier version of Platform HPC.
Before you begin
Before you roll back to Platform HPC 4.1.1.1, ensure that you have both the Platform HPC 4.1.1.1 ISO and the original operating system ISO.
Procedure
1. Reinstall the management node by completing the following steps:
   a. Record the following management node network settings: hostname, IP address, netmask, and default gateway.
   b. Reinstall the original operating system on the management node. Ensure that you use the same network settings as the old management node, including the hostname, IP address, netmask, and default gateway. Refer to “Installing and configuring the operating system on the management node” on page 13 for more information about installing an operating system.
2. Install Platform HPC 4.1.1.1 by completing the following steps:
   a. Locate the default silent installation template, phpc-autoinstall.conf.example, in the docs directory of the installation ISO:
      mount -o loop phpc-4.1.1.1.x86_64.iso /mnt
      cp /mnt/docs/phpc-autoinstall.conf.example ./phpc-autoinstall.conf
   b. Edit the silent installation template and set the os_kit parameter to the absolute path of the operating system ISO (see the example that follows this procedure):
      vi ./phpc-autoinstall.conf
   c. Start the installation by running the installation program and specifying the silent installation file:
      /mnt/phpc-installer -f ./phpc-autoinstall.conf
3. Restore settings and database data by completing the following steps:
   a. Set up the environment:
      export PATH=${PATH}:/opt/pcm/libexec/
   b. Stop Platform HPC services:
      pcm-upgrade-tool.py services --stop
   c. If you created custom metrics in Platform HPC 4.1.1.1, you can manually re-create them. Refer to the "Defining metrics in Platform HPC" section in the Administering Platform HPC guide for more information.
   d. Restore database data from a previous backup:
      pcm-upgrade-tool.py restore --database -d /external-storage-mnt/db-backup/
      where external-storage-mnt is the backup location on your external storage and db-backup is the location of the database backup.
   e. Restore configuration files from a previous backup:
      pcm-upgrade-tool.py restore --files -f /external-storage-mnt/config-backup/20130708-134535.ta
      where config-backup is the location of the configuration file backup.
4. Restart Platform HPC services:
   pcm-upgrade-tool.py services --reconfig
5. Reinstall the compute nodes, if needed:
   v If the compute nodes have Platform HPC 4.1.1.1 installed, recover the SSH connection for all compute nodes:
     xdsh noderange -K
     where noderange is a comma-separated list of nodes or node groups.
   v If the compute nodes have Platform HPC 4.2 installed, they must be reprovisioned to use Platform HPC 4.1.1.1.
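The following is a minimal sketch of the os_kit edit in step 2b, assuming the operating system ISO is stored in /root and the template uses a key=value syntax for os_kit; check the comments in phpc-autoinstall.conf for the exact syntax used by your version:
# Point os_kit at the operating system ISO, then confirm the change.
sed -i 's|^os_kit=.*|os_kit=/root/rhel-6.4-x86_64.iso|' ./phpc-autoinstall.conf
grep ^os_kit ./phpc-autoinstall.conf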
Upgrading entitlement
In IBM Platform HPC, you can upgrade your LSF or PAC entitlement file from Express to Standard.
Upgrading LSF entitlement
In IBM Platform HPC, you can upgrade your LSF entitlement file from Express to Standard.
Before you begin
To upgrade your product entitlement for LSF, contact IBM client services for more details and to obtain the entitlement file.
About this task
To upgrade your entitlement, as a root user, complete the following steps on the Platform HPC management node:
Procedure
1. Copy the new entitlement file to the unified entitlement path (/opt/pcm/entitlement/phpc.entitlement).
2. Restart LSF:
   lsfrestart
3. Restart the Web Portal:
   pmcadmin stop
   pmcadmin start
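For example, the full sequence might look like the following, assuming the new entitlement file was downloaded to /root as lsf_std_entitlement.dat (the file name is an illustrative assumption):
# cp /root/lsf_std_entitlement.dat /opt/pcm/entitlement/phpc.entitlement
# lsfrestart
# pmcadmin stop
# pmcadmin start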
Results
Your LSF entitlement is upgraded to the standard version.
Upgrading PAC entitlement
In IBM Platform HPC, after upgrading your Platform Application Center (PAC) entitlement file from Express Edition to Standard Edition, ensure that you are able to connect to the remote jobs console.
Before you begin
To upgrade your product entitlement for PAC, contact IBM client services for more details and to obtain the entitlement file.
About this task
After you upgrade to PAC Standard, complete the following steps to connect to the remote jobs console.
Procedure
1. Log in to the Web Portal.
2. From the command line, update the vnc_host_ip.map configuration file in the $GUI_CONFDIR/application/vnc directory. The vnc_host_ip.map file must specify the IP address that is mapped to the host name.
   # cat vnc_host_ip.map
   # This file defines which IP will be use for the host, for example
   #hostname1=192.168.1.2
   system3750=9.111.251.141
3. Kill any VNC server sessions if they exist:
   vncserver -kill :${session_id}
4. Go to the /opt/pcm/web-portal/gui/work/.vnc/${USER}/ directory. If the VNC session files vnc.console and vnc.session exist, delete them.
5. Restart the VNC server:
   # vncserver :1 ; vncserver :2
6. Restart the Web Portal.
7. Stop the iptables service on the management node.
8. Verify that the remote job console is running.
   a. Go to the Jobs tab, and click Remote Job Consoles.
   b. Click Open My Console.
   c. If you get the following error, then you are missing the VncViewer.jar file:
      Cannot find the required VNC jar file: /opt/pcm/web-portal/gui/3.0/tomcat/webapps/platform/pac/vnc/lib/VncViewer.jar. For details about configuring remote consoles, see "Remote Console".
      To resolve this error, copy the VncViewer.jar file to the /opt/pcm/web-portal/gui/3.0/tomcat/webapps/platform/pac/vnc/lib directory. Issue the following command:
      # cp /opt/pcm/web-portal/gui/3.0/tomcat/webapps/platform/viewgui/common/applet/VncViewer.jar /opt/pcm/web-portal/gui/3.0/tomcat/webapps/platform/pac/vnc/lib/VncViewer.jar
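Steps 3 through 7 can be run together from the management node. The following is a minimal sketch, assuming a single stale VNC session with ID 1 and the default directories shown above; the Web Portal restart uses the pcmadmin commands shown earlier in this chapter:
# vncserver -kill :1
# cd /opt/pcm/web-portal/gui/work/.vnc/${USER}/
# rm -f vnc.console vnc.session
# vncserver :1 ; vncserver :2
# pcmadmin service stop --service WEBGUI
# pcmadmin service start --service WEBGUI
# service iptables stop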
Results
Using PAC Standard Edition, you are able to connect to the remote jobs console.
Chapter 11. Applying fixes
Check for any new fixes that can be applied to your Platform HPC installation.
About this task
Fixes are available for download from the IBM Fix Central website.
Note: In a high availability environment, ensure that the same fixes are applied on the primary management node and the failover node.
Procedure
1. Go to IBM Fix Central.
2. Locate the product fixes by selecting the following options:
   a. Select Platform Computing as the product group.
   b. Select Platform HPC as the product name.
   c. Select 4.2 as the installed version.
   d. Select your platform.
3. Download each individual fix.
4. Apply the fixes from the command line.
   a. Extract the fix tar file.
   b. From the directory where the fix files were extracted, run the installation script to install the fix (see the example that follows this procedure).
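For example, applying a downloaded fix might look like the following. The package name and the installation script name are illustrative assumptions; use the actual names and paths documented in the readme that ships with the fix:
# mkdir /tmp/phpc-fix
# tar -xvf phpc-4.2-fix-example.tar -C /tmp/phpc-fix
# cd /tmp/phpc-fix
# ./installfix.sh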
Chapter 12. References

Configuration files
High availability definition file
The high availability definition file specifies values to configure a high availability environment.

virtualmn-name:
nicips.eth0:0=eth0-IP-address
nicips.eth1:0=eth1-IP-address
sharefs_mntp.work=work-directory
sharefs_mntp.home=home-directory

virtualmn-name:
    Specifies the virtual node name of the active management node, where virtualmn-name is the name of the virtual node.
    The virtual node name must be a valid node name. It cannot be a fully qualified domain name; it must be the short name without the domain name. This line must end with a colon (:).
nicips.eth0:0=eth0-IP-address
    Specifies the virtual IP address of a virtual NIC connected to the management node, where eth0-IP-address is an IP address. For example:
    nicips.eth0:0=172.20.7.5
    Note: A virtual NIC does not need to be created and the IP address does not need to be configured. The pcmhatool command automatically creates the needed configurations.
nicips.eth1:0=eth1-IP-address
    Specifies the virtual IP address of a virtual NIC connected to the management node, where eth1-IP-address is an IP address. For example:
    nicips.eth1:0=192.168.1.5
    Note: A virtual NIC does not need to be created and the IP address does not need to be configured. The pcmhatool command automatically creates the needed configurations.
sharefs_mntp.work=work-directory
    Specifies the shared storage location for system work data, where work-directory is the shared storage location. For example: 172.20.7.200:/export/data. If the same shared directory is used for both user home data and system work data, specify this parameter as the single shared directory. Only NFS is supported.
sharefs_mntp.home=home-directory
    Specifies the shared storage location for user home data, where home-directory is the shared storage location. For example: 172.20.7.200:/export/home. If the same shared directory is used for both user home data and system work data, do not specify this parameter. The specified sharefs_mntp.work parameter is used as the location for both user home data and system work data. Only NFS is supported.
Example
The following is an example of a high availability definition file:
# A virtual node name
virtualmn:
# Virtual IP address of a virtual NIC connected to the management node.
nicips.eth0:0=192.168.0.100
nicips.eth1:0=172.20.7.100
# Shared storage for system work data
sharefs_mntp.work=172.20.7.200:/export/data
# Shared storage for user home data
sharefs_mntp.home=172.20.7.200:/export/home
Commands

pcmhatool
An administrative command interface to manage a high availability environment.
Synopsis
pcmhatool [-h | --help] | [-v | --version]
pcmhatool subcommand [options]
Subcommand List
pcmhatool config -i | --import HAINFO_FILENAME -s | --secondary SMN_NAME [-q | --quiet] [-h | --help]
pcmhatool reconfig -s | --standby SMN_NAME [-q | --quiet] [-h | --help]
pcmhatool info [-h | --help]
pcmhatool failto -t | --target SMN_NAME [-q | --quiet] [-h | --help]
pcmhatool failmode -m | --mode FAILOVER_MODE [-h | --help]
pcmhatool status [-h | --help]
pcmhatool check [-h | --help]
Description
The pcmhatool command manages a high availability environment. It is used to enable high availability, display settings, set the failover mode, trigger a failover, and show high availability data and running status.
Options
-h | --help
    Displays the pcmhatool command help information.
-v | --version
    Displays the pcmhatool command version information.

Subcommand Options
config -i HAINFO_FILENAME -s SMN_NAME
    Specifies the high availability settings to be used to enable high availability between the primary management node and the secondary management node, where HAINFO_FILENAME is the high availability definition file and SMN_NAME is the name of the secondary management node.
    -i | --import HAINFO_FILENAME
        Specifies the import file name of the high availability definition file, where HAINFO_FILENAME is the name of the high availability definition file.
    -s | --secondary SMN_NAME
        Specifies the secondary management node name, where SMN_NAME is the name of the secondary management node.
reconfig -s | --standby SMN_NAME
    Enables high availability on the standby management node after the management node is reinstalled, where SMN_NAME is the name of the standby management node.
info
    Displays high availability settings, including the virtual IP address, the management node name, and a list of shared directories.
failto -t | --target SMN_NAME
    Sets the specified standby management node to an active management node, where SMN_NAME is the current standby management node.
failmode -m | --mode FAILOVER_MODE
    Sets the failover mode, where FAILOVER_MODE is set to auto for automatic failover or manual for manual failover. In automatic mode, the standby node takes over the cluster when it detects that the active node has failed. In manual mode, the standby node only takes over the cluster if the pcmhatool failto command is issued.
status
    Displays the current high availability status, including the state of the nodes, the failover mode, and the status of running services. Nodes that are in the unavail state are unavailable, which indicates a node failure or a lost network connection.
check
    Displays diagnostic information related to the high availability environment, including current status data, and failure and correction data.
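For example, a typical sequence for enabling and then monitoring high availability might look like the following, where /root/ha.info is an assumed path to a definition file like the example shown earlier and mn2 is an assumed secondary management node name:
pcmhatool config -i /root/ha.info -s mn2
pcmhatool failmode -m auto
pcmhatool status
pcmhatool check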
Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual Property Department in your country or send inquiries, in writing, to:

Intellectual Property Licensing
Legal and Intellectual Property Law
IBM Japan Ltd.
1623-14, Shimotsuruma, Yamato-shi
Kanagawa 242-8502 Japan

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created programs and other programs (including this one) and (ii) the mutual use of the information which has been exchanged, should contact:

IBM Corporation
Intellectual Property Law
Mail Station P300
2455 South Road, Poughkeepsie, NY 12601-5400
USA

Such information may be available, subject to appropriate terms and conditions, including in some cases, payment of a fee.

The licensed program described in this document and all licensed material available for it are provided by IBM under terms of the IBM Customer Agreement, IBM International Program License Agreement or any equivalent agreement between us.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or
imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.

Each copy or any portion of these sample programs or any derivative work, must include a copyright notice as follows:

© (your company name) (year). Portions of this code are derived from IBM Corp. Sample Programs. © Copyright IBM Corp. _enter the year or years_.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.
Trademarks
IBM, the IBM logo, and ibm.com® are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

LSF, Platform, and Platform Computing are trademarks or registered trademarks of International Business Machines Corp., registered in many jurisdictions worldwide.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.
Privacy policy considerations
IBM Software products, including software as a service solutions, (“Software Offerings”) may use cookies or other technologies to collect product usage information, to help improve the end user experience, to tailor interactions with the end user, or for other purposes. In many cases no personally identifiable information is collected by the Software Offerings. Some of our Software Offerings can help enable you to collect personally identifiable information. If this Software Offering uses cookies to collect personally identifiable information, specific information about this offering’s use of cookies is set forth below.

Depending upon the configurations deployed, this Software Offering may use session and persistent cookies that collect each user’s user name, for purposes of session management. These cookies cannot be disabled.
If the configurations deployed for this Software Offering provide you as customer the ability to collect personally identifiable information from end users via cookies and other technologies, you should seek your own legal advice about any laws applicable to such data collection, including any requirements for notice and consent.

For more information about the use of various technologies, including cookies, for these purposes, see IBM’s Privacy Policy at http://www.ibm.com/privacy and IBM’s Online Privacy Statement at http://www.ibm.com/privacy/details, the section entitled “Cookies, Web Beacons and Other Technologies”, and the “IBM Software Products and Software-as-a-Service Privacy Statement” at http://www.ibm.com/software/info/product-privacy.