SeisSpace System Administration
Configuring SeisSpace

Configuring some General properties

SeisSpace has a master configuration file, similar to the ProMAX config_file, where an administrator can set certain properties for the site installation. The master configuration file for SeisSpace is the PROWESS_HOME/etc/prowess.properties file. The administrator may want to edit this file to set some installation defaults and then make it writable only by that administrative user. It may also be useful to copy the PROWESS_HOME/etc directory to a location outside the install, such as /apps/SSetc, similar to how you would copy the PROMAX_HOME/etc directory out of the install, so that your configuration settings do not get deleted if you were to reinstall the product. You can point to the external etc directory using the environment variable PROWESS_ETC_HOME set in the client startup environment or script. See PROWESS_HOME/etc/SSclient for an example.
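A minimal sketch of that arrangement, assuming sh-style syntax in the SSclient startup script and using /apps/SSetc purely as an example location:

# Copy the shipped etc directory out of the install tree (example path only)
cp -r $PROWESS_HOME/etc /apps/SSetc

# Then, in the SSclient startup script or the client environment:
PROWESS_ETC_HOME=/apps/SSetc
export PROWESS_ETC_HOME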
Product lists

The administrator can set up the list of Products that are available. The list includes ProMAX 2D, ProMAX 3D, ProMAX 4D, ProMAX VSP, ProMAX Field, ProMAX DEV and DepthCharge. There is a stanza in the /apps/SeisSpace/etc/prowess.properties file that you can use to control the list of available products that is presented to the users.

# Define the comma-separated list of products that are available from
# the Navigator. Whether the user is actually able to switch to a product
# depends upon whether a license for it is available. If the product name is
# preceded by the negation symbol (!), then that product will not be
# shown in the Navigator. You may use the Navigator preferences to
# change the displayed products on a per-user basis.
#
# ALL - all product levels
# 2D, 3D, 4D, VSP, FIELD, DEPTHCHARGE, DEV
#
# The default is to show all products, except for ProMAX Dev
# Navigator.availableProductLevels=ALL,!DEV
The default is to show all products except ProMAX Dev, which is not visible in the Product pull-down list in the Navigator.

Saving Settings

There is the concept of "shared" information that is the same for all users and managed by an administrator. There is also the concept of "private" information that is only available to an individual user. "Shared" information is stored in the /apps/logs/netdir.xml file, and "private" information is stored by default in the user's /home/user/SeisSpace/.seisspace file; you can specify a different directory to store the .seisspace file in the SSclient startup script with the PROWESS_PREFS_DIR environment variable.

There is a stanza in the /apps/SeisSpace/etc/prowess.properties file that you can use to control how much "private" information the users can have.

################ ADMINISTRATION ################
# These options allow a system administrator to restrict access
# of non-administrative users to administrative features. When set
# to true, an administrative feature can only be used if a user
# has logged in as admin. Note that these options are here in
# response to a customer request.
onlyAdminCanAddArchiveDataHomes=false
onlyAdminCanAddDataHomes=false
onlyAdminCanEditHostsList=false
onlyAdminCanDefineClusters=false
onlyAdminCanInitializeDatabase=true
If the administrator wants to restrict the users' ability to add their own data homes, hosts, or cluster lists, these options can be set to true. The options will then be grayed out in the users' pull-down menus and rendered inoperative. Note that the administrator will also need to restrict write access to this file. The sitemanager and all clients will need to be restarted for a change to this file to take effect.
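One possible sketch of restricting write access, assuming the file is owned by the administrative account (exact ownership and permission bits are site decisions):

# Writable only by the administrative owner, readable by everyone else
chmod 644 $PROWESS_HOME/etc/prowess.properties
ls -l $PROWESS_HOME/etc/prowess.properties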
Note: The first time you log in as administrator, you may elect to set the administrator password. Only users who know the password can perform administrative functions. The password is stored in the netdir.xml file. If you forget the password and need to reset it, you can stop the sitemanager, edit the netdir.xml file and remove the password line, then start the sitemanager again. The administrator password will be "blank" again and you can reset it.

Job Submission Node/Joblet Specification

The default Queued job submission protocols combine the lisp menu que_res_pbs.menu and some dialogs on the submit GUI for selecting the number of nodes and number of joblets per node to use for the job. In some installations, customers may implement their own queued job submission menus and scripts. For some of these cases the customer site may wish to disable the nodes and joblets-per-node count entries in the SeisSpace Job Submit GUI and use their own menu parameters to build up the qsh files. In this case it is possible to hide the GUI-based entries by editing the $PROWESS_HOME/etc/prowess.properties file and setting the showJobletsSpec property to false in the following stanza:

# Whether or not to show the joblet specification (number of nodes, number of
# execs per node, number of joblets) in the Submit dialog; the default
# is to show it. If you customize your queue menu and submit scripts in such
# a way that you take responsibility for how to obtain the number of joblets,
# you may want to hide these parameters in the Submit dialog.
com.lgc.prodesk.jobviewer.showJobletsSpec=true
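To hide the entries as described, the property in the stanza above would be changed to read:

com.lgc.prodesk.jobviewer.showJobletsSpec=false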
Job Submission - Force use of Queued submit

A property is available in the PROWESS_HOME/etc/prowess.properties file to force all jobs to be submitted using a Queued submit. This property deactivates the direct job submit icon and removes the host and cluster options from the job submit dialog. To implement this behavior, uncomment the line in the prowess.properties file:

# Whether or not to restrict job submission to only queues, i.e. local and cluster direct submit jobs
# are disabled from the UI. Default is false, meaning local, cluster, and queue submits are allowed.
# com.lgc.prodesk.jobviewer.showQueuesOnly=true
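After uncommenting, the active line in prowess.properties reads:

com.lgc.prodesk.jobviewer.showQueuesOnly=true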
Job Submission - RMI port count - for more than 1000 simultaneous jobs

A property is available in the PROWESS_HOME/etc/prowess.properties file to increase the number of RMI ports to look for if you plan to run more than 1000 jobs simultaneously.

# Number of successive ports when creating an RMI registry for communicating
# between the Exec and SuperExec. Default is 1000. The minimum is 1000.
# Increase if you are running more than 1000 jobs simultaneously.
#com.lgc.prowess.exec.NumberRegistryPorts=1000
Changing the default in JavaSeis Data Output to use primary storage

The default in the JavaSeis Data Output menu is to use secondary storage for the trace data. You must use the parameter default flow method to set this default. This method is documented in the Working with the Navigator section of the SeisSpace User Guide.

Changing the default location to store the .user_headers files for user defined header lists and setting options for user header hierarchy

You can set the default location where the .user_headers file is stored to be Data_Home, Area, or Line. The installation default is Data_Home, on the assumption that, in general, as users add headers they will tend to be the same ones used multiple times for all lines. You may choose to set the default storage to be at the Line level instead.

# The default location of the .user_headers file. The .user_headers
# file contains user-defined headers. Set to "DataHome", "Area", or "Line" (case insensitive).
# Default is DataHome.
# TODO this should be set from the user preferences [dialog], and whether it
# can be changed should be controlled by another property: onlyAdminCanChangeDefaultUserHeaderLocation.
com.lgc.prodesk.navigator.defaultUserHeaderLocation=DataHome
You can also set a property to prevent users from having the option to select an alternate location.

# Switch to determine if users can store the user headers at a location
# other than the above default location. The default is to allow.
# If you are logged in as Admin, this will always be true.
com.lgc.prodesk.navigator.canChangeUserHeaderLocation=true
Changing the location of the user parameter default flows

You can set the default location where the user parameter default override flows are stored by setting a property in the prowess.properties file.

# You can define a common location under which to create a hierarchy for
# user parameter defaulting flows. For example, you can set this to
# the value of $PROWESS_HOME/etc or $PROWESS_ETC_HOME if this variable is set.
# The default is $HOME/SeisSpace or $PROWESS_PREFS_DIR if this variable is set.
# The directory structure under this is .../defaultdatahom/defaultarea/defaultline
#com.lgc.prodesk.navigator.userDefaultsDir=$PROWESS_ETC_HOME
Overview

Once you have started the SeisSpace Navigator, you can use the Administrative dialogs from the Edit > Administration pull-down menus to continue setting up SeisSpace.
Logging in as Administrator

1. Select Edit > Administration > Login As Administrator.
2. Click Set Password.
3. Leave the Old Password line blank and enter your new password twice. Then click OK.
All users will now need to use the new password to gain administrative privileges.
Defining Hosts

There are two possible hosts lists: the shared host list set up by the administrator and the personal host list set up by the user. The Hosts list is the list of the machines on your network that can be used to run remote jobs and define clusters for parallel processing. If you define your hosts list while you are logged on as Administrator, you will define hosts for all of the users (a shared hosts list). Otherwise, you will be defining a personal, or "private", list of hosts. "Shared" information is stored in the /apps/logs/netdir.xml file and "private" information is stored in the user's /home/user/SeisSpace/.seisspace file (or PROWESS_PREFS_DIR/.seisspace file).

To begin, select Edit > Administrator > Define Hosts. One of the following dialog boxes will appear depending on whether you are logged in as the administrator or not:

Administrator Shared host list dialog
User personal host list dialog:
Enter the name of the host you'd like to add into the large text box. If you'd like to add a range of hosts that differ by a number prepended or appended to the name (for example: xyz1, xyz2, xyz3, xyz4, xyz5, etc.), enter the starting host name in the Generate hosts from text field (xyz1) and the ending hostname in the Generate hosts to field (xyz5). When you click Add, all the host names within the range will be generated and added. You can also define hosts with a number embedded in the name, for example: x1yz to x5yz. Remove hosts by deleting their names from the editable list. Click Save and Close to update your hosts list or Cancel to exit without saving. The /apps/logs/netdir.xml file will be updated for a shared host list, or the user's homedir/SeisSpace/.seisspace file for a private host list. This is also the list of hosts that will be shown in the job submit user interface.
Defining Clusters

There are two possible cluster lists: the shared cluster list set up by the administrator and the personal cluster list set up by the user.
A cluster is a logical name for a group of hosts to which you can submit distributed jobs. If you define the clusters when you are logged on as Administrator, you will define cluster definitions for all of the users. Otherwise, you will be defining a personal, or "private", cluster definition. "Shared" information is stored in the /apps/logs/netdir.xml file and "private" information is stored in the user's /home/user/SeisSpace/.seisspace file (or PROWESS_PREFS_DIR/.seisspace file). Duplicate names are not managed by SeisSpace. The cluster list to choose from in the job submit user interface is a concatenation of the shared and the personal lists. The shared cluster definitions are indicated as shared by the check box.

To begin, select Edit > Administrator > Define Clusters. One of the following dialog boxes will appear depending on whether you are logged in as the administrator or not:

The general steps for adding a cluster are:
1. Enter the cluster name in the New: text box.
2. Click Add.
3. Enter starting and ending hosts information and click Add to generate the list of hosts.
4. Click Save.

Below is an example after defining clusters for a shared clusters list:
Below is an example after defining clusters for a personal clusters list :
Enter the name of the cluster you'd like to add in the New text field and click Add. It will be added to the pulldown list of clusters. Then create a list of hosts for the cluster by editing directly in the large text box. If you'd like to add a range of hosts that differ by a number prepended or appended to the name (for example: xyz1, xyz2, xyz3, xyz4, xyz5, etc.), enter the starting host name in the Generate hosts from text field (xyz1) and the ending hostname in the Generate hosts to field (xyz5). When you click Add, all the host names within the range will be added. You can also define hosts with a number embedded in the name, for example: x1yz to x5yz. Remove hosts by deleting their names from the editable list.
To edit or remove an existing cluster, select it from the pulldown list of clusters. Click Save and Close to update your cluster list or Cancel to exit without saving.
Adding a Data Home

A Data Home directory is the equivalent of a ProMAX Primary Storage directory. There are two possible Data_Home lists: the shared list set up by the administrator, with details stored in the netdir.xml file, and the personal list set up by the user, with details stored in the user's .seisspace file.

CAUTION: Avoid declaring the same DATA_HOME in both the shared (logs/netdir.xml) file and your personal .seisspace file. A DATA_HOME should only be specified in one location. If you do end up with duplicates, you will be prompted with some options for how to resolve the duplication.
1. Begin by selecting Edit > Administration > Add Data Home. The Add new Data Home dialog box appears.
Enter or select a pathname to the project you are adding. This path must be accessible from all nodes in the cluster by exactly the same pathname. The pathname is equivalent to ProMAX primary storage where the project/subproject hierarchy that is shown is equivalent to the ProMAX Area/Line hierarchy in PROMAX_DATA_HOME.
2. If you wish, enter a name which SeisSpace will use to label the Data Home. The idea here is that you may want to address a project by a logical name instead of by a directory name. Your actual disk directory may have a name similar to /filesystem/disk/primarydir1/secondarydir2/group1/person2/promax_data/marine_data. It may be easier to address this data as simply "marine data" from the navigator. (MB3 > Properties can be used to show the entire path to the aliased name for reference.)

3. If you wish, enter a character string that will be used as an additional directory prefix for JavaSeis datasets in secondary storage. DO NOT use blanks, slashes, or other special characters. If JavaSeis secondary storage is specified as /a/b/c, the datasets for this data_home will use directory /a/b/c/this_prefix/area/line/dataset. If you leave this entry blank, the datasets for this data_home will use directory /a/b/c/area/line/dataset. This feature is designed to prevent potential dataset overwriting in the case where you have the same area/line in multiple data_homes using the same secondary storage directories.

4. ProMAX Environment Variable Editor. At a minimum it is recommended that you specify values for PROMAX_SCRATCH_HOME and PROMAX_ETC_HOME. Select the variable and then click Edit to modify the settings. You may add other variables here; typical entries may be PROMAX_MAP_COMPRESSION or extended scratch partitions. You can consult the ProMAX system administration guide for the list of environment variables.

It is generally recommended to avoid having the same data home specified as both a shared project and a user's personal project. It is possible to do this, but you can get into situations where the projects are not updated concurrently, which leads to confusion. There are also some dialogs in place to help resolve the duplicate entries.
5. Click the checkbox for This data home should be visible to all users if you’d like all users to be able to access this data home. Note: this option is only visible if you are logged in as the administrator. A completed Data Home dialog should look similar to the following example:
JavaSeis Secondary Storage

This option is used to set up a list of file systems to use for JavaSeis dataset secondary storage. If you don't do anything, JavaSeis datasets will use the same file systems as ProMAX datasets for secondary storage, as defined in the etc/config_file. If this is the desired behavior then you do not need to pursue the JavaSeis Secondary Storage configuration. You will need to make sure that you don't have a dir.dat file in any of the standard search paths. A dir.dat file with lines using the #SS#/directory syntax will take precedence over the etc/config_file.

When the JavaSeis Secondary Storage dialog is first started, all of the text boxes may be blank. For the top text box to be populated you must have a dir.dat file, with #SS# lines in it, in one of the following possible locations:

• PROWESS_DDF (direct path to dir.dat file)
• OW_DDF (direct path to dir.dat file)
• OW_PMPATH/dir.dat
• OWHOME/conf/dir.dat
If you use the default for OWHOME, SeisSpace will use $PROMAX_HOME/port/OpenWorks in a standard non-OpenWorks ProMAX/SeisSpace installation. If you want to specify a different location you can set either OW_PMPATH, OW_DDF or PROWESS_DDF in your SSclient startup script. An example of a dir.dat file can be found in your SeisSpace installation $PROWESS_HOME/etc/conf directory. This file is shown below for reference:
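For example, a sketch of pointing at a site-managed dir.dat from the SSclient startup environment (sh-style syntax assumed, and /apps/SSetc is a hypothetical location; set only one of these variables):

# Direct path to a dir.dat file
PROWESS_DDF=/apps/SSetc/dir.dat
export PROWESS_DDF

# ...or point to the directory that contains dir.dat
# OW_PMPATH=/apps/SSetc
# export OW_PMPATH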
# Example of lines in a dir.dat file that SeisSpace understands for specifying optional
# secondary storage for JavaSeis datasets.
#
##########################################################
##########################################################
#SS#/d1/SeisSpace/js_virtual,READ_WRITE
#SS#/d2/SeisSpace/js_virtual,READ_WRITE
#SS#/d3/SeisSpace/js_virtual,READ_WRITE
#SS#/d4/SeisSpace/js_virtual,READ_WRITE
#SS#GlobalMinSpace=209715200
#SS#MinRequiredFreeSpace=209715200
#SS#MaxRequiredFreeSpace=107374182400
#
#
##########################################################
################ Documentation below #####################
##########################################################
#
# The SeisSpace navigator will optionally search for a file in the data_home directory
# defined by the environment variable:
#   JAVASEIS_DOT_SECONDARY_OVERRIDE
# This is a method where an administrator can change the secondary storage
# specification for a DATA_HOME for testing, to do things like test new disk partitions
# before putting them into production without affecting the users.
#
# In production mode, the SeisSpace navigator will first search for a .secondary file
# in the data_home directory:
#
# first:  $DATA_HOME/.secondary (which is managed as part of the data_home properties)
#
# If no .secondary file exists, the next search will be for a dir.dat file using the following hierarchy:
#
# second: $PROWESS_DDF (direct path to dir.dat file)
# third:  $OW_DDF (direct path to dir.dat file)
# fourth: $OW_PMPATH/dir.dat
# fifth:  $OWHOME/conf/dir.dat (Note that OWHOME=PROMAX_HOME/port/OpenWorks in
#         a standard non-OpenWorks ProMAX/SeisSpace installation)
#
# If no dir.dat file is found, JavaSeis secondary storage will use the secondary storage definition
# in the PROMAX_ETC_HOME/config_file:
#
# sixth:  ProMAX secondary storage listed in the config_file for the project
#
# In the first dir.dat file that is found, the file is checked to see if
# SeisSpace secondary storage has been defined. The expected format is:
#
##SS#/d1/SeisSpace/js_virtual,READ_WRITE
##SS#/d2/SeisSpace/js_virtual,READ_WRITE
##SS#/d3/SeisSpace/js_virtual,READ_WRITE
##SS#/d4/SeisSpace/js_virtual,READ_WRITE
#
# GlobalMinSpace --> A global setting used by the Random option; do not use this
#                    folder if there is less than GlobalMinSpace available
#                    (value specified in bytes -- 209715200 bytes = 200Mb)
##SS#GlobalMinSpace=209715200
#
# In this example 4 secondary storage locations are specified, all with RW status,
# with a global minimum disk space requirement of 200 Mb.
#
# Other attributes can be associated with the different directories:
#   READ_WRITE    --> available for reading existing data and writing new data
#   READ_ONLY     --> available for reading existing data
#   OVERFLOW_ONLY --> available as emergency backup disk space that is only used
#                     if all file systems with READ_WRITE status are full
#
# The data_home properties dialog can be used to make a .secondary file
# at the Data_Home level, which will be used first.
#
# There are two different policies that can be used to distribute the data over the file
# systems (folders) specified above: RANDOM and MIN_MAX.
#
#PolicyRandom
# Retrieve the up-to-date list of potential folders for secondary.
#
# From the list of potential folders get those that have the READ_WRITE
# attribute.
#
# If the list contains more than 0, generate a random number from 1 to N
# (where N = the number of folders) and return that folder index to be used.
#
# If the list of READ_WRITE folders is 0 then get the list of "OVERFLOW_ONLY" folders.
# If the list contains more than 0, generate a random selection of the folder index
# and return that folder index to be used.
#
# If there are 0 READ_WRITE folders and 0 OVERFLOW_ONLY folders then the job will fail.
#
#PolicyMinMax
# Uses the following values:
#
# MinRequiredFreeSpace --> Do not use this folder in the MIN_MAX policy if there is
#                          less than MinRequiredFreeSpace available
#                          (value specified in bytes -- 209715200 bytes = 200Mb)
# MaxRequiredFreeSpace --> Use this folder multiple times in the MIN_MAX policy if
#                          there is more than MaxRequiredFreeSpace available
#                          (value specified in bytes -- 107374182400 bytes = 100Gb)
#
##SS#MinRequiredFreeSpace=209715200
##SS#MaxRequiredFreeSpace=107374182400
#
# Get the list of potential folders and compute the free space on each folder.
#
# For each folder in the list that has a READ_WRITE attribute check the free space:
# - If the free space is less than MinRequiredFreeSpace exclude it.
#   (Not enough free space on this disk)
# - If the free space is greater than MinRequiredFreeSpace and less than
#   MaxRequiredFreeSpace, add it to the list of candidates.
# - If the free space is also greater than MaxRequiredFreeSpace, add it as a
#   candidate again. This will weight the allocation to disks with the most
#   free space.
#
# From the list of candidates use the same random number technique as above.
#
# If there are no folders in the list of candidates then check for any possible
# overflow folders. If folders are found use the random number technique to
# return an overflow folder. If we don't have anything in overflow we fail.
If you use the example file located in the $PROMAX_HOME/port/OpenWorks/conf directory, and don’t explicitly set OWHOME in the startup script, this dialog would look as shown below:
The top section shows the directories found in the dir.dat file. Note: the attributes (RO vs. RW) are not editable or shown here.

The second section shows the directory and attribute contents of a .secondary file in this Data_Home directory, if one exists.

The bottom section shows the Min/Max disk space usage policy settings in the .secondary file, if one exists. The defaults are shown if the .secondary file does not exist: 200 Mb for the minimums and 100 Gb for the maximum. (There is more detail on these settings in the example dir.dat file shown above.)

If you do nothing else, JavaSeis datasets will use all of the file systems listed in the dir.dat for secondary storage and distribution of the file extents. You can also select a subset of these file systems on a "per Data Home" basis so that different Data Homes can use different subsets of the file systems listed in the dir.dat file. To do this, click MB1 on the file systems you want to use for this Data Home in the top window and they will show up in the lower window. You can choose to add attributes to the file systems. Multiple directories can be chosen with the standard MB1, CNTRL-MB1 and SHFT-MB1 mouse and keyboard bindings.

Read Write: The default configuration allows for datasets to be both read and written.

Read Only: Do not write any new data to the selected file system(s).

Overflow: Only write data as an emergency backup if all of the other file systems are full. This is designed to be used as an emergency backup so that jobs don't fail when the main secondary disks fill up.

Remove: Remove the selected file system(s) from the list.

You can choose to set the min and max disk usage policy settings in the .secondary file. The policy is chosen in the JavaSeis Data Output menu.

Min/Max Policy - Minimum (Mb): There must be at least this much disk space available before any extents are written to this folder.

Min/Max Policy - Maximum (Mb):
If there is more disk space available than this value, this folder is added to the list of available folders twice.

Min free space required for Random policy (Mb): There must be at least this much disk space available before any extents are written to this folder.

Click on the Update button(s) to set the configuration for this Data Home. This configuration for the Data Home is stored in two files in the Data Home directory: the .properties file stores all the properties from the main properties dialog, and the .secondary file stores the list of file systems and the min/max policies to use for JavaSeis secondary storage. If you delete all of the directories in the lower window and update, the .secondary file will be deleted.
.secondary file OVERRIDE

For testing a new filer or secondary storage disk configuration, an administrator may want to temporarily override the production .secondary file and use a test version. The administrator can do this by making a copy of the .secondary file in the data_home directory and pointing to the temporary copy with an environment variable. In a shell, cd to the data_home directory, copy the .secondary file and manually edit it:

cp .secondary .secondary_test
vi .secondary_test
IF you have set the environment variable JAVASEIS_DOT_SECONDARY_OVERRIDE in the navigator startup script, or in your user environment, then the file it points to MUST EXIST in the data home that you are working in. If it does not, the I/O will fall back to the original .secondary file or the highest-level dir.dat file that it finds.

IF the file named by JAVASEIS_DOT_SECONDARY_OVERRIDE exists, THEN when you open the JavaSeis secondary folder configuration dialog for a data home it will show the contents of that file and allow you to edit it by adding directories from the dir.dat list. You cannot add other lines manually from the GUI.

ELSE IF the file does not exist,
THEN you will be shown a blank area in the .secondary edit part of the GUI, where you can repopulate it from the directories listed in the dir.dat file that have the #SS# prefix. When you update, the file will be created.

IF the variable is NOT set, the system will use the standard .secondary file preferentially. The I/O code is updated so that if the variable is set, it will use that file for the secondary specification. In a data home you may see both a .secondary and a .secondary_test file, for example.

IF the .secondary_test file does not exist, then the I/O will use the standard .secondary file even if the environment variable is set to .secondary_test.

If you want to use the test file, you will need to set JAVASEIS_DOT_SECONDARY_OVERRIDE to .secondary_test.
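A minimal sketch of enabling the override for a test session (sh-style syntax assumed; the value must name a file that actually exists in the data home you are working in):

# Use the test copy of the secondary storage specification
JAVASEIS_DOT_SECONDARY_OVERRIDE=.secondary_test
export JAVASEIS_DOT_SECONDARY_OVERRIDE

# Remove the variable (or the line in the startup script) to return to the
# production .secondary file
# unset JAVASEIS_DOT_SECONDARY_OVERRIDE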
Verifying Projects in the Navigator

In the Navigator, click on a data home folder and then navigate the tree to see projects (AREAS), subprojects (LINES), and Flows, Tables and Datasets.
Removing Data Homes

To remove a data home from your SeisSpace Navigator, first select the Project folder in the tree view and then select Edit > Administration > Remove Data Home from the Navigator pulldown menu. Removing a data home does not delete the data; it only removes it from the list of data homes defined in SeisSpace.
Configuring the User's MPI environment

A .mpd.conf file must exist in each user's home directory. If one does not exist, the software will automatically generate it. If you want to set this up manually you can do the following:

Create a $HOME/.mpd.conf file for each user. This file can be identical for all users, or each user can have their own "secret word". Note: This is "dot" mpd "dot" conf. The requirements for the .mpd.conf file are:

• the file exists in the user's home directory,
• is owned by the user, and
• has permissions of 600 (rw for user only).

The file can be created with the following two commands:

$ echo "MPD_SECRETWORD=xyzzy" > $HOME/.mpd.conf
$ chmod 600 $HOME/.mpd.conf
After the file is created with the line of text and the permissions set, you should see the following in the user's home directory:

[user1@a1 user1]$ ls -al .mpd.conf
-rw-------    1 user1    users    19 Jun 23 13:38 .mpd.conf
[user1@a1 user1]$ cat .mpd.conf
MPD_SECRETWORD=xyzzy
Note: There are some cases where you may have rsh problems related to a mismatch between a kerberos rsh and the normal system rsh. The system searches to see if /usr/kerberos/bin is in your path. If the /etc/profile.d/krb5.csh script does not find it in your path, the script will prepend it to your path. To avoid this, add /usr/kerberos/bin to the end of your path.
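A sketch of that adjustment in a user's shell startup file (sh-style syntax assumed; use the csh equivalent if your users run csh/tcsh):

# Put the kerberos binaries at the END of the search path so the normal
# system rsh is found first
PATH=$PATH:/usr/kerberos/bin
export PATH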
Routing Issues

A special routing problem can occur if a Linux cluster mayor or manager node has two ethernet interfaces: one for an external address and one for an internal cluster IP address. If the mayor's hostname corresponds to the external address, then the machine misidentifies itself to other cluster nodes. Those nodes will try to route through the external interface.
Quick Fix

Add a host route so that the external address of the mayor node is reached through its internal address:

% route add -net 146.27.172.254 netmask 255.255.255.255 gw 172.16.0.1
where 172.16.0.1 is the internal IP address of the mayor node and 146.27.172.254 is the external address of the mayor node.
Better Fix

Set the route on all cluster nodes to use the internal address of the mayor for any unknown external address:

% route add -net 0.0.0.0 netmask 0.0.0.0 gw 172.16.0.1
This fix makes the previous fix unnecessary.
Adding Routes

Outside machines might not have a route to the cluster nodes. To add a route on a PC that needs to reach a cluster node, set the route to use the external address of the mayor node for all cluster node addresses:

% route add 172.16.0.0 mask 255.255.0.0 146.27.172.254
where 172.16.0.0 with a mask of 255.255.0.0 specifies the address range of the cluster nodes and 146.27.172.254 is the external address of the cluster mayor node.
Diagnosing routing problems

To diagnose problems with routing on a cluster, check the following information on the mayor node and on a worker node. You must have direct routes to all other nodes:

% route
% route -n
% netstat -r
% netstat -rn
Make sure your node's IP address is associated with the ethernet interface:

% ifconfig -a
Hardwire the correct association of IP addresses with hostnames. Use the same file for all nodes, including the mayor.

% cat /etc/hosts
See how hostnames are looked up:

% cat /etc/nsswitch.conf
% cat /etc/resolv.conf
Use the lookup order hosts: files nis dns. If you are not using DNS, then /etc/resolv.conf must be empty. If you are using DNS, then the following lines must be present:

nameserver
search domain.com domain2.company.com
Cluster configuration considerations

When you get ready to set up a cluster, you need to consider which application components will be running on which parts of the cluster. For a cluster that is meant primarily to run ProMAX and SeisSpace you can use the following recommendations. For other uses, you will have to adapt these recommendations appropriately. The main consideration is to not overload any particular component of the cluster. For example, it is very easy to overload the Manager node with a
variety of cluster administration daemons as well as a variety of user processes. For a ProMAX and SeisSpace installation you may want to segregate the work as follows:

You may decide to run the following on the Manager:
• the PBS/Torque server and scheduler
• the FlexLM license manager
• the SeisSpace sitemanager

You may decide to use a couple of the nodes as user "login" nodes to run:
• the SeisSpace User Interface / Flow Builders
• Interactive/direct submit ProMAX, Hybrid and SeisSpace jobs

You should only run the following on the "invisible" cluster nodes:
• PBS mom
• ProMAX, Hybrid and SeisSpace jobs released from the queue or jobs directed to run on those nodes.
Additional Considerations

In addition to the above, you will need to ensure that the manager node and the "login" nodes are set up with multiple IP addresses so that they are visible on both networks: the internal cluster network and the external user network.

Running jobs on the manager should generally be avoided so that this node can be available to do the system management work that it is intended to do. You want to avoid having a PBS mom running on the "login" node(s) to prevent jobs from the queue from running on these nodes. The "login" node should be reserved for interactive display jobs and small direct submit test jobs.
Managing Batch Jobs using Queues

Managing batch jobs for seismic data processing via queues provides the following benefits:

• sequential release of serially dependent jobs
• parallel release of groups of independent jobs
• optimized system performance by controlling resource allocation
• centralized management of system workload
Introduction to Batch Job Queues

Seismic data processing using SeisSpace or ProMAX on an individual workstation or a Linux cluster can benefit from using a flexible batch queuing and resource management software package. Batch queueing software generally has three components: a server, a scheduler, and some sort of executor (mom). A generic diagram showing the relationship between the various components of the Torque queuing software is illustrated below.
[Diagram: relationship between qmgr commands, the ProMAX/SeisSpace UI, the Torque Server, the Torque Scheduler, and the Torque Mom]
Generic Queued Job Workflow

1. A job is submitted to the queuing system server via a command like "qsub".
2. The server communicates with the scheduler and requests the number of nodes the job needs.
3. The scheduler gathers current node or workstation resource utilization and reports back to the server which nodes to use.
4. The server communicates with the mom(s) to start the job on the node(s) allocated.

Note that a single Linux workstation has one mom daemon, as the diagram above shows, but the diagram for a Linux cluster can have hundreds to thousands of compute nodes with one mom on each. Torque and SGE (Sun Grid Engine) are typical of the available queuing packages. For this release we tested and documented batch job queuing using Torque. This package can be freely downloaded from http://www.clusterresources.com/downloads/torque.
Torque Installation and Configuration Steps

1. Download and install the Torque source code
2. Set Torque configuration parameters
3. Compile and link the Torque source code
4. Install the Torque executables and libraries
5. Configure the Torque server and mom
6. Test Torque queue submission
7. Start the Torque server, scheduler, and mom at boot
8. Build the Torque packages for use in installing Torque on cluster compute nodes, then install these packages
9. Integrate ProMAX and SeisSpace with Torque
10. Recommendations for Torque queues
Download and Install Torque Source Code

Landmark does not distribute Torque, so you will have to download the latest source tar bundle, which looks similar to torque-xx.yy.zz, from the following URL:

http://www.clusterresources.com/downloads/torque

The latest version of Torque we tested is 2.3.3, on a RedHat 4 Update 5 system. Note: PBS and Torque are used interchangeably throughout this document. As the root user, untar the source code for building the Torque server, scheduler, and mom applications.

> mkdir /apps/torque
> cd /apps/torque
> tar -zxvf /torque-xx.yy.zz.tar.gz
> cd torque-xx.yy.zz
If you decide you want to build the Torque graphical queue monitoring utilities xpbs and xpbsmon (recommended), there are some requirements. Make sure tcl, tclx, tk, and their devel rpm's are installed for the architecture type of your system, such as i386 or x86_64. Since the tcl-
devel-8.*.rpm and tk-devel-8.*.rpm files may not be included with several of the RHEL distributions, you may need to download them. There may be other versions that work as well. Any missing RPM's will need to be installed. Here is an example of required RPM's from a RHEL 4.5 x86_64 installation:

[root@sch1 prouser]# rpm -qa | grep tcl-8
> tcl-8.4.7-2
[root@sch1 prouser]# rpm -qa | grep tcl-devel-8
> tcl-devel-8.4.7-2
[root@sch1 prouser]# rpm -qa | grep tclx-8
> tclx-8.3.5-4
[root@sch1 prouser]# rpm -qa | grep tk-8
> tk-8.4.7-2
[root@sch1 prouser]# rpm -qa | grep tk-devel-8
> tk-devel-8.4.7-2
Here is an example of required RPM's from a RHEL 5.2 x86_64 installation:

> rpm -qa | grep libXau-dev
libXau-devel-1.0.1-3.1
> rpm -qa | grep tcl-devel-8
tcl-devel-8.4.13-3.fc6
> rpm -qa | grep xorg-x11-proto
xorg-x11-proto-devel-7.1-9.fc6
> rpm -qa | grep libX11-devel
libX11-devel-1.0.3-8.el5
> rpm -qa | grep tk-devel
tk-devel-8.4.13-3.fc6
> rpm -qa | grep libXdmcp-devel
libXdmcp-devel-1.0.1-2.1
> rpm -qa | grep mesa-libGL-devel
mesa-libGL-devel-6.5.1-7.2.el5
> rpm -qa | grep tclx-devel
tclx-devel-8.4.0-5.fc6
Set Torque Configuration Parameters

We will now compile and link the server, scheduler, and mom all at the same time, then later generate specific Torque "packages" to install on all compute nodes, which run just the moms. There are many ways to install and configure Torque queues; here we are presenting just one. Torque queue setup for a single workstation is essentially the same as for the master node of a cluster, apart from some differences discussed later. You should be logged into the master node as root if you are installing on a Linux cluster, or logged into your workstation as root.

Here is RHEL 4.5 x86_64:

> ./configure --enable-mom --enable-server --with-scp --with-serverdefault= --enable-gui --enable-docs --with-tclx=/usr/lib64
Here is RHEL 5.2 x86_64:

> ./configure --enable-mom --enable-server --with-scp --with-serverdefault= --enable-gui --enable-docs --with-tcl=/usr/lib64 --without-tclx
Note that we pointed to /usr/lib64 for the 64-bit tclx libraries. This would be /usr/lib on 32-bit systems. With the use of "--with-scp" we are selecting ssh for file transfers between the server and moms. This means that ssh needs to be set up such that no passwords are required in both directions between the server and moms for all users.
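A minimal per-user sketch of one common way to enable passwordless ssh; it assumes RSA keys and NFS-shared home directories (so authorizing the key once covers all nodes), and your site may instead distribute keys centrally or use a different key type. The compute node name in the last line is a placeholder:

# Generate a key with no passphrase so logins are non-interactive
ssh-keygen -t rsa -N "" -f $HOME/.ssh/id_rsa

# Authorize the key for this user
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
chmod 600 $HOME/.ssh/authorized_keys

# Verify that no password prompt appears in either direction
ssh <compute node> hostname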
Compile and Link the Torque Source Code

We will now compile and link the Torque binaries.

> make
Install the Torque Executables and Libraries

We will now install the Torque executables and libraries.

> make install
Configure the Torque Server and Mom

The instructions for installing and configuring Torque in this document treat a single workstation and the master node of a cluster the same, then discuss where the configuration of a cluster is different. Let's go ahead and set up two example queues for our workstation or cluster. The first thing we will do is configure our master node or single workstation for the Torque server and mom daemons.

> cd /var/spool/torque/server_priv
Now let's define which nodes our queues will be communicating with. The first thing to do is to build the /var/spool/torque/server_priv/nodes file. This file states the nodes that are to be monitored and submitted jobs to, the type of node, the number of CPUs the node has, and any special node properties. Here is an example nodes file:

master np=2 ntype=cluster promax
n1 np=2 ntype=cluster promax seisspace
n2 np=2 ntype=cluster promax seisspace
n3 np=2 ntype=cluster seisspace
.
.
nxx np=2 ntype=cluster seisspace
The promax and seisspace entries are called properties. It is possible to assign queue properties so that a queue only submits jobs to nodes with that same property. Instead of the entries n1, n2, etc., you would enter your workstation's hostname or the hostnames of your compute nodes.

Now let's initialize the pbs mom /var/spool/torque/mom_priv/config file. Here is an example of what one would look like:

# Log all but debug events; 127 is good for normal logging.
$logevent 127
# Set log size and deletion parameters so we don't fill /var
$log_file_max_size 1000
$log_file_roll_depth 5
# Make node unschedulable if load >4.0; continue when load drops <3.0
$ideal_load 3.0
$max_load 4.0
# Define server node
$pbsserver
# Use cp rather than scp or rcp for local (nfs) file delivery
$usecp *:/export /export
The $max_load and $ideal_load parameters will have to be tuned for your system over time, and are gauged against the current entry in the /proc/loadavg file. You can also use the "uptime" command to see what the current load average of the system is. How many and what type of processes can the node handle before it is overloaded? For example, if you have a quad-core machine then a $max_load of 4 and an $ideal_load of 3.0 would be just fine. For the $pbsserver entry be sure to put the hostname of your Torque server.

After a job is finished the stdout and stderr files are copied back to the server so they can be viewed. The $usecp entry indicates for which file systems a simple "cp" command can be used rather than "scp" or "rcp". The output of the "df" command shows what should go into the $usecp entry. For example:

df
Filesystem           1K-blocks      Used Available Use% Mounted on
sch1:/data           480721640 327473640 148364136  69% /data

The $usecp entry would be "$usecp *:/data /data".

Now let's start the Torque server so we can load its database with our new queue configuration.

> /usr/local/sbin/pbs_server -t create
Warning: if you have an existing set of Torque queues, the "-t create" option will erase the queues you have configured.
Now we need to add and configure some queues. We have documented a simple script which should help automate this process. You can type these instructions in by hand, or build a script to run. Here is what this script looks like:

#!/bin/ksh
/usr/local/bin/qmgr -e << "EOF"
c q serial queue_type=execution
c q parallel queue_type=execution
s q serial enabled=true, started=true, max_user_run=1
s q parallel enabled=true, started=true
set server scheduling=true
s s scheduler_iteration=30
s s default_queue=interactive
s s managers="@*"
s s node_pack=false
s s query_other_jobs=true
print server
EOF
When creating and configuring queues you typically are doing the following:

• Creating a queue and specifying its type: execution or route.
• Enabling and starting the queue.
• Defining any resource limitation, such as job runtime, or other properties for a queue.
• Defining properties of the server, such as who can manage queues.

To type these in by hand, start the Torque queue manager by typing:

> /usr/local/bin/qmgr
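For reference, here is a sketch of equivalent long-form commands typed at the interactive Qmgr prompt; it covers only part of the script above (in the script, "c q" abbreviates "create queue", "s q" "set queue", and "s s" "set server"):

create queue serial queue_type=execution
create queue parallel queue_type=execution
set queue serial enabled=true, started=true, max_user_run=1
set queue parallel enabled=true, started=true
set server scheduling=true
set server scheduler_iteration=30
print server
quit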
Now let's restart the Torque server and start the Torque scheduler and mom on the master node or single workstation and test our installation.

> /usr/local/bin/qterm -t quick
> /usr/local/sbin/pbs_server
> /usr/local/sbin/pbs_sched
> /usr/local/sbin/pbs_mom
Now let's start the Torque GUIs xpbs and xpbsmon to see the status of our queues and the Torque mom.

> /usr/local/bin/xpbs &
You should see a GUI similar to the following, if you built it.
> /usr/local/bin/xpbsmon &
You should see a GUI similar to the following, if you built it.
Testing Torque Queue Submission

Before integrating ProMAX with Torque it is a good idea to test the Torque setup by submitting a job (script) to Torque from the command line. Here is an example script called pbs_queue_test:

#!/bin/ksh
#PBS -S /bin/ksh
#PBS -N pbs_queue_test
#PBS -j oe
#PBS -r y
#PBS -o /pbs_queue_output
#PBS -l nodes=1
######### End of Job ##########
hostname
echo ""
env
echo ""
cat $PBS_NODEFILE
You will need to modify the #PBS -o line of the script to direct the output to an NFS mounted filesystem which can be seen by the master node or single workstation. Submit the job to Torque as follows using a non-root user:

> /usr/local/bin/qsub -q serial -m n <script path>/pbs_queue_test
If the job ran successfully, there should be a file called /pbs_queue_output containing the results of the script.
Starting the Torque Server, Scheduler, and Mom at boot

To start the Torque daemons when the machines boot up, use the following scripts for the master node and single workstation: pbs_server, pbs_sched, and pbs_mom.

The following /etc/init.d/pbs_server script starts pbs_server for Linux:

#!/bin/sh
#
# pbs_server    This script will start and stop the PBS Server
#
# chkconfig: 345 85 85
# description: PBS is a versatile batch system for SMPs and clusters
#
# Source the library functions
. /etc/rc.d/init.d/functions

BASE_PBS_PREFIX=/usr/local
ARCH=$(uname -m)
AARCH="/$ARCH"
if [ -d "$BASE_PBS_PREFIX$AARCH" ]
then
    PBS_PREFIX=$BASE_PBS_PREFIX$AARCH
else
    PBS_PREFIX=$BASE_PBS_PREFIX
fi
PBS_HOME=/var/spool/torque

# let see how we were called
case "$1" in
  start)
        echo -n "Starting PBS Server: "
        if [ -r $PBS_HOME/server_priv/serverdb ]
        then
            daemon $PBS_PREFIX/sbin/pbs_server
        else
            daemon $PBS_PREFIX/sbin/pbs_server -t create
        fi
        echo
        ;;
  stop)
        echo -n "Shutting down PBS Server: "
        killproc pbs_server
        echo
        ;;
  status)
        status pbs_server
        ;;
  restart)
        $0 stop
        $0 start
        ;;
  *)
        echo "Usage: pbs_server {start|stop|restart|status}"
        exit 1
esac
The following /etc/init.d/pbs_sched script starts pbs_sched for Linux:
#!/bin/sh
#
# pbs_sched    This script will start and stop the PBS Scheduler
#
# chkconfig: 345 85 85
# description: PBS is a versatile batch system for SMPs and clusters
#
# Source the library functions
. /etc/rc.d/init.d/functions
BASE_PBS_PREFIX=/usr/local
ARCH=$(uname -m)
AARCH="/$ARCH"
if [ -d "$BASE_PBS_PREFIX$AARCH" ]
then
    PBS_PREFIX=$BASE_PBS_PREFIX$AARCH
else
    PBS_PREFIX=$BASE_PBS_PREFIX
fi

# let see how we were called
case "$1" in
  start)
        echo -n "Starting PBS Scheduler: "
        daemon $PBS_PREFIX/sbin/pbs_sched
        echo
        ;;
  stop)
        echo -n "Shutting down PBS Scheduler: "
        killproc pbs_sched
        echo
        ;;
  status)
        status pbs_sched
        ;;
  restart)
        $0 stop
        $0 start
        ;;
  *)
        echo "Usage: pbs_sched {start|stop|restart|status}"
        exit 1
esac
The following /etc/init.d/pbs_mom script starts pbs_mom for Linux:

#!/bin/sh
#
# pbs_mom    This script will start and stop the PBS Mom
#
# chkconfig: 345 85 85
# description: PBS is a versatile batch system for SMPs and clusters
#
# Source the library functions
. /etc/rc.d/init.d/functions

BASE_PBS_PREFIX=/usr/local
ARCH=$(uname -m)
AARCH="/$ARCH"
if [ -d "$BASE_PBS_PREFIX$AARCH" ]
then
    PBS_PREFIX=$BASE_PBS_PREFIX$AARCH
else
    PBS_PREFIX=$BASE_PBS_PREFIX
fi

# let see how we were called
case "$1" in
  start)
        if [ -r /etc/security/access.conf.BOOT ]
        then
            cp -f /etc/security/access.conf.BOOT /etc/security/access.conf
        fi
        echo -n "Starting PBS Mom: "
        daemon $PBS_PREFIX/sbin/pbs_mom -r
        echo
        ;;
  stop)
        echo -n "Shutting down PBS Mom: "
        killproc pbs_mom
        echo
        ;;
  status)
        status pbs_mom
        ;;
  restart)
        $0 stop
        $0 start
        ;;
  *)
        echo "Usage: pbs_mom {start|stop|restart|status}"
        exit 1
esac
The following commands set up the scripts so the O/S will start them at boot:

> /sbin/chkconfig pbs_server on
> /sbin/chkconfig pbs_sched on
> /sbin/chkconfig pbs_mom on
Installing Torque On The Compute Nodes

Now that Torque seems to be working, let's install it on the compute nodes. To perform this we need to generate some Torque self-extracting scripts called "packages". In these packages we need to also include the Torque mom system startup (init.d) scripts, as well as the mom configuration information. Note that this step is not necessary for a single workstation.

> cd /apps/torque-xx.yy.zz
> mkdir pkgoverride;cd pkgoverride
> mkdir mom;cd mom
> tar -cvpf - /var/spool/torque/mom_priv/config | tar -xvpf -
> tar -cvpf - /etc/rc.d/init.d/pbs_mom | tar -xvpf -
> cd /apps/torque-xx.yy.zz;make packages
Now that all the packages are generated, we only need to install some of them on the compute nodes. Here is a list of the packages:

• torque-package-clients-linux-x86_64.sh
• torque-package-devel-linux-x86_64.sh
• torque-package-mom-linux-x86_64.sh

To install these packages you need to copy them to an NFS mounted filesystem if the directory where they are stored is not visible to all compute nodes. For example:

> cp *.sh
Note that if you are using cluster management software such as XCAT, Warewulf, or RocksClusters, you are better off integrating the Torque mom files and configuration into the compute node imaging scheme. Install the packages by hand on each node, or if you have some type of cluster management software such as XCAT, use that to install onto each node.

> psh compute /torque-package-clients-linux-x86_64.sh --install
> psh compute /torque-package-devel-linux-x86_64.sh --install
> psh compute /torque-package-mom-linux-x86_64.sh --install
> psh compute /sbin/chkconfig pbs_mom on
> psh compute /sbin/service pbs_mom start
The xpbsmon application should refresh shortly showing the status of the compute nodes, which should be "green" if the nodes are ready to accept scheduled jobs.
Connecting ProMAX and Torque

ProMAX by default is set to use Torque (PBS) queues. The $PROMAX_HOME/etc/qconfig_pbs file defines which Torque queues are available for use, the name associations, the function to be called in building a job execution script, and any variables which get passed to the function script. You should modify this file to conform with the Torque queues that you have created.

#
# PBS batch queues
#
name = serial
type = batch
description = "Serial Execution Batch Jobs"
function = pbs_submit
menu = que_res_pbs.menu
properties = local
machine =
#
name = parallel
type = batch
description = "Parallel Execution Batch Jobs"
function = pbs_submit
properties = local
menu = que_res_pbs.menu
machine =
The following is what the SeisSpace job submit window might resemble with the configuration above:
If you have configured your queues for a cluster, and have confirmed that they are working properly, you need to do a couple of things to prevent the master node from being used as a compute node.

1. Turn off the pbs_mom.

> /sbin/service pbs_mom stop
2. Disable the pbs_mom from starting at boot.

> /sbin/chkconfig pbs_mom off
3. Remove the master node from the /var/spool/torque/server_priv/nodes file.
Recommendations for Torque queues

Based on our batch job queue testing efforts, we offer the following guidelines for configuring your Torque batch job queues.
• It is important that the queue does not release too many jobs at the same time. You specify the number of available nodes and CPUs per node in the /var/spool/torque/server_priv/nodes file. Each job is submitted to the queue with a request for a number of CPU units. The default for ProMAX jobs is 1 node and 1 CPU, or 1 CPU unit. That is, to release a job, there must be at least one node that has 1 CPU unallocated.

• There can be instances when jobs do not quickly release from the queue although resources are available. It can take a few minutes for the jobs to release. You can change the scheduler_iteration setting with the Torque qmgr command (see the qmgr example at the end of this section). The default is 600 seconds (or 10 minutes). We suggest a value of 30 seconds. Even with this setting, dead times of up to 2 minutes have been observed. It can take some time before the loadavg begins to fall after the machine has been loaded.

• By default, Torque installs itself into the /var/spool/torque, /usr/local/bin and /usr/local/sbin directories. Always address the qmgr by its full name of /usr/local/bin/qmgr. The directory path /usr/local/bin is added to the PATH statement inside the queue management scripts by setting the PBS_BIN environment variable. If you are going to alter the PBS makefiles and have PBS installed in a location other than /usr/local, make sure you change the PBS_BIN environment setting in the ProMAX sys/exe/pbs/* files, and in the SeisSpace etc/SSclient script example.

• Run the xpbs and xpbsmon programs, located generally in the /usr/local/bin directory, to monitor how jobs are being released and how the CPUs are monitored for availability. Black boxes in the xpbsmon user interface indicate that the node CPU load is greater than what has been configured, and no jobs can be spawned there until the load average drops. It is normal for nodes to show as different colored boxes in the xpbsmon display; this means that the nodes are busy and not accepting any work. You can also modify the automatic update time in the xpbsmon display. However, testing has shown that the automatic updating of the xpbs display may not be functioning.

• Landmark suggests that you read the documentation for Torque. These documents include more information about the system and ways to customize the configuration, and can be found on the Torque website.

• Torque requires that you have the hostnames and IP addresses in the hosts files of all the nodes. Note: hostname is the name of your machine; hostname.domainname can be found in /etc/hosts, and commonly ends with .com:

ip address hostname.domain.com hostname
For DHCP users, ensure that all of the processing and manager nodes always get the same IP address.

We present one method of installing and configuring Torque job queues. There are many alternative methods that will be successful so long as the following conditions exist:

• Install Torque for all nodes of the cluster. The installation can be done on each machine independently, or you can use a common NFS mounted file system, or your cluster management software may contain a preconfigured image.

• Install all components, including the server and scheduler, on one node. This is known as the server node and serves the other main processing nodes. Normally this will be the cluster manager node. On a single workstation the server, scheduler, and mom daemons are all installed.

• The following files must be the same on all installations on all machines:
  /var/spool/torque/server_name
  /var/spool/torque/mom_priv/config
  This file is only used by the server and scheduler on the manager machine:
  /var/spool/torque/server_priv/nodes

• The UID and GID for users must be consistent across the master and compute nodes.

• All application, data, and home directories must be mounted the same on the master and compute nodes.
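As referenced in the recommendations above, a minimal sketch of changing the scheduler iteration from the command line with qmgr (the 30-second value follows the suggestion above):

/usr/local/bin/qmgr -c "set server scheduler_iteration=30"

# Verify the change
/usr/local/bin/qmgr -c "print server"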
Flat File-based Flow Replication
This section discusses how flow replication is implemented in SeisSpace. It also discusses where and when the flat files are created and how they are stored and managed. For more information about using the Flow Replication tools, please refer to the chapter titled Replicating Flows in the Using SeisSpace guide.
In Flat File-based Flow Replication, all flow replication data are stored in flat files in the $PROMAX_DATA_HOME/AREA/LINE and $PROMAX_DATA_HOME/AREA/LINE/FLOW directories.

LINE/

replicaParms.txt   ---  a tab-delimited file with the replica parameters, editable
                        in Excel; can be a symbolic link
replicaParms.txt~  ---  a backup that is generated whenever a replicaParms.txt file
                        is successfully loaded
replicaPrefs.xml   ---  some constants stored about the replica table, such as
                        column width and display order
Starting with the 5000.0.1.0 release in early 2009, a file locking mechanism was added to manage the replicaParms.txt file. When a user opens the replica table and adds columns or changes values, that user writes a lock file. Other users will not be able to edit the replica table until the user who owns the lock file saves his/her work and releases the lock.

LINE/FLOW (template flow)

exec.#.pwflow     ---
exec.#.log         |   These are the files associated with the template flow.
exec.#.qsh         |   There may be multiple versions depending on how many
exec.#.qerr        |   times the template was run to test it.
packet.#.job      ---

jobs.stat         ---  This file is not used at this time but contains the
                       status of the main flow.

exec.r#.#.pwflow  ---  These are the files associated with each replica flow.
exec.r#.#.log      |   The first # after the r is the sequence number.
exec.r#.#.qsh     ---  The second # is the replica instance number
                       (more detail on replica instances later).

replicas.#.stat   ---  a binary file that contains all of the job status
                       information for replicas of a particular version

replicasInfo.xml  ---  a simple xml file that indicates that the flow is a template
The general methodology follows the idea that some of the replicas may need to be rerun, or rebuilt and then rerun, for a variety of reasons. After you rerun or rebuild/rerun some of the replicas, you will see multiple versions of the flow, printout, and qsh files for each instance of the replica, and multiple replicas.#.stat files. In the following example, replicas 1 and 2 have been built and run 4 times, replicas 3 and 4 have been run 3 times, and so on.

$ ls exec.r1.*
exec.r1.2.log  exec.r1.2.pwflow  exec.r1.2.qsh  exec.r1.3.log  exec.r1.3.pwflow  exec.r1.3.qsh
$ ls exec.r2.*
exec.r2.2.log  exec.r2.2.pwflow  exec.r2.2.qsh  exec.r2.3.log  exec.r2.3.pwflow  exec.r2.3.qsh
$ ls exec.r3.*
exec.r3.1.log  exec.r3.1.pwflow  exec.r3.1.qsh  exec.r3.2.log  exec.r3.2.pwflow  exec.r3.2.qsh
$ ls exec.r4.*
exec.r4.1.log  exec.r4.1.pwflow  exec.r4.1.qsh  exec.r4.2.log  exec.r4.2.pwflow  exec.r4.2.qsh
$ ls exec.r5.*
exec.r5.0.log  exec.r5.0.pwflow  exec.r5.0.qsh  exec.r5.1.log  exec.r5.1.pwflow  exec.r5.1.qsh
$ ls exec.r6.*
exec.r6.0.log  exec.r6.0.pwflow  exec.r6.0.qsh  exec.r6.1.log  exec.r6.1.pwflow  exec.r6.1.qsh
$ ls exec.r7.*
exec.r7.0.log  exec.r7.0.pwflow  exec.r7.0.qsh
$ ls exec.r8.*
exec.r8.0.log  exec.r8.0.pwflow  exec.r8.0.qsh
$ ls exec.r9.*
exec.r9.0.log  exec.r9.0.pwflow  exec.r9.0.qsh
$ ls exec.r10.*
exec.r10.0.log  exec.r10.0.pwflow  exec.r10.0.qsh
$ ls -al *stat*
jobs.stat  replicas.0.stat  replicas.1.stat  replicas.2.stat  replicas.3.stat
Notice that the earlier numbered replicas, such as 1 and 2, have instance numbers 2 and 3, whereas replicas 3 and 4 have instance numbers 1 and 2. There is a preference setting that can be used to limit the number of versions of replicas to keep. In this case the preference was set to automatically purge down to 2 versions of the replica flows; the two most recent are retained.
The job status information for all of these versions is stored in the different replicas.#.stat files. The status shown in the Replica Job Table (RJT) is the status of the flow in the matching numbered stat file. The replicas.3.stat file will only have information for those flows that had a 3rd instance. The stat files contain the job status, such as Complete, Failed, or User Terminated. The "Built" and "Unknown" status values are not stored. A flow is marked as Built if there is no known status for it in the matching stat file and the flow files exist on disk. If multiple versions of replicated flows exist, the status shown is the status in the stat file of the highest numbered replica instance.

sequence number
                  1 1 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2 3 4
---------------------------
. . . . x . . . x . . . . .   2.stat
. . . x x . . . x x x . . .   1.stat
x x x x x x x x x x x x x x   0.stat
If you delete a replica using the delete function in the RJT, all existing instances of the replicated flows will be deleted and the job status will be removed from all of the stat files. For example, if the replica flows for sequence 5 are deleted, the status will be removed from all existing stat files. The status of the flow will be set to "Unknown" until the replica is rebuilt.

sequence number
                  1 1 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2 3 4
---------------------------
. . . . . . . . x . . . . .   2.stat
. . . x . . . . x x x . . .   1.stat
x x x x . x x x x x x x x x   0.stat
If you delete all of the replicas, all of the replica flow folders will be deleted, but the replicas.#.stat files will not be deleted. All of the status values in all of the stat files will be deleted.

sequence number
                  1 1 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2 3 4
---------------------------
. . . . . . . . . . . . . .   2.stat
. . . . . . . . . . . . . .   1.stat
. . . . . . . . . . . . . .   0.stat
If you make a new set of replicas, the instance numbering will start at 0 again. If there are no replica flows left in the template flow, you can safely delete all of the stat files.
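As a minimal sketch (the AREA/LINE/FLOW path is an example; run it only when no replica flows remain in the template flow), the cleanup could be done from the flow directory:

    $ cd $PROMAX_DATA_HOME/AREA/LINE/FLOW
    $ rm replicas.*.stat

The jobs.stat file belongs to the template flow itself and is left in place.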
Adding User Defined Table Types
Some of the larger external development facilities have added table types for use with their ProMAX tools. In ProMAX you would add these table types to the parm_headers file. To add these user defined tables to the table list in the SeisSpace user interface, these table type definitions must be added to the $PROWESS_HOME/etc/flowbuilder/TableTypes.txt file. An example is delivered with the installation.

; User-defined table types. Each line is of the form:
;
; Unique_3_Letter_Extension Description Primary Secondary Z1 ... Zn
;
; Use NULL to specify a primary or secondary key that must be queried
; Use semicolon or # to start a comment line; blank lines are ok too
; eg:
; FKP "FK Filter Polygon" NULL PLYINDEX F K
; gat "Miscellaneous Time Gate" NULL NULL START END
;FKP "FK Filter Polygon" NULL PLYINDEX F K
;eig "WAAVO Eigenvector Constraints 12 columns" CDP TIME R01 RSH1 RP1 R02 RSH2 RP2 R03 RSH3 RP3 EV1 EV2 EV3
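For illustration only, a hypothetical user-defined table type could be appended as one more line in the same format (the "usr" extension, its description, and its keys are made-up values, not delivered definitions):

    usr "User Defined Time Gate" CDP NULL START END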
You must restart the Navigator after you edit this file to see the changes.
Process Promotion and Demotion Dev/Alpha/Beta/Prod
SeisSpace supports the capability of having several versions of the same module or process in different stages of development. In this case, you may want to switch between the versions in a flow without having to build and parameterize a new menu. This capability only extends to SeisSpace module development; it does not apply to ProMAX tools.
An example scenario may be that you have a program that you are working on in your development environment, and periodically you want to release a version to production, but you want the development version to be available as well so that a tester can easily test a new development version against the current production version. In this case a user can insert a process into a flow by choosing the process from either the production processes list or the development processes list from the developer's tree. The user can then switch back and forth, have the menus update with like parameters, and execute the different versions of the program.
The examples below use several typical scenarios to illustrate the process promotion/demotion capability.
Note: These examples assume that your SeisSpace development environment has been configured using the PROWESS_HOME/port/bin/Makeseisspace script.
First example - Simple single developer environment with two versions of a module A simple example for an external development site might look something like this: There are two "systems" that the users need access to simultaneously. • The customer’s standard Landmark-provided installation in a common shared directory. • The customer’s developer’s development system in the developer’s home directory. The standard Landmark system has no knowledge of the customer’s tool. The developer has two versions, a "production" version and a "dev" version.
The user will need to add a MY_PROWESS_HOME environment variable to the set of variables in the SSclient startup script, where MY_PROWESS_HOME is set to the developer's home prowess development directory. In this case the development user is user1, and MY_PROWESS_HOME would be set to /home/user1/prowess.
You can look at the Example0AddAmplitude example program to see how you might want to set this up. This will be the first example. The developer would have the following directory structure in his/her home directory:

[ssuser@nuthatch example0addamplitude]$ pwd
/home/ssuser/prowess/port/com/djg/prowess/tool/example0addamplitude
[ssuser@nuthatch example0addamplitude]$ ls -alR *
Example0AddAmplitudeProc.java  Example0AddAmplitudeTool.java  Makefile

dev:
Example0AddAmplitudeProc.java  Example0AddAmplitudeTool.java  Makefile
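A minimal sketch of that addition to the SSclient startup script (the path is the example developer's directory; adjust for your site):

    # Point the SeisSpace client at the developer's prowess tree (sketch)
    MY_PROWESS_HOME=/home/user1/prowess
    export MY_PROWESS_HOME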
As the listing above shows, there are two different versions of the menu and tool code: one version in the main directory and another version in the "dev" subdirectory. There are also two different Makefiles. The one major change in the menu between the two versions is the package name. The production version of the *Proc.java file would have the first line:

package com.djg.prowess.tool.example0addamplitude;
The dev version of the *Proc.java file would have the first line:

package com.djg.prowess.tool.example0addamplitude.dev;
Note: You want to make sure that getToolName is commented out.

// public String getToolName() {
//     return "com.djg.prowess.tool.example0addamplitude.Example0AddAmplitudeTool";
// }
In the production version of the *Tool.java file you would have the first lines:

// Each tool is in a java package with its proc (menu) file
package com.djg.prowess.tool.example0addamplitude;
and in the dev version of the *Tool.java file you would have the first lines:

// Each tool is in a java package with its proc (menu) file
package com.djg.prowess.tool.example0addamplitude.dev;
The production version of the Makefile would have the PACKAGE line:

PACKAGE := com/djg/prowess/tool/example0addamplitude
The dev version of the Makefile would have the PACKAGE line:

PACKAGE := com/djg/prowess/tool/example0addamplitude/dev
OPTION 1 - Make the two versions of the module available in the developer's Processes List:
Edit the PROWESS.xml file in the developer's home prowess etc/flowbuilder directory:

[ssuser@nuthatch flowbuilder]$ pwd
/home/ssuser/prowess/etc/flowbuilder
[ssuser@nuthatch flowbuilder]$ more PROWESS.xml
com.djg.prowess.tool.example0addamplitude.Example0AddAmplitudeProc
com.djg.prowess.tool.example0addamplitude.dev.Example0AddAmplitudeProc
Note: the "|dev" designation and the addition of "dev" to the Proc file path name identify the development version of the tool.
In this case there are two versions of the process in the processes list. The user would be required to specifically choose both of the processes, have both menus in the flow, and swap back and forth between them.
OPTION 2 - For the case where you want to swap between the menus but only have one occurrence of the process in the flow, the developer would add a *Proc.xml file to his/her prowess/etc/flowbuilder directory (the same directory where the developer's Processes list is). This file contains a copy of the individual tool stanza from the PROWESS.xml processes list:

[ssuser@nuthatch flowbuilder]$ pwd
/home/ssuser/prowess/etc/flowbuilder
[ssuser@nuthatch flowbuilder]$ ls -al
Example0AddAmplitudeProc.xml  PROWESS.xml
[ssuser@nuthatch flowbuilder]$ more Example0AddAmplitudeProc.xml
com.djg.prowess.tool.example0addamplitude.Example0AddAmplitudeProc
com.ano.prowess.tool.example0addamplitude.dev.Example0AddAmplitudeProc
Now, with one process in the menu, the user can swap between the production and development versions using the MB3>Versions menu on that process. Note: This may be Ctrl-MB3>Versions if using the default mouse bindings, where MB3 toggles processes between active and inactive. The color code of the process will change, and the icon designation will also change to show the version of the tool that was selected. Four versions of each module are supported: dev, alpha, beta, and the default un-designated production version. A fifth "obsolete" designation can be used to flag a process or a version as obsolete.
Second example - Multiple developer environment with two versions of a module
This is an extension of the first example where there is more than one developer, but each developer works in the same mode of having a couple of different versions of a module available. There are three "systems" that the users need access to simultaneously.
• The standard LGC provided installation in a common shared directory.
• The first customer developer's development system in the first developer's home directory.
• The second customer developer's development system in the second developer's home directory.
The standard Landmark system has no knowledge of the customer's tools. The developers' tools have two versions, a "production" version and a "dev" version.
A user within the customer's system will need to add a MY_PROWESS_HOME environment variable to the set of variables in the SSclient startup script, where MY_PROWESS_HOME is set to both developers' home prowess development directories in a colon-separated list. In this case the development users are user1 and user2, and MY_PROWESS_HOME would be set to /home/user1/prowess:/home/user2/prowess (a sketch follows at the end of this example).
In this model, if the two developers have the same module name and each has a *Proc.xml file in their development trees, these files will be concatenated and all versions in both files will be shown in the MB3>Versions menu.
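A minimal sketch of the corresponding SSclient setting (the user names and paths are the example values above):

    # Colon-separated list of developer prowess trees (sketch)
    MY_PROWESS_HOME=/home/user1/prowess:/home/user2/prowess
    export MY_PROWESS_HOME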
Third example - Multiple developer environment with two versions of a module and a customer "production environment"
There are at least three "systems" that the users need access to simultaneously.
• The standard LGC provided installation in a common shared directory
• The customer's primary production system in a common shared directory
• The customer's developer's development system in the developer's home directory (possibly more than one).
The standard LGC system has no knowledge of the customer's tools. The customer's tool has two versions, a "production" version and a developer's "dev" version.
There is very little difference between this scenario and the previous one. The only difference is that the production versions are in a "public" place and will probably have a package definition related to the company name. The user just needs to add the appropriate paths to the MY_PROWESS_HOME variable to see the public locations.
In this model, if the two developers have the same module name, each has a *Proc.xml file in their development trees, and there is a third file of the same name in the customer "production" tree, these files will be concatenated and all versions in all of the files will be shown in the MB3>Versions menu.
Summary
Each developer can support up to 4 versions of a module: dev, alpha, beta, and a production version (the production version has no special designation). Processes in the Processes list can be listed in an obsolete status by using the "obsolete" designation. The example that is copied in by the Makeseisspace script can be used as a guide for setting the package names and the Makefile paths for each version.
If multiple versions are put into the PROWESS.xml file, which stores the processes list in the ../etc/flowbuilder directory, then the user can select the different versions of the module directly from the processes list.
If multiple versions are put into the toolProc.xml file in the ../etc/flowbuilder directory, then the user can swap between versions using the MB3>Versions menu. Note: This may be Ctrl-MB3>Versions if using the default mouse bindings, where MB3 toggles processes between active and inactive.
The users need to add the MY_PROWESS_HOME environment variable to the SSclient script; it can be a single path or a colon (:) separated list of paths. Each directory in the path will show up as a separate list of processes in the processes list panel of the Flow Builder.
The user can choose the process from any of the processes lists, and if there is a toolProc.xml file in one or all of the etc/flowbuilder directories, then the user can choose the version to use from the Ctrl-MB3>Versions options. If there are toolProc.xml files in multiple directories, these will be concatenated (on a per tool basis) and all options in all the files will be presented in the version options.
It is important to keep the hierarchy of the directories in mind when working with multiple versions of processes. If there are multiple directories listed in MY_PROWESS_HOME, the first instance wins. Therefore, it is desirable that multiple developers not use the same tool names. In addition, all toolProc.xml files in the etc/flowbuilder directories of all directories in the multi-path MY_PROWESS_HOME will be concatenated, as in the layout sketched below.
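For illustration (the paths are the example values used earlier, not required locations), a two-developer layout whose flowbuilder files would be concatenated looks like:

    /home/user1/prowess/etc/flowbuilder/PROWESS.xml
    /home/user1/prowess/etc/flowbuilder/Example0AddAmplitudeProc.xml
    /home/user2/prowess/etc/flowbuilder/PROWESS.xml
    /home/user2/prowess/etc/flowbuilder/Example0AddAmplitudeProc.xml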
Managing multiple help files for the different versions
This is a copy of the self-documented toolProc.xml file PROWESS_HOME/etc/flowbuilder/Example0AddAmplitudeProc.xml:

examples.example0addamplitude.dev.Example0AddAmplitudeProc
    /lair/gwong/add_amplitude_dev.pdf
examples.example0addamplitude.alpha.Example0AddAmplitudeProc
    add_amplitude_alpha
examples.example0addamplitude.beta.Example0AddAmplitudeProc
    add_amplitude_beta
examples.example0addamplitude.Example0AddAmplitudeProc