Purpose

The purpose of this document is to provide guidance on troubleshooting ACFS/ADVM related issues and on what to collect when troubleshooting.

Troubleshooting Steps
1. What's New in 11.2.0.3

a) Starting with the 11.2.0.3.4 release, the diagcollection.pl script automatically collects most ACFS information (please check Appendix B below for additional information).

b) Starting with the 11.2.0.3.0 release, OKS logs are automatically written to disk. This functionality is in addition to the existing in-memory OKS log. The in-memory OKS log features and functionality have not been altered in any way (please check Appendix A below for additional information).

2. Overview

The ACFS/ADVM product consists of several processes in the ASM instance, three separate kernel device drivers, and several command and utility programs. The three kernel drivers that make up the ACFS product are:

- Oracle ACFS - ASM Cluster File System
- Oracle ADVM - ASM Dynamic Volume Manager
- Oracle OKS  - Oracle Kernel Services

When a support issue arises, it is important to collect all of the pertinent data related to ACFS/ADVM. Diagnostic information related to ACFS/ADVM is located in several areas, including:

- System configuration (e.g. OS version, platform type, etc.)
- ASM alert log and process trace files
- CSS trace files
- Operating system log/event files
- CRS resource status
- ACFS/ADVM specific log files
- Operating system crash dumps
- ACFS Replication trace and log files

3. ASM Instance Alert Log and Process Trace Files

Diagnostic information related to the ASM instance consists of the alert log along with the process trace files for the ADVM processes. ASM process trace files are typically named as
- Solaris /var/adm/messages (should also include the /var/adm/messages.
On Solaris, the memory log can be dumped to a file via:

    acfsutil log [-f filename]

The acfsutil log command creates a file in the current directory. The default file name is oks.log, and can be overridden via the -f option. Files written by the kernel drivers will be created in the /var/log directory and a message will be written to the /var/adm/messages file.

If acfsutil log is unresponsive, then the memory log can be dumped to a file (as root) via:

    echo "ks_log_buf/K | ::print KS_CHUNK_RING_BUFS ks_bufs | /K | /s" | sudo mdb -k > oks.log

AIX

On AIX, the memory log can be dumped to a file via:

    acfsutil log [-f filename]

The acfsutil log command creates a file in the current directory. The default file name is oks.log, and can be overridden via the -f option. Files written by the kernel drivers will be created in the /var/log directory and a message will be written to the /var/adm/messages file.

6. Panics / Crash Dumps / Hangs

Some issues relating to ACFS/ADVM may result in system panics. In order to diagnose these rare occurrences, the system must be configured to save crash dumps. The following steps are needed to enable crash dumps for a system:

Linux (OEL)

- Install the debuginfo rpms for your kernel (optional)
- Add the following lines to /etc/kdump.conf:

      ext3 LABEL=/                           (assumes / is the root disk)
      core_collector makedumpfile -c -d 31   (only if using debuginfo kernels)
      path /var/crash

- Append crashkernel=128M@16M to the end of the kernel lines in /boot/grub.conf (the node needs to be rebooted after this change)
- Enable and start the kdump service:

      sudo /sbin/chkconfig kdump on
      sudo /sbin/service kdump restart

Windows

Windows systems should be configured for kernel memory dumps. From Control Panel, run the System utility to bring up the System Properties window. Select the Advanced tab and then the Settings button. In the Write debugging information area, select Kernel Memory Dump and check the Overwrite any existing file check box.
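When several of these artifacts need to be gathered at once, a small wrapper script can reduce missed files. Below is a minimal sketch, not a supported tool: the script name, output layout, and file list are assumptions; only acfsutil log itself is the documented command, and the script skips it if the utility is not installed.

```shell
# Hypothetical collection sketch: gather OKS and system logs into a
# scratch directory. Missing files and missing tools are skipped silently.
outdir=$(mktemp -d /tmp/oks_diag.XXXXXX)

# System message logs (typical Linux and Solaris/AIX locations).
for f in /var/log/messages /var/adm/messages; do
    [ -f "$f" ] && cp "$f" "$outdir/" 2>/dev/null || true
done

# Dump the in-memory OKS log only if acfsutil is available on this node.
if command -v acfsutil >/dev/null 2>&1; then
    ( cd "$outdir" && acfsutil log -f oks.log )
fi

echo "Diagnostics collected in $outdir"
```

Run as root so the message logs are readable; the resulting directory can then be attached to the support request.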
Solaris

Solaris is normally configured to capture crash dumps when starting up after a system crash. They normally appear in /var/crash/
AIX

The current dump configuration can be displayed with sysdumpdev:

    # sysdumpdev
    primary              /dev/lg_dumplv
    secondary            /dev/sysdumpnull
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    FALSE
    dump compression     ON
    type of dump         traditional

    # lsvg -l rootvg | egrep 'TYPE|dump|paging'
    LV NAME     TYPE      LPs   PPs   PVs
    hd6         paging    31    31    1
    lg_dumplv   sysdump   16    16    1
    ...
If the sysdump partition lg_dumplv exists, it may not be big enough. While it should be possible to increase its size, it may be easier to use a paging volume like hd6, above. To use partition hd6, consider the following command, but read the man page first, then check with the customer and/or AIX support:

    sysdumpdev -p /dev/hd6 -P

ACFS may crash on its own. If it doesn't, you can force a crash and dump with the acfsutil panic command discussed below or with AIX's:

    sysdumpstart -p

If crash dumps aren't saved after the system restarts, use:

    savecore /var/adm/ras

to save the dump as file /var/adm/ras/vmcore.XX.BZ and save the boot image as file /var/adm/ras/vmunix.XX, where the XX suffix is a numeric identifier that is increased every time a crash dump is saved. It's also helpful to copy kdb and the ACFS drivers:

    cp /usr/lib/drivers/oracle*.ext /usr/sbin/kdb /var/adm/ras

Hangs

If there is a system or process hang, you may force a crash dump via the following undocumented command as root (Linux/UNIX) or Administrator (Windows):

    acfsutil panic              (panic the node)
    acfsutil panic [/c | -c]    (panic all the nodes in the cluster)
On Linux, if you cannot tolerate crashing a node (or multiple nodes), you can try the following to write stack traces to the system log:

On each node, either type ALT-SysRq-t on the physical keyboard, or run "echo t > /proc/sysrq-trigger" from a shell prompt.
On Solaris, if you cannot tolerate crashing a node (or multiple nodes), you can try the following to write stack traces to the system log:
On each node run:

    echo ::threadlist -v | mdb -k

7. ACFS Replication Diagnostic Information

In addition to all the pertinent data collected for ACFS/ADVM, the following will also need to be collected for ACFS Replication.

Output from the following commands:

    acfsutil repl info -c /
- Output from crsctl stat res -init
- Output from crsctl stat res -p
- Output from crsctl stat res -p -init
- OKS logs - all OKS logs generated by the system
- Output from acfsutil log
- ASM trace files
- Grid Infrastructure log/trace files collected by $GI_HOME/bin/diagcollection.sh
- Crash dump, if available (Section 6)
- ACFS Replication diagnostic information (Section 7)

For cluster problems, we need all of the above for each node.

Appendix A - Persistent OKS Logs (11.2.0.3 & onwards)

OKS logs are now automatically written to disk. This functionality is in addition to the existing in-memory OKS log. The in-memory OKS log features and functionality have not been altered in any way.

OKS persistent log features:

1. Automatically collects up to 10 log files, each with a maximum size of 100MB, for a total of 1GB (defaults).
2. If more data is collected, the oldest log file is over-written.
3. The OKS persistent log hijacks the previously existing OKS console logging.
   3.1. It does this by reusing the -n flag of the acfsutil log command.
   3.2. Instead of (previously) sending the -n data to the console (and /var/log/messages on Linux), it sends the data to the persistent log instead.
   3.3. You can continue to set separate log levels for the in-memory log and the persistent log.
   3.4. An acfsutil log example is:

            acfsutil log -p avd -l 4 -n 3 -c 0xfff

        where -l is the in-memory log level and -n is the persistent log level.
4. All persistent log parameters are tunable.
5. A new acfsutil command, plogconfig, allows you to do this.
6. OKS persistent logging is started by the standard utilities (acfsload, usm_load_from_view.sh, etc.) after the drivers are loaded.
7. The persistent OKS logs are stored in the grid home at:
.2 to 11.2
Information in this document applies to any platform.

Symptoms

1) The ACFS filesystem cannot be mounted on one RAC node due to the ACFS-02017 error:

    [root@rzdb-rac-02-01 ~]# /bin/mount -t acfs /dev/asm/acf_002003-494 /u01/app/oracle/acfsmounts
    mount.acfs: CLSU-00100: Operating System function: open64 failed with error data: 2
    mount.acfs: CLSU-00101: Operating System error message: No such file or directory
    mount.acfs: CLSU-00103: error location: OOF_1
    mount.acfs: CLSU-00104: additional error information: open64 (/dev/asm/acf_002003-494)
    mount.acfs: ACFS-02017: Failed to open volume /dev/asm/acf_002003-494. Verify the volume exists.

2) This is a 4-node RAC-ACFS configuration.

Cause

1) The associated ADVM volume (e.g. /dev/asm/acf_002003-494) is not present (at OS level) on node #1, but it is present on the other 3 RAC nodes:

Node #1:

    [root@rzdb-rac-02-01 ~]# ls -l /dev/asm/acf_002003-494
    ls: /dev/asm/acf_002003-494: No such file or directory   <(==== *** Not present ***

    [root@rzdb-rac-02-01 ~]# ls -ld /u01/app/oracle/acfsmounts
    drwxr-xr-x 2 root root 4096 Sep 21 12:12 /u01/app/oracle/acfsmounts

    [oracle@rzdb-rac-02-01 ~]$ asmcmd
    ASMCMD [+] > volinfo -a
    Diskgroup Name: DAT_002003
      Volume Name: ACF_002003
      Volume Device: /dev/asm/acf_002003-494
      State: DISABLED
      Size (MB): 11612160
      Resize Unit (MB): 256
      Redundancy: UNPROT
      Stripe Columns: 4
      Stripe Width (K): 128
      Usage: ACFS
      Mountpath: /u01/app/oracle/acfsmounts

Node #2:

    [root@rzdb-rac-02-02 ~]# ls -l /dev/asm/acf_002003-494
    brwxrwx--- 1 root oinstall 252, 252929 Jan 12 14:06 /dev/asm/acf_002003-494   <(==== *** Present ***

    [root@rzdb-rac-02-02 ~]# ls -ld /u01/app/oracle/acfsmounts
    drwxrwx--- 6 root osasm 4096 Feb 14 13:30 /u01/app/oracle/acfsmounts

    [oracle@rzdb-rac-02-02 ~]$ asmcmd
    ASMCMD [+] > volinfo -a
    Diskgroup Name: DAT_002003
      Volume Name: ACF_002003
      Volume Device: /dev/asm/acf_002003-494
      State: ENABLED
      Size (MB): 11612160
      Resize Unit (MB): 256
      Redundancy: UNPROT
      Stripe Columns: 4
      Stripe Width (K): 128
      Usage: ACFS
      Mountpath: /u01/app/oracle/acfsmounts

Node #3:

    [root@rzdb-rac-02-03 ~]# ls -l /dev/asm/acf_002003-494
    brwxrwx--- 1 root oinstall 252, 252929 Jan 12 14:06 /dev/asm/acf_002003-494   <(==== *** Present ***

    [root@rzdb-rac-02-03 ~]# ls -ld /u01/app/oracle/acfsmounts
    drwxrwx--- 6 root osasm 4096 Feb 14 13:30 /u01/app/oracle/acfsmounts

    [oracle@rzdb-rac-02-03 ~]$ asmcmd
    ASMCMD [+] > volinfo -a
    Diskgroup Name: DAT_002003
      Volume Name: ACF_002003
      Volume Device: /dev/asm/acf_002003-494
      State: ENABLED
      Size (MB): 11612160
      Resize Unit (MB): 256
      Redundancy: UNPROT
      Stripe Columns: 4
      Stripe Width (K): 128
      Usage: ACFS
      Mountpath: /u01/app/oracle/acfsmounts

Node #4:

    [root@rzdb-rac-02-04 ~]# ls -l /dev/asm/acf_002003-494
    brwxrwx--- 1 root oinstall 252, 252929 Jan 12 14:06 /dev/asm/acf_002003-494   <(==== *** Present ***

    [root@rzdb-rac-02-04 ~]# ls -ld /u01/app/oracle/acfsmounts
    drwxrwx--- 6 root osasm 4096 Feb 14 13:30 /u01/app/oracle/acfsmounts

    [oracle@rzdb-rac-02-04 ~]$ asmcmd
    ASMCMD [+] > volinfo -a
    Diskgroup Name: DAT_002003
      Volume Name: ACF_002003
      Volume Device: /dev/asm/acf_002003-494
      State: ENABLED
      Size (MB): 11612160
      Resize Unit (MB): 256
      Redundancy: UNPROT
      Stripe Columns: 4
      Stripe Width (K): 128
      Usage: ACFS
      Mountpath: /u01/app/oracle/acfsmounts

2) This occurred because the associated volume (/dev/asm/acf_002003-494) was disabled on the affected node #1:

    [oracle@rzdb-rac-02-01 ~]$ asmcmd
    ASMCMD [+] > volinfo -a
    Diskgroup Name: DAT_002003
      Volume Name: ACF_002003
      Volume Device: /dev/asm/acf_002003-494
      State: DISABLED   <(==== *******
      Size (MB): 11612160
      Resize Unit (MB): 256
      Redundancy: UNPROT
      Stripe Columns: 4
      Stripe Width (K): 128
      Usage: ACFS
      Mountpath: /u01/app/oracle/acfsmounts

Solution

1) Enable the associated volume on the affected node through ASMCMD as follows:

    ASMCMD> volenable -a

2) Then check the new state:

    ASMCMD> volinfo -a

3) Finally, mount the ACFS filesystem on the associated/affected node #1, either manually (using the mount OS command) or during the CRS stack startup. Example (AIX):

    # /usr/sbin/mount -v acfs /dev/asm/acf_002003-494 /u01/app/oracle/acfsmounts
    # df -k

Note: For additional information about the mount options for Solaris, Linux & AIX, please check the following manuals:

    http://www.oracle.com/pls/db112/homepage
    Oracle ACFS Command-line Tools for Linux and UNIX Environments
    Oracle ACFS Command-line Tools for the Solaris Environment
    Oracle ACFS Command-line Tools for the AIX Environment

Check the cluster running state:

    [root@multdb01tst-sm public]# crsctl stat res -t -init

To inspect or modify the ASM configuration interactively, set the environment for the +ASM1 instance (Grid home /app/grid/11.2.0.3 in this example) and run asmca:

    [oracle@multdb01tst-sm ~]$ cd /app/grid/11.2.0.3/bin
    [oracle@multdb01tst-sm bin]$ ./asmca
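Since ACFS-02017 here boils down to a missing ADVM device node, the per-node check can be scripted before attempting the mount. This is a minimal sketch, not an Oracle utility: the advm_volume_present helper name is my own, and the device path is taken from the example above.

```shell
# Hypothetical pre-mount check: an ACFS mount can only succeed if the
# ADVM volume's block device node exists on this node.
advm_volume_present() {
    [ -b "$1" ]    # true only for an existing block device node
}

vol=/dev/asm/acf_002003-494
if advm_volume_present "$vol"; then
    echo "$vol present; mount can proceed"
else
    echo "$vol missing; run 'volenable -a' in ASMCMD on this node, then retry"
fi
```

Running this on each cluster node quickly identifies which nodes still have the volume disabled.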