Oracle RAC instances are composed of the following background processes:

ACMS (11g) — Atomic Controlfile to Memory Service
GTX0-j (11g) — Global Transaction Process
LMON — Global Enqueue Service Monitor
LMD — Global Enqueue Service Daemon
LMS — Global Cache Service Process
LCK0 — Instance Enqueue Process
DIAG — Diagnosability Daemon
RMSn — Oracle RAC Management Processes
RSMN — Remote Slave Monitor
DBRM — Database Resource Manager (from 11g R2)
PING — Response Time Agent (from 11g R2)
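These daemons are visible at the operating-system level as processes named ora_<name>_<SID>. As a rough illustration only (this is not an Oracle utility, and the sample ps output and the SID RACDB1 are invented), one could pick the RAC-specific daemons out of `ps` output like this:

```python
import re

# RAC-specific background process name stems from the list above;
# LMS, LCK and RMS run as numbered slaves (lms0, lck0, rms0, ...).
RAC_PROCESSES = ("acms", "gtx", "lmon", "lmd", "lms", "lck",
                 "diag", "rms", "rsmn", "dbrm", "ping")

def rac_background_processes(ps_lines):
    """Return RAC-specific ora_* process names found in ps output lines."""
    found = []
    for line in ps_lines:
        # Oracle background processes are named ora_<stem><slave#>_<SID>.
        m = re.search(r"(ora_([a-z]+)\d*_\w+)$", line.strip())
        if m and m.group(2) in RAC_PROCESSES:
            found.append(m.group(1))
    return found

# Hypothetical `ps -ef` output for an instance named RACDB1:
sample = [
    "oracle  4211     1  0 ?  00:00:03 ora_lmon_RACDB1",
    "oracle  4215     1  0 ?  00:00:09 ora_lms0_RACDB1",
    "oracle  4223     1  0 ?  00:00:01 ora_pmon_RACDB1",  # not RAC-specific
    "oracle  4230     1  0 ?  00:00:00 ora_diag_RACDB1",
]
print(rac_background_processes(sample))
```

On a single-instance database the same scan would come back empty, which is one quick way to tell a RAC instance apart from a non-RAC one.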
Oracle Real Application Clusters New Features

Oracle 9i RAC
• OPS (Oracle Parallel Server) was renamed as RAC
• CFS (Cluster File System) was supported
• OCFS (Oracle Cluster File System) for Linux and Windows
• watchdog timer replaced by hangcheck timer
Oracle 10g R1 RAC
• Cluster Manager replaced by CRS
• ASM introduced
• Concept of Services expanded
• ocrcheck introduced
• ocrdump introduced
• AWR was instance specific
Oracle 10g R2 RAC
• CRS was renamed as Clusterware
• asmcmd introduced
• CLUVFY introduced
• OCR and Voting disks can be mirrored
• Can use FAN/FCF with TAF for OCI and ODP.NET
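FAN/FCF and TAF are about clients surviving an instance failure. The core connect-time failover idea can be sketched in a few lines; this is a deliberately simplified model, not Oracle's implementation (real TAF is handled inside Oracle Net from the tnsnames FAILOVER settings, and the address names and connect function below are invented):

```python
# Illustrative sketch of TAF-style connect-time failover: try each
# address in an ADDRESS_LIST until one accepts the connection.
class ConnectError(Exception):
    pass

def connect_with_failover(addresses, connect):
    """Try addresses in order, as a FAILOVER=ON tnsnames entry would."""
    last_error = None
    for addr in addresses:
        try:
            return connect(addr)          # first reachable instance wins
        except ConnectError as e:
            last_error = e                # fall through to the next node
    raise last_error                      # every address failed

# Hypothetical cluster where node1 is down and node2 accepts connections:
def fake_connect(addr):
    if addr == "node1-vip":
        raise ConnectError("ORA-12541: no listener")
    return f"connected via {addr}"

print(connect_with_failover(["node1-vip", "node2-vip"], fake_connect))
```

The same loop shape also explains why a client only notices the failure as a short delay: the error from the dead node is swallowed and the next address is tried immediately.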
Oracle 11g R1 RAC
1. Oracle 11g RAC parallel upgrades - Oracle 11g has rolling upgrade features whereby a RAC database can be upgraded without any downtime.
2. Oracle RAC load balancing advisor - Starting from 10g R2 we have the RAC load balancing advisor utility. The 11g RAC load balancing advisor is only available with clients that use .NET, ODBC, or the Oracle Call Interface (OCI).
3. ADDM for RAC - Oracle has incorporated RAC into the Automatic Database Diagnostic Monitor, for cross-node advisories. The script addmrpt.sql gives a report for a single instance and will not report on all instances in RAC; this is known as instance ADDM. But using the new package DBMS_ADDM, we can generate a report for all instances of RAC; this is known as database ADDM.
4. Optimized RAC cache fusion protocols - moves on from the general cache fusion protocols in 10g to deal with specific scenarios where the protocols could be further optimized.
5. Oracle 11g RAC grid provisioning - The Oracle grid control provisioning pack allows us to "blow out" a RAC node without the time-consuming install, using a pre-installed "footprint".
Oracle 11g R2 RAC
1. We can store everything on the ASM. We can store the OCR & voting files also on the ASM.
2. ASMCA introduced.
3. Single Client Access Name (SCAN) - eliminates the need to change the tns entry when nodes are added to or removed from the cluster. RAC instances register to SCAN listeners as remote listeners. SCAN is a fully qualified name. Oracle recommends assigning 3 addresses to SCAN, which create three SCAN listeners.
4. AWR is consolidated for the database.
5. 11g Release 2 Real Application Clusters (RAC) has server pooling technologies so it's easier to provision and manage database grids. This update is geared toward dynamically adjusting servers as corporations manage the ebb and flow between data requirements for data warehousing and applications.
6. By default, LOAD_BALANCE is ON.
7. GSD (Global Service Daemon), gsdctl introduced.
8. GPnP profile.
9. Oracle RAC One Node is a new option that makes it easier to consolidate databases that aren't mission critical, but need redundancy.
10. raconeinit - to convert a database to RacOneNode.
11. raconefix - to fix a RacOneNode database in case of failure.
12. racone2rac - to convert RacOneNode back to RAC.
13. Oracle Restart - the feature of Oracle Grid Infrastructure's OLR, which is similar to the OCR (but local) and is managed by OHASD.
SPLIT BRAIN CONDITION AND I/O FENCING MECHANISM IN ORACLE CLUSTERWARE

Oracle clusterware provides the mechanisms to monitor the cluster operation and detect some potential issues with the cluster. One particular scenario that needs to be prevented is called the split brain condition. A split brain condition occurs when a single cluster node has a failure that results in the reconfiguration of the cluster into multiple partitions, with each partition forming its own sub-cluster without the knowledge of the existence of the others. This would lead to collision and corruption of shared data, as each sub-cluster assumes ownership of the shared data [1]. For a cluster database like an Oracle RAC database, data corruption is a serious issue that has to be prevented at all times. Oracle clusterware's solution to the split brain condition is to provide I/O fencing: if a cluster node fails, Oracle clusterware ensures that the failed node is fenced off from all I/O operations on the shared storage. One of the I/O fencing methods is called STOMITH, which stands for Shoot The Other Machine In The Head.
In this method, once a potential split brain condition is detected, Oracle clusterware automatically picks a cluster node as a victim to reboot in order to avoid data corruption. This process is called node eviction. DBAs and system administrators need to understand how this I/O fencing mechanism works and learn how to troubleshoot clusterware problems. When they experience a cluster node reboot event, DBAs and system administrators need to be able to analyze the events and identify the root cause of the clusterware failure. Oracle clusterware uses two Cluster Synchronization Service (CSS) heartbeats, 1. the network heartbeat (NHB) and 2. the disk heartbeat (DHB), and two CSS misscount values associated with these heartbeats to detect potential split brain conditions. The network heartbeat crosses the private interconnect to establish and confirm valid node membership in the cluster. The disk heartbeat is between the cluster node and the voting disk on the shared storage. Both heartbeats have their own maximal misscount values in seconds, called the CSS misscount, within which the heartbeats must be completed; otherwise a node eviction will be triggered. The CSS misscount for the network heartbeat has the following default values, depending on the version of Oracle clusterware and the operating system:
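The misscount logic itself amounts to a timeout check on each node's last heartbeat. This is a deliberately simplified model (the real CSS protocol does far more than this), assuming the common 30-second 10gR2+ network-heartbeat default:

```python
MISSCOUNT_SECONDS = 30  # typical 10gR2+ default for the network heartbeat

def nodes_to_evict(last_heartbeat, now, misscount=MISSCOUNT_SECONDS):
    """Return nodes whose network heartbeat missed the misscount window.

    last_heartbeat: dict node_name -> timestamp (seconds) of last heartbeat.
    """
    return sorted(node for node, ts in last_heartbeat.items()
                  if now - ts > misscount)

# racdb7 last checked in 62 seconds ago; the other nodes are current.
beats = {"racdb1": 99.0, "racdb2": 100.0, "racdb7": 40.0}
print(nodes_to_evict(beats, now=102.0))
```

In the real clusterware, crossing this threshold is what triggers the node eviction described above rather than a mere log message.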
OS               Linux   Unix   VMS   Windows
10g R1              60     30    30        30
10g R2 and 11g      30     30    30        30
The CSS misscount for the disk heartbeat also varies with the version of Oracle clusterware. For Oracle 10.2.1 and up, the default value is 200 seconds.

NODE EVICTION DIAGNOSIS CASE STUDY

When a node eviction occurs, Oracle clusterware usually records error messages into various log files. These log files provide the evidence and the starting points for DBAs and system administrators to do troubleshooting. The following case study illustrates a troubleshooting process based on a node eviction which occurred in an 11-node 10g RAC production database. The symptom was that node 7 of that cluster got automatically rebooted around 11:15am. The troubleshooting started with examining the syslog file /var/log/messages, which contained the following error messages:

Jul 23 11:15:23 racdb7 logger: Oracle clsomon failed with fatal status 12.
Jul 23 11:15:23 racdb7 logger: Oracle CSSD failure 13.
Jul 23 11:15:23 racdb7 logger: Oracle CRS failure. Rebooting for cluster integrity.
Then the OCSSD log file at $CRS_HOME/log/<hostname>/cssd/ocssd.log was examined, and the following error messages were found, which showed that node 7's network heartbeat didn't complete within the 60-second CSS misscount and triggered a node eviction event:

[CSSD]2008-07-23 11:14:49.150 [119961800] >WARNING: clssnmPollingThread: node racdb7 (7) at 50% heartbeat fatal, eviction in 29.720 seconds
..
clssnmPollingThread: node racdb7 (7) at 90% heartbeat fatal, eviction in 0.550 seconds
[CSSD]2008-07-23 11:15:19.079 [1220598112] >TRACE: clssnmDoSyncUpdate: Terminating node 7, racdb7, misstime(60200) state(3)
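When working through a large ocssd.log, it helps to extract those countdown warnings mechanically. A small sketch whose regular expression is fitted only to the message format seen in this case study (other clusterware versions may format the line differently):

```python
import re

# Pull the eviction countdown out of clssnmPollingThread warnings like
# the ocssd.log lines in the case study above.
PATTERN = re.compile(
    r"clssnmPollingThread: node (\w+) \((\d+)\) at (\d+)% heartbeat fatal, "
    r"eviction in ([\d.]+) seconds")

def parse_eviction_warning(line):
    """Return node/percentage/countdown from a polling warning, else None."""
    m = PATTERN.search(line)
    if not m:
        return None
    node, num, pct, secs = m.groups()
    return {"node": node, "number": int(num),
            "missed_pct": int(pct), "eviction_in": float(secs)}

line = ("[CSSD]2008-07-23 11:14:49.150 [119961800] >WARNING: "
        "clssnmPollingThread: node racdb7 (7) at 50% heartbeat fatal, "
        "eviction in 29.720 seconds")
print(parse_eviction_warning(line))
```

Sorting the parsed records by timestamp makes it easy to see the heartbeat failure progressing from 50% to 90% before clssnmDoSyncUpdate terminates the node.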
CRS REBOOTS TROUBLESHOOTING PROCEDURE

Besides the node eviction caused by the failure of the network heartbeat or the disk heartbeat, other events may also cause a CRS node reboot. Oracle clusterware provides several processes to monitor the operation of the clusterware. When certain conditions occur, to protect data integrity, these monitoring processes may automatically kill the clusterware, or even reboot the node, and leave some critical error messages in their log files. The following lists the roles of these clusterware processes in the server reboot and where their logs are located:
Three of the clusterware processes, OCSSD, OPROCD and OCLSOMON, can initiate a CRS reboot when they run into certain errors:

1. OCSSD (CSS Daemon): monitors inter-node health, such as the interconnect and the membership of the cluster nodes. Its log file is located in $CRS_HOME/log/<hostname>/cssd/.
2. OPROCD (Oracle Process Monitor Daemon): introduced in 10.2.0.4, detects hardware and driver freezes that would result in a node eviction, then kills the node to prevent any I/O from accessing the shared disk. Its log file is /etc/oracle/oprocd/<hostname>.oprocd.log.

3. OCLSOMON: monitors the CSS daemon for hangs or scheduling issues. It may reboot the node if it sees a potential hang. Its log file is located in $CRS_HOME/log/<hostname>/cssd/oclsomon/.
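As a first-pass triage aid only (the precedence order below is purely illustrative and not an Oracle rule), one could map which daemon's log shows fatal errors around the reboot time to the likely initiator and the log file to read first:

```python
# Log locations from the list above; <hostname> is a placeholder.
LOGS = {
    "OCSSD":    "$CRS_HOME/log/<hostname>/cssd/ocssd.log",
    "OPROCD":   "/etc/oracle/oprocd/<hostname>.oprocd.log",
    "OCLSOMON": "$CRS_HOME/log/<hostname>/cssd/oclsomon/oclsomon.log",
}

def likely_initiator(fatal_hits):
    """fatal_hits: dict daemon -> True if its log shows fatal errors.

    Returns (daemon, log_path) for the first daemon with fatal errors,
    in an assumed (illustrative) precedence order, or None.
    """
    for daemon in ("OPROCD", "OCLSOMON", "OCSSD"):
        if fatal_hits.get(daemon):
            return daemon, LOGS[daemon]
    return None

print(likely_initiator({"OCSSD": True, "OPROCD": False}))
```

In practice one would correlate the timestamps across all three logs and the syslog, as in the case study above, rather than trust any fixed ordering.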