Understanding Instance Recovery in RAC – Understanding Cache Fusion in RAC
• Crash Recovery - all instances have failed
• Instance Recovery - one instance has failed
In both cases the threads from failed instances need to be merged. In an instance recovery SMON performs the recovery, whereas in a crash recovery a foreground process performs the recovery. The main features (advantages) of cache fusion recovery are
• Recovery cost is proportional to the number of failures, not the total number of nodes
• It eliminates disk reads of blocks that are present in a surviving instance's cache
• It prunes the recovery set based on the global resource lock state
• The cluster is available after an initial log scan, even before recovery reads are complete
In cache fusion the starting point for recovery of a block is its most current PI version. This could be located on any of the surviving instances, and multiple PI blocks of a particular buffer can exist.
Remastering is the term that describes the operation whereby a node attempting recovery tries to own or master the resource(s) that were once mastered by another instance prior to the failure. When one instance leaves the cluster, the Global Resource Directory (GRD) of that instance needs to be redistributed to the surviving nodes. RAC uses an algorithm called lazy remastering to remaster only a minimal number of resources during a reconfiguration. The entire Parallel Cache Management (PCM) lock space remains invalid while the DLM and SMON complete the below steps
1. The IDLM master node discards locks that are held by dead instances; the space reclaimed by this operation is used to remaster locks that are held by the surviving instance for which a dead instance was remastered
2. SMON issues a message saying that it has acquired the necessary buffer locks to perform recovery
Let's look at an example of what happens during a remastering. Let's presume the following
• Instance A masters resources 1, 3, 5 and 7
• Instance B masters resources 2, 4, 6, and 8
• Instance C masters resources 9, 10, and 12
Instance B is removed from the cluster. Only the resources from instance B are evenly remastered across the surviving nodes (no resources on instances A and C are affected); this reduces the amount of work that RAC has to perform. Likewise, when an instance joins a cluster only a minimum amount of resources are remastered to the new instance.
[Figures: Before Remastering, After Remastering]
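The remastering example can be sketched in a few lines of Python. This is only a toy model: the `remaster` function and the round-robin assignment are assumptions made for illustration (the text only says the failed instance's resources are spread evenly), and nothing here reflects how Oracle implements lazy remastering internally.

```python
def remaster(masters, failed):
    """Reassign only the failed instance's resources, round-robin over survivors.

    masters: dict mapping instance name -> list of resource ids it masters.
    Returns a new dict; survivors keep everything they already mastered.
    """
    survivors = sorted(name for name in masters if name != failed)
    if not survivors:
        raise ValueError("no surviving instance to remaster to")
    # Survivors' existing resources are copied untouched (the "lazy" part).
    result = {name: list(masters[name]) for name in survivors}
    # Hypothetical policy: deal the orphaned resources out one at a time.
    for i, resource in enumerate(sorted(masters[failed])):
        result[survivors[i % len(survivors)]].append(resource)
    return result


before = {"A": [1, 3, 5, 7], "B": [2, 4, 6, 8], "C": [9, 10, 12]}
after = remaster(before, "B")
for name in sorted(after):
    print(name, after[name])
```

Only the four resources once mastered by instance B move; the resources already mastered by A and C stay where they were.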
You can control the remastering process with a number of parameters
_gcs_fast_config     enables fast reconfiguration for GCS locks (true|false)
_lm_master_weight    controls which instance will hold or (re)master more resources than others
_gcs_resources       controls the number of resources an instance will master at a time
You can also force a dynamic remastering (DRM) of an object using oradebug
force dynamic remastering (DRM)
    ## Obtain the OBJECT_ID from the view below
    SQL> select * from v$gcspfmaster_info;

    ## Determine who masters it
    SQL> oradebug setmypid
    SQL> oradebug lkdebug -a <OBJECT_ID>

    ## Now remaster the resource
    SQL> oradebug setmypid
    SQL> oradebug lkdebug -m pkey <OBJECT_ID>
The steps of a GRD reconfiguration are as follows
• Instance death is detected by the cluster manager
• Requests for PCM locks are frozen
• Enqueues are reconfigured and made available
• DLM recovery
• GCS (PCM lock) is remastered
• Pending writes and notifications are processed
• I Pass recovery
o The instance recovery (IR) lock is acquired by SMON
o The recovery set is prepared and built, memory space is allocated in the SMON PGA
o SMON acquires locks on buffers that need recovery
• II Pass recovery
o II pass recovery is initiated, database is partially available
o Blocks are made available as they are recovered
o The IR lock is released by SMON, recovery is then complete
o The system is available
Graphically it looks like below
[Figure: steps of a GRD reconfiguration]
Cache Fusion in Operation
A quick recap of GCS: a GCS resource can be local or global. If it is local it can be acted upon without consulting other instances; if it is global it cannot be acted upon without consulting or informing remote instances. GCS is used as a messaging agent to coordinate manipulation of a global resource. By default all resources are in NULL mode (remember null mode is used to convert from one type to another (share or exclusive)). The table below denotes the different states of a resource

Mode/Role        Local    Global
Null (N)         NL       NG
Shared (S)       SL       SG
Exclusive (X)    XL       XG

States
SL: it can serve a copy of the block to other instances and it can read the block from disk; since the block is not modified there is no need to write to disk
XL: it has sole ownership and interest in that resource, it has the exclusive right to modify the block, all changes to the block are in the local buffer cache and it can write the block to disk. If another instance wants the block it has to come via the GCS
NL: used to protect consistent read blocks; if an instance wants it in X mode, the current instance will send the block to the requesting instance and downgrade its role to NL
SG: a block is present in one or more instances; an instance can read the block from disk and serve it to other instances
XG: a block can have one or more PIs; the instance with the XG role has the latest copy of the block and is the most likely candidate to write the block to disk. GCS can ask the instance to write the block and serve it to other instances
NG: after discarding PIs when instructed to by GCS, the block is kept in the buffer cache with NG role; this serves only as the CR copy of the block
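As a small illustration of the mode/role grid, the six states can be generated from the two dimensions. The enum and function names below are made up for this sketch; the only rule encoded is the one stated in the recap, that a global resource cannot be acted upon without involving remote instances.

```python
from enum import Enum


class Mode(Enum):
    NULL = "N"
    SHARED = "S"
    EXCLUSIVE = "X"


class Role(Enum):
    LOCAL = "L"
    GLOBAL = "G"


def state_name(mode, role):
    """Two-letter state name, e.g. Mode.SHARED + Role.LOCAL -> 'SL'."""
    return mode.value + role.value


def needs_remote_coordination(role):
    """Per the recap: local resources need no consultation, global ones do."""
    return role is Role.GLOBAL


# Enum members iterate in definition order, so this follows the table rows.
ALL_STATES = [state_name(m, r) for m in Mode for r in Role]
print(ALL_STATES)
```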
Below are a number of common scenarios to help understand the following
• reading from disk
• reading from cache
• getting the block from cache for update
• performing an update on a block
• performing an update on the same block
• reading a block that was globally dirty
• performing a rollback on a previously updated block
• reading the block after commit
We will assume the following
• A four-instance RAC environment (instances A, B, C and D)
• Instance D is the master of the lock resource for the data block BL
• We will only use one block and it will reside at SCN 987654
• We will use a three-letter code for the lock states
o The first letter will indicate the lock mode - N = Null, S = Shared and X = Exclusive
o The second letter will indicate the lock role - G = Global, L = Local
o The third letter will indicate the PIs - 0 = no PIs, 1 = a PI of the block
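The three-letter code is easy to decode mechanically. The helper below is a hypothetical sketch (the names `LockState` and `parse_lock_code` are not from Oracle); it simply applies the three rules listed above to a code string.

```python
from typing import NamedTuple

MODES = {"N": "null", "S": "shared", "X": "exclusive"}
ROLES = {"L": "local", "G": "global"}


class LockState(NamedTuple):
    mode: str
    role: str
    has_past_image: bool


def parse_lock_code(code):
    """Split a code such as 'XG1' into mode, role and past-image flag."""
    if (
        len(code) != 3
        or code[0] not in MODES
        or code[1] not in ROLES
        or code[2] not in ("0", "1")
    ):
        raise ValueError(f"not a valid lock code: {code!r}")
    return LockState(MODES[code[0]], ROLES[code[1]], code[2] == "1")


print(parse_lock_code("SL0"))
print(parse_lock_code("XG1"))
```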
For example, a code of SL0 means a local shared lock with no past images (PIs)

Reading a block from disk
Instance C wants to read the block; it will request a lock in share mode from the master instance
1. Instance C requests the block by sending a shared lock request to master D
2. The block has never been read into the buffer cache of any instance and it is not locked. Master D grants the lock to instance C. The lock granted is SL0 (see above to work out the three-letter code)
3. Instance C reads the block from the shared disk into its buffer cache
4. Instance C has the block in shared mode, the lock manager updates the resource directory

Reading a block from the cache
Carrying on from the above example, instance B wants to read the same block that is cached in instance C's buffer
1. Instance B sends a shared lock request to master instance D
2. The lock master knows that the block may be available at instance C and sends a ping message to instance C
3. Instance C sends the block to instance B via the interconnect. Along with the block, instance C indicates that instance B should take the current lock mode and role from instance C; instance C keeps a copy of the block
4. Instance B sends a message to instance D that it has assumed the SL lock for the block. This message is not critical for the lock manager, thus the message is sent asynchronously

Getting a (Cached) clean block for update
Carrying on from the above example, instance A wants to modify the same block that is already cached in instances B and C (block 987654)
1. Instance A sends an exclusive lock request to master D
2. The lock master knows that the block may be available at instance B in SCUR mode and at instance C in CR mode; it also sends a ping message to the shared lock holders. The most recent access was at instance B, and instance D sends a BAST message to instance B
3. Instance B sends the block to instance A via the interconnect and closes its shared lock. The block may still be in its buffer to be used as CR, but all locks are released
4. Instance A now has the exclusive lock on the block and sends an assume message to instance D, the lock is in XL0
5. Instance A modifies the block in its buffer cache; the changes are not committed and thus the block has not been written to disk, thus the SCN remains at 987654

Getting a (Cached) modified block for update and commit
Carrying on from the above example, instance C now wants to modify the block. If it tries to modify the same row it will have to wait until instance A either commits or rolls back. However, in this case instance C wants to modify a different row in the same block
1. Instance C sends an exclusive lock request to master D
2. The lock master knows that instance A holds an exclusive lock on the block and hence sends a ping message to instance A
3. Instance A sends the dirty buffer to instance C via the interconnect, it downgrades the lock from XCUR to NULL, it keeps a PI version of the block and disowns any lock on that buffer. Before shipping the block, instance A has to create a PI image and flush any pending redo for the block change; the block mode on instance A is now NG
4. Instance C sends a message to instance D indicating it has the block in exclusive mode. The block role G indicates that the block is in global mode and if it needs to write the block to disk it must coordinate it with other instances that have past images (PIs) of that block
Instance C modifies the block and issues a commit, the SCN is now 987660

Commit the previously modified block and select the data
Carrying on from the above example, instance A now issues a commit to release the row level locks held by the transaction and flush the redo information to the redo logs
1. Instance A wants to commit the changes; commit operations do not require any synchronous modifications to the block
2. The lock status remains the same as the previous state and change vectors for the commits are written to the redo logs
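The buffer states in the five scenarios so far can be traced with a toy state model. Everything below is an illustrative assumption: the function names are invented, messaging with master D is left out, and the transitions simply restate the walkthrough (SCUR on first read, CR for the instance that ships a clean copy, PI for the instance that ships a dirty block). It is not a description of Oracle's internal code.

```python
def read_from_disk(buffers, reader):
    """First reader gets the block in shared-current mode."""
    buffers[reader] = "SCUR"


def read_from_cache(buffers, reader, holder):
    """Holder ships the block; per the walkthrough it keeps only a CR copy."""
    buffers[holder] = "CR"
    buffers[reader] = "SCUR"


def update_clean(buffers, writer):
    """Shared holders release their locks (keeping CR copies); writer gets XCUR."""
    for instance, status in list(buffers.items()):
        if status == "SCUR":
            buffers[instance] = "CR"
    buffers[writer] = "XCUR"


def update_dirty(buffers, writer, holder):
    """Exclusive holder ships the dirty block and keeps a past image (PI)."""
    buffers[holder] = "PI"
    buffers[writer] = "XCUR"


def commit(buffers, instance):
    """A commit only writes redo; no buffer state changes."""


buffers = {}
history = []
read_from_disk(buffers, "C")
history.append(dict(buffers))
read_from_cache(buffers, "B", holder="C")
history.append(dict(buffers))
update_clean(buffers, "A")
history.append(dict(buffers))
update_dirty(buffers, "C", holder="A")
history.append(dict(buffers))
commit(buffers, "A")
history.append(dict(buffers))

for step, snapshot in enumerate(history, start=1):
    print(step, dict(sorted(snapshot.items())))
```

Each snapshot in `history` corresponds to one scenario, so the printed lines can be compared against the narrative step by step.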
Write the dirty buffers to disk due to a checkpoint
Carrying on from the above example, instance B writes the dirty blocks from the buffer cache due to a checkpoint (this is where it gets interesting and very clever)
1. Instance B sends a write request to master D with the necessary SCN
2. The master knows that the most recent copy of the block may be available at instance C and hence sends a message to instance C asking it to write
3. Instance C initiates a disk write and writes a BWR into the redo log file
4. Instance C gets the write notification that the write is complete
5. Instance C notifies the master that the write is completed
6. On receipt of the notification, instance D tells all PI holders to discard their PIs, and the lock at instance C writes the modified block to the disk
7. All instances that have previously modified this block will also have to write a BWR. The write request by instance C has now been satisfied and instance C can now proceed with its checkpoint as usual

Master instance crashes
Carrying on from the above example
1. The master instance D crashes
2. The Global Resource Directory is frozen momentarily and the resources held by master instance D will be equally distributed among the surviving nodes, also known as remastering (see remastering for more details)
Select the rows from Instance A
Carrying on from the above example, instance A now queries the rows from that table to get the most recent data
1. Instance A sends a shared lock request to the new master, instance C
2. Master C knows the most recent copy of the block may be in instance C and asks the holder to ship the CR block to instance A
3. Instance C ships the CR block to instance A via the interconnect
The above sequence of events can be seen in the table below

Example  Node  Operation                     Buffer Status
                                             A      B      C      D
1        C     read block from disk                        SCUR
2        B     read the block from cache            SCUR   CR
3        A     update the block              XCUR   CR     CR
4        C     update the same block         PI     CR     XCUR
5        A     commit the changes            PI     CR     XCUR
6        B     trigger checkpoint            CR            XCUR
7        D     instance crash
8        A     select the rows               CR            XCUR