understanding instance recovery in rac

Understanding Instance Recovery in RAC – Understanding Cache Fusion in RAC

• Crash Recovery - all instances have failed
• Instance Recovery - one instance has failed

In both cases the threads from failed instances need to be merged. In an instance recovery SMON performs the recovery, whereas in a crash recovery a foreground process performs the recovery. The main features (advantages) of cache fusion recovery are

• Recovery cost is proportional to the number of failures, not the total number of nodes
• It eliminates disk reads of blocks that are present in a surviving instance's cache
• It prunes the recovery set based on the global resource lock state
• The cluster is available after an initial log scan, even before recovery reads are complete

In cache fusion the starting point for recovery of a block is its most current PI version. This could be located on any of the surviving instances, and multiple PI blocks of a particular buffer can exist.

Remastering is the term that describes the operation whereby a node attempting recovery tries to own or master the resource(s) that were once mastered by another instance prior to the failure. When one instance leaves the cluster, the GRD of that instance needs to be redistributed to the surviving nodes. RAC uses an algorithm called lazy remastering to remaster only a minimal number of resources during a reconfiguration. The entire Parallel Cache Management (PCM) lock space remains invalid while the DLM and SMON complete the below steps

1. The IDLM master node discards locks that are held by dead instances; the space reclaimed by this operation is used to remaster locks that are held by the surviving instances for which a dead instance was the master
2. SMON issues a message saying that it has acquired the necessary buffer locks to perform recovery

Let's look at an example of what happens during a remastering. Let's presume the following

• Instance A masters resources 1, 3, 5 and 7
• Instance B masters resources 2, 4, 6 and 8
• Instance C masters resources 9, 10, 11 and 12

Instance B is removed from the cluster. Only the resources from instance B are evenly remastered across the surviving nodes (no resources on instances A and C are affected), which reduces the amount of work RAC has to perform. Likewise, when an instance joins a cluster only a minimum amount of resources are remastered to the new instance.

[Figures: Before Remastering, After Remastering]
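The remastering example can be sketched in a few lines of Python. This is a minimal illustration and not Oracle's implementation: the function name `lazy_remaster`, the twelve numbered resources and the round-robin hand-out are assumptions made for the sketch. The only behaviour taken from the text is that resources of the departed instance alone are moved, and that they are spread evenly over the survivors.

```python
def lazy_remaster(masters, failed):
    """Return a new resource -> master map after `failed` leaves the cluster.

    Only resources mastered by the failed instance are touched. They are
    handed out round-robin (an assumption of this sketch) so that the
    surviving instances share the extra load evenly.
    """
    survivors = sorted({m for m in masters.values() if m != failed})
    if not survivors:
        raise ValueError("no surviving instance to remaster to")
    new_masters = dict(masters)
    orphaned = sorted(r for r, m in masters.items() if m == failed)
    for i, resource in enumerate(orphaned):
        new_masters[resource] = survivors[i % len(survivors)]
    return new_masters


# Hypothetical layout: twelve resources spread over instances A, B and C.
before = {1: "A", 3: "A", 5: "A", 7: "A",
          2: "B", 4: "B", 6: "B", 8: "B",
          9: "C", 10: "C", 11: "C", 12: "C"}

after = lazy_remaster(before, failed="B")
moved = {r: m for r, m in after.items() if before[r] != m}
print("remastered:", moved)
```

Only the four resources that instance B used to master change hands; everything mastered by A and C keeps its master, which is the point of remastering lazily.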

You can control the remastering process with a number of parameters

_gcs_fast_config     enables fast reconfiguration for gcs locks (true|false)
_lm_master_weight    controls which instance will hold or (re)master more resources than others
_gcs_resources       controls the number of resources an instance will master at a time

You can also force a dynamic remastering (DRM) of an object using oradebug

    -- Obtain the OBJECT_ID from the below table
    SQL> select * from v$gcspfmaster_info;

    -- Determine who masters it
    SQL> oradebug setmypid
    SQL> oradebug lkdebug -a <OBJECT_ID>

    -- Now remaster the resource
    SQL> oradebug setmypid
    SQL> oradebug lkdebug -m pkey <OBJECT_ID>

The steps of a GRD reconfiguration are as follows

• Instance death is detected by the cluster manager
• Requests for PCM locks are frozen
• Enqueues are reconfigured and made available
• DLM recovery
• GCS (PCM lock) is remastered
• Pending writes and notifications are processed
• I Pass recovery
   o The instance recovery (IR) lock is acquired by SMON
   o The recovery set is prepared and built, memory space is allocated in the SMON PGA
   o SMON acquires locks on buffers that need recovery
• II Pass recovery
   o II pass recovery is initiated, database is partially available
   o Blocks are made available as they are recovered
   o The IR lock is released by SMON, recovery is then complete
   o The system is available

[Figure: the GRD reconfiguration steps shown graphically]

Cache Fusion in Operation

A quick recap of GCS: a GCS resource can be local or global. If it is local it can be acted upon without consulting other instances; if it is global it cannot be acted upon without consulting or informing remote instances. GCS is used as a messaging agent to coordinate manipulation of a global resource. By default all resources are in NULL mode (remember that null mode is used to convert from one type to another, share or exclusive). The table below denotes the different states of a resource

Mode/Role        Local    Global
Null (N)         NL       NG
Shared (S)       SL       SG
Exclusive (X)    XL       XG

States

SL   it can serve a copy of the block to other instances and it can read the block from disk; since the block is not modified there is no need to write to disk
XL   it has sole ownership and interest in that resource, it has exclusive right to modify the block, all changes to the blocks are in the local buffer cache and it can write the block to the disk. If another instance wants the block it has to come via the GCS
NL   used to protect a consistent read block; if an instance wants it in X mode, the current instance will send the block to the requesting instance and downgrade its role to NL
SG   a block is present in one or more instances, an instance can read the block from disk and serve it to other instances
XG   a block can have one or more PIs, the instance with the XG role has the latest copy of the block and is the most likely candidate to write the block to the disk. GCS can ask the instance to write the block and serve it to other instances
NG   after discarding PIs when instructed to by GCS, the block is kept in the buffer cache with NG role; this serves only as the CR copy of the block
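As a reading aid, the two-letter states can be split mechanically into a mode and a role. The helper names `describe` and `needs_gcs` below are made up for this sketch and are not part of any Oracle interface. The second function only restates the rule given above, that a global resource cannot be acted upon without involving other instances.

```python
MODES = {"N": "null", "S": "shared", "X": "exclusive"}
ROLES = {"L": "local", "G": "global"}


def describe(state):
    """Split a two-letter resource state such as 'XG' into (mode, role)."""
    if len(state) != 2 or state[0] not in MODES or state[1] not in ROLES:
        raise ValueError(f"unknown resource state: {state!r}")
    return MODES[state[0]], ROLES[state[1]]


def needs_gcs(state):
    """A resource with a global role must be coordinated through GCS."""
    return describe(state)[1] == "global"


for s in ("NL", "SL", "XL", "NG", "SG", "XG"):
    mode, role = describe(s)
    print(s, mode, role, "via GCS" if needs_gcs(s) else "local only")
```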

Below are a number of common scenarios to help understand the following

• reading from disk
• reading from cache
• getting the block from cache for update
• performing an update on a block
• performing an update on the same block
• reading a block that was globally dirty
• performing a rollback on a previously updated block
• reading the block after commit

We will assume the following

• A four-instance RAC environment (Instances A, B, C and D)
• Instance D is the master of the lock resource for the data block BL
• We will only use one block and it will reside at SCN 987654
• We will use a three-letter code for the lock states
   o the first letter will indicate the lock mode - N = Null, S = Shared and X = Exclusive
   o the second letter will indicate the lock role - G = Global, L = Local
   o the third letter will indicate the PIs - 0 = no PIs, 1 = a PI of the block

For example a code of SL0 means a shared lock with a local role and no past images (PIs)

Reading a block from disk

Instance C wants to read the block, so it will request a lock in share mode from the master instance

1. Instance C requests the block by sending a shared lock request to master D
2. The block has never been read into the buffer cache of any instance and it is not locked. Master D grants the lock to instance C. The lock granted is SL0 (see above to work out the three-letter code)
3. Instance C reads the block from the shared disk into its buffer cache
4. Instance C has the block in shared mode, the lock manager updates the resource directory

Reading a block from the cache

Carrying on from the above example, instance B wants to read the same block that is cached in instance C's buffer

1. Instance B sends a shared lock request to master instance D
2. The lock master knows that the block may be available at instance C and sends a ping message to instance C
3. Instance C sends the block to instance B via the interconnect. Along with the block, instance C indicates that instance B should take the current lock mode and role from instance C. Instance C keeps a copy of the block
4. Instance B sends a message to instance D that it has assumed the SL lock for the block. This message is not critical for the lock manager, thus the message is sent asynchronously

Getting a (Cached) clean block for update

Carrying on from the above example, instance A wants to modify the same block that is already cached in instances B and C (the block at SCN 987654)

1. Instance A sends an exclusive lock request to master D
2. The lock master knows that the block may be available at instance B in SCUR mode and at instance C in CR mode. It also sends a ping message to the shared lock holders. The most recent access was at instance B, and instance D sends a BAST message to instance B
3. Instance B sends the block to instance A via the interconnect and closes its shared lock. The block may still be in its buffer to be used as CR, but all locks are released
4. Instance A now has the exclusive lock on the block and sends an assume message to instance D, the lock is in XL0
5. Instance A modifies the block in its buffer cache. The changes are not committed and the block has not been written to disk, thus the SCN remains at 987654

Getting a (Cached) modified block for update and commit

Carrying on from the above example, instance C now wants to modify the block. If it tries to modify the same row it will have to wait until instance A either commits or rolls back. However, in this case instance C wants to modify a different row in the same block

1. Instance C sends an exclusive lock request to master D
2. The lock master knows that instance A holds an exclusive lock on the block and hence sends a ping message to instance A
3. Instance A sends the dirty buffer to instance C via the interconnect. It downgrades the lock from XCUR to NULL, keeps a PI version of the block and disowns any lock on that buffer. Before shipping the block, instance A has to create a PI image and flush any pending redo for the block change. The block mode on instance A is now NG1
4. Instance C sends a message to instance D indicating it has the block in exclusive mode. The block role G indicates that the block is in global mode, and if it needs to write the block to disk it must coordinate it with other instances that have past images (PIs) of that block. Instance C modifies the block and issues a commit, the SCN is now 987660

Commit the previously modified block and select the data

Carrying on from the above example, instance A now issues a commit to release the row level locks held by the transaction and flush the redo information to the redologs

1. Instance A wants to commit the changes. Commit operations do not require any synchronous modifications to the block
2. The lock status remains the same as the previous state and change vectors for the commits are written to the redologs
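The lock codes named in the scenarios so far can be replayed with a small model. This is a simplified sketch under stated assumptions, not a description of GCS internals. The class `BlockLocks` and its methods are invented for illustration. The code `XG0` for the new exclusive holder is inferred from the three-letter scheme rather than quoted from the text. Leaving the original reader's code unchanged after a cache-to-cache read is also an assumption.

```python
class BlockLocks:
    """Three-letter lock code held by each instance for a single block."""

    def __init__(self):
        self.codes = {}  # instance name -> code such as "SL0"

    def read_from_disk(self, inst):
        # First reader of an uncached, unlocked block: shared, local, no PI.
        if self.codes:
            raise RuntimeError("block is already cached on another instance")
        self.codes[inst] = "SL0"

    def read_from_cache(self, inst, holder):
        # The reader takes over the lock mode and role of the shipping holder.
        self.codes[inst] = self.codes[holder]

    def request_exclusive(self, inst):
        if self.codes.get(inst, "")[:1] == "X":
            return  # already the exclusive holder
        dirty = [i for i, c in self.codes.items() if c[0] == "X"]
        if dirty:
            # Dirty block: the old holder drops to null mode, keeps a past
            # image, and the role becomes global.
            self.codes[dirty[0]] = "NG1"
            self.codes[inst] = "XG0"  # inferred code, see lead-in
        else:
            # Clean block: the shared holders release their locks.
            self.codes = {i: c for i, c in self.codes.items() if c[0] != "S"}
            self.codes[inst] = "XL0"


locks = BlockLocks()
locks.read_from_disk("C")          # reading a block from disk
locks.read_from_cache("B", "C")    # reading a block from the cache
locks.request_exclusive("A")       # getting a clean block for update
clean_update = dict(locks.codes)
locks.request_exclusive("C")       # getting a modified block for update
print(clean_update, locks.codes)
```

The commit by instance A needs no method of its own in this model, because a commit leaves the lock status unchanged.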

Write the dirty buffers to disk due to a checkpoint

Carrying on from the above example, instance B writes the dirty blocks from the buffer cache due to a checkpoint (this is where it gets interesting and very clever)

1. Instance B sends a write request to master D with the necessary SCN
2. The master knows that the most recent copy of the block may be available at instance C and hence sends a message to instance C asking it to write
3. Instance C initiates a disk write and writes a BWR into the redolog file
4. Instance C gets the write notification that the write is complete
5. Instance C notifies the master that the write is completed
6. On receipt of the notification, instance D tells all PI holders to discard their PIs, and the lock at instance C writes the modified block to the disk
7. All instances that have previously modified this block will also have to write a BWR. The write request by instance C has now been satisfied and instance C can now proceed with its checkpoint as usual

Master instance crashes

Carrying on from the above example

1. The master instance D crashes
2. The Global Resource Directory is frozen momentarily and the resources held by master instance D will be equally distributed among the surviving nodes, also known as remastering (see remastering for more details)

Select the rows from Instance A

Carrying on from the above example, instance A now queries the rows from that table to get the most recent data

1. Instance A sends a shared lock request to the new master, instance C
2. Master C knows the most recent copy of the block may be in instance C and asks the holder to ship the CR block to instance A
3. Instance C ships the CR block to instance A via the interconnect
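Looking back at the checkpoint scenario, the bookkeeping around past images can be illustrated as follows. The function `checkpoint_write` is a hypothetical helper, and the input codes `NG1` and `XG0` are the ones assumed for instances A and C after the second update. The sketch models only two facts from the text: the instance holding the most recent copy performs the write, and every PI holder is then told to discard its PI.

```python
def checkpoint_write(codes):
    """Return (writer, new_codes) after a coordinated write of one block.

    `codes` maps instance -> three-letter lock code. The exclusive holder has
    the most recent copy and writes it; afterwards all past images are
    discarded, so every third letter becomes "0".
    """
    writers = [inst for inst, code in codes.items() if code[0] == "X"]
    if len(writers) != 1:
        raise ValueError("expected exactly one exclusive holder")
    new_codes = {inst: code[:2] + "0" for inst, code in codes.items()}
    return writers[0], new_codes


writer, after_write = checkpoint_write({"A": "NG1", "C": "XG0"})
print(writer, after_write)
```

After the write, instance A is left with a null-mode, global-role buffer and no PI, which matches the NG state described earlier as serving only as a CR copy.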

The above sequence of events can be seen in the table below

Example   Operation                   Node   Buffer status
1         read block from disk        C      SCUR
2         read the block from cache   B      CR, SCUR
3         update the block            A      XCUR, CR, CR
4         update the same block       C      PI, CR, XCUR
5         commit the changes          A      PI, CR, XCUR
6         trigger checkpoint          B      CR, XCUR
7         instance crash              D
8         select the rows             A      CR, XCUR
