1. Overview This document contains a straightforward guide on setting up standby databases for maximum protection, using command line mode, and avoiding using the GUI. To do this Oracle9i has a feature called DataGuard and the following sections describe the tasks undertaken to set-up primary and standby servers and a couple of UNIX servers. Database DWH0P01 is replicated from nodeA to nodeB via DataGuard This document also details the management procedures required and the standard processes that are required to maintain both production and standby databases.
1. 2.
Overview___________________________________________________________ 1 Configuration_______________________________________________________ 2 2.1. 2.2. 2.3. 2.4.
3.
2 2 3 4
Setting It Up________________________________________________________ 5 3.1. 3.2. 3.3. 3.4.
4.
Listener.ora ___________________________________________________________ tnsnames.ora __________________________________________________________ Pfile _______________________________________________________________ ___ Directory Structure _____________________________________________________
Archive Files ___________________________________________________________ 5 ___________________ _____________ _____________ ______________ ________ _6 Query V$MANAGED_STANDBY ____________ Log files to check on both systems _____________ ____________________ _____________ _____________ _____________ ________ __ 7 Post Set-up ____________________________________________________________ 7
Database Maintenance _______________________________________________ 8 4.1. 4.2. 4.3. 4.4. 4.5. 4.6.
____________________ ______________ ______________ ________ _8 Cancel/Stop Managed Standby Recovery_____________ Activating a Standby Database _____________ ____________________ _____________ _____________ _____________ __________ ____ 8 ____________________ ______________ _____________ _____________ _____________ ____________ ______ 8 Database Switchover _____________ Database Fail over ______________________________________________________ 9 ____________________ _____________ _____________ _____________ ________ __ 9 Automatic Archive Gap Detection _____________ ___________________ _____________ _____________ _____________ _____________ _______ _ 10 Delayed Redo Application _____________
1
2. Configuration
2.1. Listener.ora 2.1.1. listener.ora entry on nodeA LISTENER = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = IPC)(KEY = DWH0P01)) ) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = nodeA.ldn.eu.wm.ubs.com)(PORT = 1521)) ) ) )
2.1.2. listener.ora entry on nodeB LISTENER = (DESCRIPTION_LIST = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = IPC)(KEY = DWH0P01)) ) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = nodeB.fft.eu.wm.ubs.com)(PORT = 1521)) ) ) )
2.2. tnsnames.ora 2.2.1. tnsnames.ora entry on nodeA and nodeB # TNSNAMES.ORA Network Configuration File: /var/db/oracle/product/9.2.0/network/admin/tnsnames.ora # Generated by Oracle configuration tools. prod1.db.eu.wm.ubs.com = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = nodeA.ldn.eu.wm.ubs.com)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = prod1.db.eu.wm.ubs.com) ) ) stby1.db.eu.wm.ubs.com = (DESCRIPTION = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST = nodeB.fft.eu.wm.ubs.com)(PORT = 1521)) ) (CONNECT_DATA = (SERVICE_NAME = stby1.db.eu.wm.ubs.com) ) )
2
2.3. Pfile 2.3.1. pfile for DWH0P01 on nodeA #include the common include file BEFORE the other parameters so that "common" parameters can be overridden ifile = /var/db/oracle/prod uct/9.2.0/dbs/common_ifil e.ora audit_trail = OS control_files = ("/var/db/oracle/ad min/DWH0P01/data/DWH0P01_ cntrl_01.ctl", "/var/db/oracle/admin/DWH0P01/data/DWH0P01_cntrl_02.ctl") db_name = DWH0P01 background_dump_dest = /var/db/oracle/admi n/DWH0P01/bdump user_dump_dest = /var/db/oracle/admi n/DWH0P01/udump core_dump_dest = /var/db/oracle/admi n/DWH0P01/cdump db_block_buffers = 6400 open_links = 255 processes = 85 shared_pool_size = 50M undo_management = AUTO undo_tablespace = UNDO undo_retention = 7200 SERVICE_NAMES = LOG_ARCHIVE_FORMAT = LOG_ARCHIVE_DEST_1 = REOPEN=30' LOG_ARCHIVE_DEST_2 = LOG_ARCHIVE_DEST_STATE_1= LOG_ARCHIVE_DEST_STATE_2= REMOTE_ARCHIVE_ENABLE = FAL_SERVER = FAL_CLIENT = STANDBY_ARCHIVE_DEST =
prod1 DWH0P01_%S.arc 'LOCATION=/var/db/o racle/admin/DWH0P01/archi ve MANDATORY 'SERVICE=stby1 LGWR SYNC AFFIRM' enable enable true prod1 stby1 /var/db/oracle/admi n/DWH0P01/standby
The LGWR SYNC AFFIRM keywords indicate that the Logwriter should synchronously write updates to the online redo logs to this location and wait for confirmation of the write before proceeding, if in maximum protection mode. The remote site will process and archive these standby redo logs to keep the databases synchronized. This whole process can impact performance greatly but provides maximum data security. However if communications are halted between sites, the Primary Database will continue and the Standby Database will re-sync when communications are re-established. If in maximum protection mode, the Primary database will halt. An easy test for this is to stop the listeners.
2.3.2. Pfile for DWH0P01 on nodeB #include the common include file BEFORE the other parameters so that "common" parameters can be overridden ifile = /var/db/oracle/prod uct/9.2.0/dbs/common_ifil e.ora audit_trail = OS control_files = ("/var/db/oracle/ad min/DWH0P01/pfile/DWH0P01 _standby.ctl") db_name = DWH0P01 background_dump_dest = /var/db/oracle/admi n/DWH0P01/bdump user_dump_dest = /var/db/oracle/admi n/DWH0P01/udump core_dump_dest = /var/db/oracle/admi n/DWH0P01/cdump db_block_buffers = 6400 open_links = 255 processes = 85 shared_pool_size = 50M undo_management = AUTO undo_tablespace = UNDO undo_retention = 7200 REMOTE_ARCHIVE_ENABLE SERVICE_NAMES
= true = stby1
LOCK_NAME_SPACE INSTANCE_NAME FAL_SERVER
= stby1 = stby1 = prod1
3
FAL_CLIENT
= stby1
STANDBY_ARCHIVE_DEST LOG_ARCHIVE_TRACE LOG_ARCHIVE_FORMAT STANDBY_FILE_MANAGEMENT REMOTE_ARCHIVE_ENABLE
= = = = =
/var/db/oracle/admi n/DWH0P01/standby 127 DWH0P01_%S.arc auto true
LOG_ARCHIVE_DEST_1 = 'LOCATION=/var/db/o racle/admin/DWH0P01/archi ve MANDATORY REOPEN=30' LOG_ARCHIVE_DEST_2 = 'SERVICE=prod1 LGWR SYNC AFFIRM' LOG_ARCHIVE_DEST_STATE_1=enable LOG_ARCHIVE_DEST_STATE_2=enable
If maximum protection is not required, then removing “LGWR SYNC A FFIRM” and changing it to say “REOPEN=60” in both of the pfiles’ will then just enable the standby database to receive and apply archive files when they are produced on the primary database.
2.4. Directory Structure On both systems, the same directory structure was set-up, i.e. nodeB(oracle):/var/db/oracle/admin/DWH0P01$ ls adump udump
archive bdump
cdump
create data
export pfile
standby
4
3. Setting It Up On nodeA, database DWH0P01 (primary database, service name prod1) 1. Ensure database is in ARCHIVELOG mode 2. Shutdown the database using SHUTDOWN IMMEDIATE 3. Copy all *.dbf files from the data directory on nodeA to the data directory on nodeB. Do not copy redo log files or control files; they aren’t needed 4. Re-start database 5. Create a standby controlfile - ALTER ALTER DATABASE CREATE STANDBY CONTROLFILE AS ‘/var/db/oracle/admin/DWH0 ‘/var/db/oracle/admin/DWH0P01/pfile/DWH P01/pfile/DWH0P01_standby.c 0P01_standby.ctl’ tl’
6.
Copy DWH0P01_standby.ctl DWH0P01_standby.ctl over to nodeB, to the directory indicated by the pfile on nodeB
On nodeB, database DWH0P01 (standby database, service name stby1) 1. startup nomount 2. alter database mount standby database 3. recover managed standby database disconnect from session
That’s it
3.1. Archive Files To check archive files are going, do a “alter system archive log current” on the primary database. An archive file should appear in /var/db/oracle/admin/DWH0P01/archive on nodeA and something like the following should appear in the alert logfile Creating archive destination LOG_ARCHIVE_DEST_1: '/var/db/oracle/admin/DWH0P01/archive/DWH0P01_00000033.arc' ARCH: Completed archiving log 1 thread 1 sequence 33 Thu Nov 20 10:47:43 2003 ARC0: Begin FAL archive (thread 1 sequence 32 destination stby1) Creating archive destination LOG_ARCHIVE_DEST_2: 'stby1' ARC0: Complete FAL archive (thread 1 sequence 32 destination stby1)
On nodeB, the archive files should be in /var/db/oracle/admin/DWH0P01/standby, along with the current redo log, i.e. -rw-r-----
1 oracle
dba
19968 Nov 20 10:47 DWH0P01_00000033.arc
-rw-r-----
1 oracle
dba
104858112 Nov 20 10:51 DWH0P01_00000034.arc
DWH0P01_00000033.arc has just been flushed on nodeA and nodeB. DWH0P01_00000034.arc is the current synchronised redo log from nodeA.
5
3.2. Query V$MANAGED_STANDBY Query the physical standby database to monitor log apply and log transport services activity at the standby site. nodeB(oracle):/var/db/oracle/admin/DWH0P01/standby$
sqlplus
SQL*Plus: Release 9.2.0.4.0 - Production on Tue Dec 30 21:30:47 2003 Copyright (c) 1982, 2002, Oracle Corporation.
All rights reserved.
Enter user-name: / as sysdba Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.4.0 - Production
SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY; PROCESS ------ARCH ARCH MRP0 RFS RFS RFS
STATUS THREAD# SEQUENCE# BLOCK# BLOCKS ------------ ---------- ---------- ---------- ---------CONNECTED 0 0 0 0 CONNECTED 0 0 0 0 WAIT_FOR_LOG 1 4205 0 0 RECEIVING 0 0 0 0 RECEIVING 1 3524 2445 2445 WRITING 1 4205 14947 20480
If we do the same query on the primary database nodeA(oracle):/var/db/oracle/admin/DWH0P01/pfile$
sqlplus
SQL*Plus: Release 9.2.0.4.0 - Production on Tue Dec 30 21:33:49 2003 Copyright (c) 1982, 2002, Oracle Corporation.
All rights reserved.
Enter user-name: / as sysdba Connected to: Oracle9i Enterprise Edition Release 9.2.0.4.0 - Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.4.0 - Production
SQL> SELECT PROCESS, STATUS, THREAD#, SEQUENCE#, BLOCK#, BLOCKS FROM V$MANAGED_STANDBY; PROCESS ------ARCH ARCH LGWR
STATUS THREAD# SEQUENCE# BLOCK# BLOCKS ------------ ---------- ---------- ---------- ---------CLOSING 1 4203 2049 124 CLOSING 1 4204 1 1551 WRITING 1 4205 14947 1
From the query on the primary database, we see the current sequence being written to in the redo log area is 4205, and on the standby database we also see the current archive log being applied is for sequence 4205. In the directory that receives archive files on the standby database, the file DWH0P01_0000004205.arc will exist and will be the same size as the redo log l og on the primary database. However the primary database will not have DWH0P01_0000004205.ar c as a file in the archive area, as a log switch will not have occurred yet, but both databases are synchronized at the same sequence and block blo ck number, 14947. The other RFS entries for the standby database are from a previous attempt and can be ignored.
6
Up-to-date details can be found at http://foglight.appl.eu.wm.ubs.com/DBCentral/html/standby_db_status.html
For more information on monitoring, check the Oracle site http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96653/log_apply.htm#SBYDB0055
3.3. Log files to check on both systems On nodeA in the bdump directory, the alert log and files generated by lgwr and lnsx can be checked for any problems On nodeB in the bdump directory, the alert log and files generated by mrpx can be checked for any problems
3.4. Post Set-up Now that Data Guard is configured and running the primary database can be prevented from applying updates unless the update has been sent to a t least one standby location. Connect to the standby database and execute: ALTER DATABASE SET STANDBY DATABASE PROTECTED
This ensures maximum protection. If this is not required and slight data divergence is acceptable, then ignore the command above.
7
4. Database Maintenance 4.1. Cancel/Stop Managed Standby Recovery While connected to the standby database 1. ALTER DATABASE SET STANDBY DATABASE UNPROTECTED; 2. RECOVER MANAGED STANDBY DATABASE CANCEL; 3. ALTER DATABASE OPEN READ ONLY;
The database can subsequently be switched back to recovery mode as follows: Start-up managed recovery on standby database 1. CONNECT / AS SYSDBA 2. SHUTDOWN IMMEDIATE 3. STARTUP NOMOUNT 4. ALTER DATABASE MOUNT STANDBY DATABASE; 5. RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION; 6. ALTER DATABASE SET STANDBY DATABASE PROTECTED;
4.2. Activating a Standby Database If the primary database is not available availa ble the standby database can be activated as a primary database using the following statements: Cancel recovery if necessary on standby database databa se 1. RECOVER MANAGED STANDBY DATABASE CANCEL; 2. ALTER DATABASE ACTIVATE STANDBY DATABASE;
Since the standby database is now the primary database it should be backed up immediately. The previous primary database can then be configured as a standby. 1. Backup Standby Database
Backups of the standby database can only be performed if the database is shut down or in read only mode. Read only mode is i s best for managed recovery systems, as archive logs will still be transferred during the backup process, thus preventing gap sequences. Once the server is in the desired mode simply copy the appropriate database files.
4.3. Database Switchover A database can be in one of two mutually exclusive modes (primary or standby). These roles can be altered at runtime without loss of data or resetting of redo logs. This process is known as a Switchover and can be performed using the following statements: While connected to the primary database, issue the following commands:-
8
1. 2. 3. 4. 5. 6.
CONNECT / AS SYSDBA ALTER DATABASE COMMIT TO SWITCHOVER TO STANDBY; SHUTDOWN IMMEDIATE; STARTUP NOMOUNT ALTER DATABASE MOUNT STANDBY DATABASE; RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
Now the original Primary database is in Standby mode and wai ting for the new Primary database to activate, which is done while connected to the standby database (not the original primary) 1. 2. 3. 4.
CONNECT / AS SYSDBA ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY; SHUTDOWN IMMEDIATE; STARTUP
This process has no affect on alternative standby locations. The process of converting the instances back to their original roles is known as a Switchback. The switchback is accomplished by performing another switchover.
4.4. Database Fail over Graceful Database Fail over occurs when database fail over causes a standby database to be converted to a primary database: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH; ALTER DATABASE ACTIVATE STANDBY DATABASE;
This process will recovery all or some of the application data using the standby redo logs, therefore avoiding reinstantiation of other standby databases. If completed successfully, successfully, only the primary database will need to be reinstatiated as a standby database. Forced Database Failover changes one standby database to a primary database. Application data may be lost necessitating the reinstantiation of the primary and all standby databases.
4.5. Automatic Archive Gap Detection Gaps in the sequence of archive logs l ogs can be created when changes are applied to the primary database while the standby database is unavailable. Most of these gap sequences can be resolved automatically. The following parameters must be added to the standby init.ora file where the values val ues indicate net services names. FAL_SERVER = 'prod1' FAL_CLIENT = 'stby1'
The FAL server is normally the primary database, but can be another standby database. Once the standby database is placed in managed recovery mode it will automatically check for gap sequences. If it finds any it will request the appropriate files from the primary database via the FAL server. If the gap sequences cannot be resolved the files have to be recovered manually. This can be done by retrieving missing archive files to the archive directory on the Primary server.
9
4.6. Delayed Redo Application Application of the archived redo logs to the standby database can be delayed using the DELAY keyword. If a rogue statement significantly damages the primary database the DBA can choose to switch to the standby database, which will be in the correct state prior to the problem on the primary database. On the primary database, issue i ssue the following commands: •
Delay application of archived redo logs by 30 minutes.
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DELAY 30 ; •
Return to no delay (Default).
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE NODELAY ;
10