Tuesday, May 28, 2013

Resolving standby sync issue in loss of archivelogs at Primary database server.

Some time it might happens that Oracle process generates lots of archive log due to heavy sql operations and then we met with disk space alert. In that scenario we have to delete the archive logs from the database server in order to gain some space otherwise database will go in hung state. 
But sometime it happens that you might have deleted those archive logs which have not shipped yet over the standby server. In that scenario FAL(fetch Archive Log) will constantly try to fetch the missing archive log but it won't be succeed as the archive logs are not physically present over the primary database server. And it will not apply the next available archive logs into memory until it find the missing archive log.

Solution.

--> To deal with this situation you need to take the incremental SCN backup of the primary database and then create a standby control file for the standby database. 
--> Ship those backups to the standby database server.
--> Open the standby database in mount mode and perform the database recovery with the help of those backup sets.
--> Now open the database and start the managed recovery process.

At Primary database. 

SQL> conn scott/tiger
Connected.
SQL> insert into test
  2  select * from emp;
14 rows created.

SQL> commit;
Commit complete.

SQL> select count(*) from test;

  COUNT(*)
----------
        70

SQL> conn /as sysdba
Connected.
SQL> alter system switch logfile;
System altered.

SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    2984319

SQL> select scn_to_timestamp(2984319) from dual;

SCN_TO_TIMESTAMP(2984319)
---------------------------------------------------------------------------
27-MAY-13 06.08.24.000000000 PM

Lets remove one archive log from the primary host server.

[oracle@oracle arch]$ ls -al
total 51264
-rw-r----- 1 oracle oinstall   209920 May 27 12:17 1_190_812203684.dbf
-rw-r----- 1 oracle oinstall    64512 May 27 12:18 1_191_812203684.dbf
-rw-r----- 1 oracle oinstall   176640 May 27 12:20 1_192_812203684.dbf
-rw-r----- 1 oracle oinstall   431104 May 27 12:22 1_193_812203684.dbf
-rw-r----- 1 oracle oinstall 30054912 May 27 16:22 1_194_812203684.dbf
-rw-r----- 1 oracle oinstall   181760 May 27 16:25 1_195_812203684.dbf
-rw-r----- 1 oracle oinstall     5632 May 27 16:25 1_196_812203684.dbf
-rw-r----- 1 oracle oinstall     4608 May 27 16:25 1_197_812203684.dbf
-rw-r----- 1 oracle oinstall 11155968 May 27 17:53 1_198_812203684.dbf
-rw-r----- 1 oracle oinstall     3072 May 27 17:53 1_199_812203684.dbf
-rw-r----- 1 oracle oinstall  1518080 May 27 18:00 1_200_812203684.dbf
-rw-r----- 1 oracle oinstall     3584 May 27 18:00 1_201_812203684.dbf
[oracle@oracle arch]$ pwd
/u01/app/oracle/admin/db1/arch
[oracle@oracle arch]$ mv 1_201_812203684.dbf ../arch_bkp/
[oracle@oracle arch]$ ls -al
total 51260
-rw-r----- 1 oracle oinstall   209920 May 27 12:17 1_190_812203684.dbf
-rw-r----- 1 oracle oinstall    64512 May 27 12:18 1_191_812203684.dbf
-rw-r----- 1 oracle oinstall   176640 May 27 12:20 1_192_812203684.dbf
-rw-r----- 1 oracle oinstall   431104 May 27 12:22 1_193_812203684.dbf
-rw-r----- 1 oracle oinstall 30054912 May 27 16:22 1_194_812203684.dbf
-rw-r----- 1 oracle oinstall   181760 May 27 16:25 1_195_812203684.dbf
-rw-r----- 1 oracle oinstall     5632 May 27 16:25 1_196_812203684.dbf
-rw-r----- 1 oracle oinstall     4608 May 27 16:25 1_197_812203684.dbf
-rw-r----- 1 oracle oinstall 11155968 May 27 17:53 1_198_812203684.dbf
-rw-r----- 1 oracle oinstall     3072 May 27 17:53 1_199_812203684.dbf  <-- archive log 201 is missing
-rw-r----- 1 oracle oinstall  1518080 May 27 18:00 1_200_812203684.dbf

SQL> alter system switch logfile;
System altered.

[oracle@oracle arch]$ ls -al
total 51612
-rw-r----- 1 oracle oinstall   209920 May 27 12:17 1_190_812203684.dbf
-rw-r----- 1 oracle oinstall    64512 May 27 12:18 1_191_812203684.dbf
-rw-r----- 1 oracle oinstall   176640 May 27 12:20 1_192_812203684.dbf
-rw-r----- 1 oracle oinstall   431104 May 27 12:22 1_193_812203684.dbf
-rw-r----- 1 oracle oinstall 30054912 May 27 16:22 1_194_812203684.dbf
-rw-r----- 1 oracle oinstall   181760 May 27 16:25 1_195_812203684.dbf
-rw-r----- 1 oracle oinstall     5632 May 27 16:25 1_196_812203684.dbf
-rw-r----- 1 oracle oinstall     4608 May 27 16:25 1_197_812203684.dbf
-rw-r----- 1 oracle oinstall 11155968 May 27 17:53 1_198_812203684.dbf
-rw-r----- 1 oracle oinstall     3072 May 27 17:53 1_199_812203684.dbf
-rw-r----- 1 oracle oinstall  1518080 May 27 18:00 1_200_812203684.dbf
-rw-r----- 1 oracle oinstall   353792 May 27 18:02 1_202_812203684.dbf

At Standby server.

SQL> select * from v$archive_gap;

   THREAD# LOW_SEQUENCE# HIGH_SEQUENCE#
---------- ------------- --------------
         1           201            201

SQL> select thread#, max(sequence#) "Last Standby Seq Received"  from v$archive                                                                              group by thread# order by 1;

   THREAD# Last Standby Seq Received
---------- -------------------------
         1                       202

SQL> select thread#, max(sequence#) "Last Standby Seq Applied" from v$archived_                                                                              d val.applied in ('YES','IN-MEMORY')  group by thread# order by 1;

   THREAD# Last Standby Seq Applied
---------- ------------------------
         1                      200
SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    2983499

SQL> select scn_to_timestamp(2983499) from dual;

SCN_TO_TIMESTAMP(2983499)
---------------------------------------------------------------------------
27-MAY-13 05.58.59.000000000 PM

From above we can see that log sequence 201 is not reached to the standby server due to this managed recovery process is unable to apply the next available sequence i.e.202.

Now take the incremental SCN backup and create the standby control file from the primary database.

At Primary Database.

RMAN> backup incremental from scn 2983499 database format '/u01/stby/%U.bkp';

Starting backup at 27-MAY-13

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=29 device type=DISK
backup will be obsolete on date 03-JUN-13
archived logs will not be kept or backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00001 name=/u01/app/oracle/oradata/db1/system01.dbf
input datafile file number=00002 name=/u01/app/oracle/oradata/db1/sysaux01.dbf
input datafile file number=00005 name=/u01/app/oracle/oradata/db1/undotbs02.dbf
input datafile file number=00004 name=/u01/app/oracle/oradata/db1/users01.dbf
channel ORA_DISK_1: starting piece 1 at 27-MAY-13
channel ORA_DISK_1: finished piece 1 at 27-MAY-13
piece handle=/u01/stby/1hoan0gm_1_1.bkp tag=TAG20130527T182526 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:56

using channel ORA_DISK_1
backup will be obsolete on date 03-JUN-13
archived logs will not be kept or backed up
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including current control file in backup set
channel ORA_DISK_1: starting piece 1 at 27-MAY-13
channel ORA_DISK_1: finished piece 1 at 27-MAY-13
piece handle=/u01/stby/1ioan0ie_1_1.bkp tag=TAG20130527T182526 comment=NONE
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:01
Finished backup at 27-MAY-13

SQL> alter database create standby controlfile as '/u01/stby/stby.ctl';
Database altered.

[oracle@oracle u01]$ scp -r stby/ oracle@test:/u01
oracle@test's password:
1ioan0ie_1_1.bkp                                                                                                           100% 8512KB   8.3MB/s   00:00
stby.ctl                                                                                                                            100% 8464KB   8.3MB/s   00:01
1hoan0gm_1_1.bkp                                                                                                        100% 3896KB   3.8MB/s   00:00

At Standby database.

Standby control files shows at two location in parameter file. Replace those two control files with the shipped  standby control file.

[oracle@test dbs]$ more initdb1.ora
*.audit_file_dest='/u01/app/oracle/admin/db1/adump'
*.audit_trail='db'
*.compatible='11.2.0.0.0'
*.control_files='/u01/app/oracle/oradata/db1/control01.ctl','/u01/app/oracle/fast_recovery_area/db1/control02.ctl'

[oracle@test stby]$ cp stby.ctl /u01/app/oracle/oradata/db1/control01.ctl
[oracle@test stby]$ cp stby.ctl /u01/app/oracle/fast_recovery_area/db1/control02.ctl

Now open the standby database in mount mode and do the recover.

SQL> startup mount pfile='/u01/app/oracle/product/11.2.0/db_1/dbs/initdb1.ora';
ORACLE instance started.

Total System Global Area  171581440 bytes
Fixed Size                  1343668 bytes
Variable Size             117444428 bytes
Database Buffers           50331648 bytes
Redo Buffers                2461696 bytes
Database mounted.

[oracle@test dbs]$ rman target /
connected to target database: DB1 (DBID=1406063903, not open)

RMAN> catalog start with '/u01/stby';

Starting implicit crosscheck backup at 27-MAY-13
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=29 device type=DISK
Crosschecked 7 objects
Finished implicit crosscheck backup at 27-MAY-13

Starting implicit crosscheck copy at 27-MAY-13
using channel ORA_DISK_1
Crosschecked 4 objects
Finished implicit crosscheck copy at 27-MAY-13

searching for all files in the recovery area
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/app/oracle/fast_recovery_area/SDB1/controlfile/o1_mf_8t6f0dls_.ctl
File Name: /u01/app/oracle/fast_recovery_area/SDB1/1boamb0g_1_1.bkp
File Name: /u01/app/oracle/fast_recovery_area/SDB1/1doamb3o_1_1.bkp
File Name: /u01/app/oracle/fast_recovery_area/SDB1/1goamb7e_1_1.bkp

searching for all files that match the pattern /u01/stby

List of Files Unknown to the Database
=====================================
File Name: /u01/stby/stby.ctl

Do you really want to catalog the above files (enter YES or NO)? yes
cataloging files...
cataloging done

List of Cataloged Files
=======================
File Name: /u01/stby/stby.ctl

RMAN> recover database;

Starting recover at 27-MAY-13
using channel ORA_DISK_1
channel ORA_DISK_1: starting incremental datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
destination for restore of datafile 00001: /u01/app/oracle/oradata/db1/system01.dbf
destination for restore of datafile 00002: /u01/app/oracle/oradata/db1/sysaux01.dbf
destination for restore of datafile 00004: /u01/app/oracle/oradata/db1/users01.dbf
destination for restore of datafile 00005: /u01/app/oracle/oradata/db1/undotbs02.dbf
channel ORA_DISK_1: reading from backup piece /u01/stby/1hoan0gm_1_1.bkp
channel ORA_DISK_1: piece handle=/u01/stby/1hoan0gm_1_1.bkp tag=TAG20130527T182526
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:03

starting media recovery

archived log for thread 1 with sequence 205 is already on disk as file /u01/app/oracle/admin/db1/arch/1_205_812203684.dbf
archived log file name=/u01/app/oracle/admin/db1/arch/1_205_812203684.dbf thread=1 sequence=205
unable to find archived log
archived log thread=1 sequence=206
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 05/27/2013 18:29:42
RMAN-06054: media recovery requesting unknown archived log for thread 1 with sequence 206 and starting SCN of 2986916

SQL> alter database open;
Database altered.
SQL> recover managed standby database disconnect;
Media recovery complete.
SQL> conn scott/tiger
Connected.
SQL> select count(*) from test;

  COUNT(*)
----------
        70

SQL> conn /as sysdba
Connected.
SQL> select * from v$archive_gap;
no rows selected

SQL> select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)
--------------
           205

SQL> select thread#, max(sequence#) "Last Standby Seq Applied" from v$archived_                                                                              d val.applied in ('YES','IN-MEMORY')  group by thread# order by 1;

   THREAD# Last Standby Seq Applied
---------- ------------------------
         1                      205

SQL> select thread#, max(sequence#) "Last Standby Seq Received"  from v$archive                                                                              group by thread# order by 1;

   THREAD# Last Standby Seq Received
---------- -------------------------
         1                       205

At Primary DB.

SQL> select max(sequence#) from v$archived_log;

MAX(SEQUENCE#)
--------------
           205

SQL> select thread#, max(sequence#) "Last Primary Seq Generated"  from v$archived_log val, v$database vdb where val.resetlogs_change# = vdb.resetlogs_change# group by thread# order by 1;

   THREAD# Last Primary Seq Generated
---------- --------------------------
         1                        205

Thanks for your time.  :) 

1 comment:

  1. Dear Rohit,

    Delete archives is not the only way to avoid what you called "hung state". One is temporarily change the value of intialisation parameter log_archive_dest_1 to a filesystem/diskgroup with enough space. After that backup archive logs with RMAN will free space automatically. This will gain you some time to find the root cause of excesive archive log generation and fix the problem.

    Regards

    ReplyDelete