Data Recovery Advisor
Consider the error shown below:
SQL> conn scott/tiger
Connected.
SQL> create table t (col1 number);
create table t (col1 number)
*
ERROR at line 1:
ORA-01116: error in opening database file 4
ORA-01110: data file 4: '/home/oracle/oradata/PRODB3/users01.dbf'
ORA-27041: unable to open file
Linux Error: 2: No such file or directory
Additional information: 3
Does it look familiar? Regardless of your experience as a DBA, you have probably seen this message more than once. The error occurs because the datafile in question is not available: it could be corrupt, or perhaps someone removed the file while the database was running. In any case, you need to take action before the problem has a more widespread impact.
In Oracle Database 11g, the new Data Recovery Advisor makes this operation much easier. The advisor comes in two flavors: a command line mode and a screen in Oracle Enterprise Manager Database Control. Each flavor has its advantages in specific situations. For instance, the command line comes in handy when you want to automate the identification of such files via shell scripting and schedule recovery through a utility such as cron or at. The GUI route is helpful for novice DBAs who might want the assurance of a screen that guides them through the process. I'll describe both here.
Command Line Option
The command line option is executed through RMAN. First, start the RMAN process and connect to the target.
$ rman target=/

Recovery Manager: Release 11.1.0.5.0 - Beta on Sun Jul 15 19:43:45 2007

Copyright (c) 1982, 2007, Oracle. All rights reserved.

connected to target database: PRODB3 (DBID=3132722606)
Assuming that some error has occurred, you want to find out what happened. The list failure command tells you that in a jiffy.
RMAN> list failure;
If there is no error, this command will come back with the message:
no failures found that match specification
If there is an error, a more explanatory message will follow:
using target database control file instead of recovery catalog

List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
142        HIGH     OPEN      15-JUL-07     One or more non-system datafiles are missing
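For scripting, the failure listing can be reduced to just the IDs of the open failures. The following is a minimal sketch rather than a tested production script: it assumes the column layout shown in the listing above (Failure ID first, Status third), and the function names open_failure_ids and list_open_failures are my own, not part of RMAN.

```shell
#!/bin/sh
# Print the Failure ID of every OPEN failure found in "list failure"
# output read from stdin. Assumes the column order shown above:
# Failure ID, Priority, Status, Time Detected, Summary.
open_failure_ids() {
    awk '$1 ~ /^[0-9]+$/ && $3 == "OPEN" { print $1 }'
}

# Hypothetical wrapper: run RMAN non-interactively and list open failure IDs.
# Assumes ORACLE_HOME and ORACLE_SID are set and that OS authentication
# (target=/) works for the calling user.
list_open_failures() {
    printf 'list failure;\n' | rman target=/ | open_failure_ids
}
```

A scheduled job could call list_open_failures and raise an alert whenever it prints anything.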
The failure listing shows that some datafiles are missing. Because the datafiles belong to a tablespace other than SYSTEM, the database stays up with that tablespace offline. This error is fairly critical, so the priority is set to HIGH. Each failure gets a Failure ID, which makes it easier to identify and address individual failures. For instance, you can issue the following command to get the details of Failure 142.
RMAN> list failure 142 detail;
This command will show you the exact cause of the error. Now comes the fun part: How do you rectify the error? Seasoned DBAs will probably ace this without further help but novice DBAs (and even experienced but tired ones) will welcome some guidance here. They can turn to Data Recovery Advisor for assistance:
RMAN> advise failure;
It responds with a detailed explanation of the error and how to correct it:
List of Database Failures
=========================

Failure ID Priority Status    Time Detected Summary
---------- -------- --------- ------------- -------
142        HIGH     OPEN      15-JUL-07     One or more non-system datafiles are missing

analyzing automatic repair options; this may take some time
using channel ORA_DISK_1
analyzing automatic repair options complete

Mandatory Manual Actions
========================
no manual actions available

Optional Manual Actions
=======================
1. If file /home/oracle/oradata/PRODB3/users01.dbf was unintentionally renamed or moved, restore it

Automated Repair Options
========================
Option Repair Description
------ ------------------
1      Restore and recover datafile 4
  Strategy: The repair includes complete media recovery with no data loss
  Repair script: /home/oracle/app/diag/rdbms/prodb3/PRODB3/hm/reco_3162589478.hm
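If you capture the advisor's output in a script, two pieces are worth pulling out before going further: the path of the generated repair script and whether any mandatory manual action is listed. Here is a rough sketch under the assumption that the section headings and labels look exactly like the sample output above; repair_script_path and has_mandatory_actions are names I made up for illustration.

```shell
#!/bin/sh
# Both functions read "advise failure" output from stdin and assume the
# section headings and labels shown in the sample output.

# Print the path that follows the "Repair script:" label, if any.
repair_script_path() {
    sed -n 's/^[ ]*Repair script:[ ]*//p'
}

# Succeed (exit status 0) only when the Mandatory Manual Actions section
# lists something other than "no manual actions available".
has_mandatory_actions() {
    awk '
        /^Mandatory Manual Actions/   { in_section = 1; next }
        /^Optional Manual Actions/    { in_section = 0 }
        !in_section                   { next }
        NF == 0 || /^=+$/             { next }
        /no manual actions available/ { next }
        { found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

Since each function consumes stdin, save the RMAN output to a file first if you need both answers from the same run.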
The advise failure output has several important parts. First, the advisor analyzes the error. In this case, it's pretty obvious: the datafile is missing. Next, it suggests a strategy. In this case, this is fairly simple as well: restore and recover the file. (Please note that I have deliberately chosen a simple example to focus attention on the usage of the tool, not to discuss the many ways a database could fail and how each can be recovered. The dynamic performance view V$IR_MANUAL_CHECKLIST also shows this information.)
However, the most useful thing Data Recovery Advisor does is shown in the very last line: it generates a script that can be used to repair the datafile or resolve the issue. The script does all the work; you don't have to write a single line of code. Sometimes the advisor doesn't have all the information it needs. For instance, in this case, it does not know whether someone moved the file to a different location or renamed it, so it advises you to move the file back to its original location and name (under Optional Manual Actions).
OK, so the script is prepared for you. Are you ready to execute it? I don't know about you, but I would verify what the script actually does first. So, I issue the following command to preview the actions the repair task will execute:
RMAN> repair failure preview;

Strategy: The repair includes complete media recovery with no data loss
Repair script: /home/oracle/app/diag/rdbms/prodb3/PRODB3/hm/reco_741461097.hm

contents of repair script:
   # restore and recover datafile
   sql 'alter database datafile 4 offline';
   restore datafile 4;
   recover datafile 4;
   sql 'alter database datafile 4 online';
This is good; the repair seems to be doing the same thing I would have done myself using RMAN. Now I can execute the actual repair by issuing:
RMAN> repair failure;
Strategy: The repair includes complete media recovery with no data loss
Repair script: /home/oracle/app/diag/rdbms/prodb3/PRODB3/hm/reco_3162589478.hm

contents of repair script:
   # restore and recover datafile
   sql 'alter database datafile 4 offline';
   restore datafile 4;
   recover datafile 4;
   sql 'alter database datafile 4 online';

Do you really want to execute the above repair (enter YES or NO)?
Assuming I'm OK, I answer YES and the action goes on:
executing repair script

sql statement: alter database datafile 4 offline

Starting restore at 15-JUL-07
using channel ORA_DISK_1

channel ORA_DISK_1: restoring datafile 00004
input datafile copy RECID=5 STAMP=628025835 file name=/home/oracle/flasharea/PRODB3/datafile/o1_mf_users_39ocxbv3_.dbf
destination for restore of datafile 00004: /home/oracle/oradata/PRODB3/users01.dbf
channel ORA_DISK_1: copied datafile copy of datafile 00004
output file name=/home/oracle/oradata/PRODB3/users01.dbf RECID=0 STAMP=0
Finished restore at 15-JUL-07

Starting recover at 15-JUL-07
using channel ORA_DISK_1

starting media recovery

archived log for thread 1 with sequence 51 is already on disk as file /home/oracle/
... and so on ...
name=/home/oracle/flasharea/PRODB3/archivelog/2007_07_15/o1_mf_1_55_39ocy9ox_.arc thread=1 sequence=55
media recovery complete, elapsed time: 00:00:01
Finished recover at 15-JUL-07

sql statement: alter database datafile 4 online
repair failure complete

RMAN>
Note how RMAN prompts you before attempting the repair. In a script, you may not want that; you would rather go ahead with the repair without an additional prompt. In that case, use repair failure noprompt at the RMAN prompt.
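Putting the pieces together, an unattended run might look like the sketch below. It follows the sequence used in this section (list, advise, repair); as far as I know, repair failure acts on the advice generated by advise failure earlier in the same session, which is why both commands are sent together. The function names are hypothetical, and the wrapper assumes that ORACLE_HOME and ORACLE_SID are set and that target=/ can connect.

```shell
#!/bin/sh
# Print the RMAN commands for an unattended repair run, one per line.
repair_commands() {
    cat <<'EOF'
list failure;
advise failure;
repair failure noprompt;
EOF
}

# Hypothetical wrapper: feed the commands to RMAN and append everything it
# prints to the log file named in $1, so the run can be reviewed later.
run_unattended_repair() {
    repair_commands | rman target=/ >> "$1" 2>&1
}
```

A cron or at job could then call a small script that runs run_unattended_repair with a log path of your choice. Whether an automatic repair is acceptable without a human looking at the preview first is a judgment call for each site.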
Proactive Health Checks
It helps you sleep better at night knowing that the database is healthy and has no bad blocks. But how can you ensure that? Bad blocks show themselves only when they are accessed, so you want to identify them early and, ideally, repair them with simple commands before users get an error.
The dbverify tool can do the job, but it can be inconvenient to use because it requires writing a script file that contains all the datafiles and a lot of parameters. The output also needs scanning and interpretation. In Oracle Database 11g, a new RMAN command, VALIDATE DATABASE, makes this operation trivial by checking database blocks for physical corruption. If corruption is detected, it is logged in the Automatic Diagnostic Repository. RMAN then produces output like that partially shown below:
RMAN> validate database;

Starting validate at 09-SEP-07
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=110 device type=DISK
channel ORA_DISK_1: starting validation of datafile
channel ORA_DISK_1: specifying datafile(s) for validation
input datafile file number=00002 name=/home/oracle/oradata/ODEL11/sysaux01.dbf
input datafile file number=00001 name=/home/oracle/oradata/ODEL11/system01.dbf
input datafile file number=00003 name=/home/oracle/oradata/ODEL11/undotbs01.dbf
input datafile file number=00004 name=/home/oracle/oradata/ODEL11/users01.dbf
channel ORA_DISK_1: validation complete, elapsed time: 00:02:18
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
1    OK     0              12852        94720           5420717
  File Name: /home/oracle/oradata/ODEL11/system01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              65435
  Index      0              11898
  Other      0              4535

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
2    OK     0              30753        115848          5420730
  File Name: /home/oracle/oradata/ODEL11/sysaux01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              28042
  Index      0              26924
  Other      0              30129

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
3    OK     0              5368         25600           5420730
  File Name: /home/oracle/oradata/ODEL11/undotbs01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              0
  Index      0              0
  Other      0              20232

File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
4    OK     0              2569         12256           4910970
... <snipped> ...
If a datafile fails validation, the corresponding part of the output looks like this instead:
List of Datafiles
=================
File Status Marked Corrupt Empty Blocks Blocks Examined High SCN
---- ------ -------------- ------------ --------------- ----------
7    FAILED 0              0            128             5556154
  File Name: /home/oracle/oradata/ODEL11/test01.dbf
  Block Type Blocks Failing Blocks Processed
  ---------- -------------- ----------------
  Data       0              108
  Index      0              0
  Other      10             20
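Because the output still needs scanning, a small filter can do it for you. This is only a sketch: it assumes the List of Datafiles layout shown above, with the status in the second column and a File Name line following each summary row, and it assumes file names contain no spaces. The name failed_datafiles is mine.

```shell
#!/bin/sh
# Read "validate" output from stdin and print "<file#> <file name>" for
# every datafile whose Status column says FAILED.
failed_datafiles() {
    awk '
        # A summary row starts with the numeric file number.
        $1 ~ /^[0-9]+$/ { pending = ($2 == "FAILED") ? $1 : ""; next }
        # The "File Name:" line that follows names the file just seen.
        pending != "" && $1 == "File" && $2 == "Name:" {
            print pending, $3
            pending = ""
        }
    '
}
```

For example, piping the output of validate database through failed_datafiles leaves only the files that need attention, which is handy in a nightly check.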
You can also validate a specific tablespace:
RMAN> validate tablespace users;
Or a datafile:
RMAN> validate datafile 1;
Or even a single block in a datafile:
RMAN> validate datafile 4 block 56;
However, the VALIDATE command extends well beyond datafiles. You can also validate the spfile, a controlfilecopy, recovery files, the Flash Recovery Area, and so on.
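If you script these checks, a tiny helper that builds the command text keeps the variants in one place. The sketch below covers the forms shown in this section plus two of the other targets; I am assuming validate spfile and validate recovery area are accepted as written, so check them against the RMAN reference for your release. The helper name and its keywords are hypothetical.

```shell
#!/bin/sh
# Print one RMAN VALIDATE command for the requested target.
# Usage: validate_command database
#        validate_command tablespace <name>
#        validate_command datafile <file#> [block#]
#        validate_command spfile
#        validate_command recovery_area
validate_command() {
    case ${1:-} in
        database)      echo 'validate database;' ;;
        tablespace)    echo "validate tablespace $2;" ;;
        datafile)
            if [ -n "${3:-}" ]; then
                echo "validate datafile $2 block $3;"
            else
                echo "validate datafile $2;"
            fi ;;
        spfile)        echo 'validate spfile;' ;;
        recovery_area) echo 'validate recovery area;' ;;
        *)
            echo "unknown validation target: ${1:-}" >&2
            return 1 ;;
    esac
}
```

The printed command can be piped straight into RMAN, for instance validate_command datafile 4 56 | rman target=/.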