Developing a backup and recovery strategy
A database can become unusable because of hardware or software failure, or both. You may, at one time or another, encounter storage problems, power interruptions, or application failures, and each failure scenario requires a different recovery action. Protect your data against the possibility of loss by having a well rehearsed recovery strategy in place. Some of the questions that you should answer when developing your recovery strategy are:
Will the database be recoverable?
How much time can be spent recovering the database?
How much time will pass between backup operations?
How much storage space can be allocated for backup copies and archived logs?
Will table space level backups be sufficient, or will full database backups be necessary?
Should I configure a standby system, either manually or through high availability disaster recovery (HADR)?
A database recovery strategy should ensure that all information is available when it is required for database recovery. It should include a regular schedule for taking database backups and, in the case of partitioned database systems, include backups when the system is scaled (when database partition servers or nodes are added or dropped). Your overall strategy should also include procedures for recovering command scripts, applications, user-defined functions (UDFs), stored procedure code in operating system libraries, and load copies.
Different recovery methods are discussed in the sections that follow, and you will discover which recovery method is best suited to your business environment.
The concept of a database backup is the same as any other data backup: taking a copy of the data and then storing it on a different medium in case of failure or damage to the original. The simplest case of a backup involves shutting down the database to ensure that no further transactions occur, and then simply backing it up. You can then rebuild the database if it becomes damaged or corrupted in some way. The rebuilding of the database is called recovery.
Version recovery is the restoration of a previous version of the database, using an image that was created during a backup operation. Rollforward recovery is the reapplication of transactions recorded in the database log files after a database or a table space backup image has been restored. Crash recovery is the automatic recovery of the database if a failure occurs before all of the changes that are part of one or more units of work (transactions) are completed and committed. This is done by rolling back incomplete transactions and completing committed transactions that were still in memory when the crash occurred.
Recovery log files and the recovery history file are created automatically when a database is created (Figure 1). These log files are important if you need to recover data that is lost or damaged.
Each database includes recovery logs, which are used to recover from application or system errors. In combination with the database backups, they are used to recover the consistency of the database right up to the point in time when the error occurred.
The recovery history file contains a summary of the backup information that can be used to determine recovery options, if all or part of the database must be recovered to a given point in time. It is used to track recovery-related events such as backup and restore operations, among others. This file is located in the database directory.
The table space change history file, which is also located in the database directory, contains information that can be used to determine which log files are required for the recovery of a particular table space.
You cannot directly modify the recovery history file or the table space change history file; however, you can delete entries from the files using the PRUNE HISTORY command. You can also use the rec_his_retentn database configuration parameter to specify the number of days that these history files will be retained.
Figure 1. Database recovery files
Data that is easily recreated can be stored in a non-recoverable database. This includes data from an outside source that is used for read-only applications, and tables that are not often updated, for which the small amount of logging does not justify the added complexity of managing log files and rolling forward after a restore operation. Non-recoverable databases have the logarchmeth1 and logarchmeth2 database configuration parameters set to "OFF". This means that the only logs that are kept are those required for crash recovery. These logs are known as active logs, and they contain current transaction data. Version recovery using offline backups is the primary means of recovery for a non-recoverable database. (An offline backup means that no other application can use the database when the backup operation is in progress.) Such a database can only be restored offline. It is restored to the state it was in when the backup image was taken and rollforward recovery is not supported.
Data that cannot be easily recreated should be stored in a recoverable database. This includes data whose source is destroyed after the data is loaded, data that is manually entered into tables, and data that is modified by application programs or users after it is loaded into the database. Recoverable databases have the logarchmeth1 or logarchmeth2 database configuration parameters set to a value other than "OFF". Active logs are still available for crash recovery, but you also have the archived logs, which contain committed transaction data. Such a database can only be restored offline. It is restored to the state it was in when the backup image was taken. However, with rollforward recovery, you can roll the database forward (that is, past the time when the backup image was taken) by using the active and archived logs to either a specific point in time, or to the end of the active logs.
Recoverable database backup operations can be performed either offline or online (online meaning that other applications can connect to the database during the backup operation). Online table space restore and rollforward operations are supported only if the database is recoverable. If the database is non-recoverable, database restore and rollforward operations must be performed offline. During an online backup operation, rollforward recovery ensures that all table changes are captured and reapplied if that backup is restored.
If you have a recoverable database, you can back up, restore, and roll individual table spaces forward, rather than the entire database. When you back up a table space online, it is still available for use, and simultaneous updates are recorded in the logs. When you perform an online restore or rollforward operation on a table space, the table space itself is not available for use until the operation completes, but users are not prevented from accessing tables in other table spaces.
Automated backup operations
Since it can be time-consuming to determine whether and when to run maintenance activities such as backup operations, you can use the Configure Automatic Maintenance wizard to do this for you. With automatic maintenance, you specify your maintenance objectives, including when automatic maintenance can run. DB2 then uses these objectives to determine if the maintenance activities need to be done and then runs only the required maintenance activities during the next available maintenance window (a user-defined time period for the running of automatic maintenance activities).
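The distinction between recoverable and non-recoverable databases described above can be summarized in a few lines of code. The following is a minimal sketch in Python, not a DB2 API: the function names are hypothetical, and the parameter values passed in (such as "LOGRETAIN" or a "DISK:" path) are only examples of values other than "OFF".

```python
def is_recoverable(logarchmeth1: str, logarchmeth2: str) -> bool:
    """A database is recoverable when either parameter is not OFF."""
    return logarchmeth1.upper() != "OFF" or logarchmeth2.upper() != "OFF"


def supported_operations(logarchmeth1: str, logarchmeth2: str) -> dict:
    """Summarize which operations the text above allows for this setting."""
    recoverable = is_recoverable(logarchmeth1, logarchmeth2)
    return {
        "offline_backup": True,                    # always possible
        "online_backup": recoverable,              # needs archived logs
        "rollforward_recovery": recoverable,       # needs archived logs
        "table_space_level_recovery": recoverable,
    }


if __name__ == "__main__":
    # Example values only; substitute the settings of your own database.
    print(supported_operations("OFF", "OFF"))
    print(supported_operations("LOGRETAIN", "OFF"))
```

A check like this is only a planning aid. The actual values must be read from the database configuration.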
Best Practices: Backup and Recovery Strategies
You can't recover data that you haven't kept. But how confident are you that the data on which your business depends is backed up successfully? This paper examines the kinds of data storage technologies and solutions that are best for all businesses and offers some best practices for ensuring the successful data backup and recovery required to sustain operations -- regardless of what happens to your business.
It's always a challenge to keep your business data readily available when you need it. And this job gets even tougher the smaller your technical staff -- assuming that you have a staff at all.
Managing and Minimizing Storage Requirements
To meet your growing data storage demands without suffering through repeated upgrade hassles, it's worthwhile to stop and carefully review your:
Current and anticipated storage requirements. Data protection and backup needs, since backing up multiple servers, volumes, and desktop systems can quickly get very complex.
Cost-effective Data Storage Solutions
The cost of data storage has plummeted in recent years. Meanwhile, the labor costs associated with managing data storage keep climbing. So consider buying more storage -- even more than you think you'll need. It's worth it in the long run and the extra cost is negligible. To keep storage management costs in line, think hard about how you'll configure your data storage. For instance:
It's probably not worth upgrading the storage built into your existing servers since the cost of backing up the data they contain, adding new hard drives, then restoring the data will be substantial. Instead, make sure that the data storage you do buy is scalable, so it's easy to add more later when you require it.
Buy the kind of data storage devices best suited to the services they support. For instance, IDE drives usually work just fine for file and basic application services, as do SATA controllers and drives. While IDE and SATA devices can't match the performance of SCSI drives (which offer fast transfer rates and rotation speeds), they cost much less than SCSI solutions. For applications requiring high-performance and reliable availability of data, look to SCSI RAID solutions, which cost more but deliver fault tolerance with redundant configurations (how much availability and fault-tolerance depends on what sort of RAID level you select).
If your data is stored on multiple servers, you can consolidate it onto fewer servers with larger hard drives so management and backup are easier.
If you need to add storage to your company network, consider NAS devices, which are simpler than file servers since they use web-based administrator interfaces to mask operational complexities. Pay close attention to NAS device details to ensure that your chosen solution works with your existing systems and networks while remaining scalable.
If you need high-capacity, undisrupted data access, redundant system links to ensure data integrity, an ability to reconfigure and/or scale your storage infrastructure, and centralized storage management and backup capability, consider a SAN. iSCSI-based SANs, which are based on IP-friendly Ethernet network technology, are less expensive and complex than Fibre Channel SANs.
Tips for Taking the Backache Out of Backup
As your business increasingly relies on data, backing up that data becomes even more important but also more complex. The effort can be grueling (and costly) if you do not have a backup strategy that is based on proper planning and is faithfully executed according to a carefully crafted backup policy. Here's how to begin:
Decide what you need to back up. Start with your answer to "What can we afford to lose?" Understand your data environment. Once you know what requires backing up, you'll need to determine the systems and hosts where it's located; what type of data it is; how often it needs to be backed up and how often it's likely to be retrieved/restored; how long it must be retained and in what form; how much time you have to complete the backup; and what kind of security the data requires. By ranking the importance of your data and eliminating the unnecessary data from your backup efforts, you can save storage space. Find the backup techniques and technologies that best align with your business needs and that automate as much of your backup efforts as possible. For instance, it may be worthwhile to consolidate data on fewer servers to reduce backup management efforts. You may benefit from using backup/recovery solutions that are bundled with a storage appliance. Or perhaps you should opt to outsource backups entirely. Consult with an expert if you don't understand this process. Craft the processes and procedures you'll need to ensure backups are completed properly, including assigning responsibility for getting backups accomplished and monitoring the effort to spot problems, while also ensuring that those responsible are sufficiently trained. Ensure that backup copies are valid and can be successfully restored, which requires that you rank the importance of your data and establish ways that the most important data is backed up first and restored first. Be sure that you have adequate time to back up all the data that's important to your business, and be sure to understand the time required to restore that data in case of loss or corruption. You'll also need to regularly check and test your equipment, media, and processes. Ensure that backup copies are safe. Generally, this means storing your backups in a logically and physically secured offsite location. 
It also means ensuring that you haven't backed up viruses and other malware, spam, and data that is not important or that is harmful to your business. Maintain backup logs so you -- and your auditors -- can track backup activities. Regularly revisit your backup/restore risks, procedures, and technologies to make sure they are adequate as business needs and conditions evolve. Dispose of backup media carefully, making sure that they are physically destroyed so that their contents cannot be read by the unauthorized.
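The advice above to rank data by importance, back up the most important data first, and confirm that the backup fits in the available time can be sketched as a small planning calculation. This is an illustrative Python sketch under stated assumptions: the DataSet fields, the single throughput figure, and the example names and sizes are all hypothetical, and real backup throughput varies by device and data type.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    priority: int      # 1 = most important to the business
    size_gb: float


def plan_backup(datasets, throughput_gb_per_hour, window_hours):
    """Schedule data sets in priority order until the window is full.

    Returns (scheduled names, deferred names, hours used).
    """
    ordered = sorted(datasets, key=lambda d: d.priority)
    elapsed = 0.0
    scheduled, deferred = [], []
    for d in ordered:
        duration = d.size_gb / throughput_gb_per_hour
        if elapsed + duration <= window_hours:
            elapsed += duration
            scheduled.append(d.name)
        else:
            deferred.append(d.name)   # does not fit in this window
    return scheduled, deferred, elapsed


if __name__ == "__main__":
    data = [
        DataSet("logs", 3, 400),
        DataSet("orders", 1, 200),
        DataSet("mail", 2, 100),
    ]
    print(plan_backup(data, throughput_gb_per_hour=100, window_hours=4))
```

Anything that lands in the deferred list signals that the backup window, the throughput, or the amount of data being protected needs to be revisited.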
Of course, the backup technologies you use depend greatly on the size and nature of your business and how it uses information. Below are some newer technologies that may be able to help ease your backup burdens.
Best Practices for Business Continuity Protection
The ability to sustain business operations in the face of disaster -- or merely a hardware or network failure or employee error -- requires planning. You can figure out if the effort is worthwhile by asking and answering one simple (yet scary) question: "How long would my business survive without its computer systems, networks, and applications; without its business data; without its phone system; and without its offices?" If you conclude that it's wise to think through how your business should respond to events that interrupt its operations, you can begin with the guidelines contained in the DRBC (disaster recovery/business continuity) Framework, developed by Naresh Malhotra and Saby Mitra of the DuPree College of Management at the Georgia Institute of Technology:
Charter a team. This involves getting commitment from the CEO of your company and establishing a cross-functional steering committee and a core operational team. Conduct an analysis of your business. You'll need to identify the goals of your business as well as its outputs, processes and resources, the risks it faces, the potential impacts of those risks, and the roles of those (such as technology vendors) you'll turn to for risk mitigation. Define a disaster recovery/business continuity strategy. This must be done at the company-wide level as well as for your business processes and resources; then you'll need to figure out how to pay for it. Develop a detailed plan. Define its scope, document requirements in detail, then design it. Implement your plan. Steps include getting buy-in throughout your company, developing implementation documentation, assigning roles and responsibilities, training employees, and testing what you've implemented. Maintain your plan. You'll need a change management process as well as the ability to monitor performance and benchmark new applications, products, and processes.
This effort may not have to be as complicated as it sounds. Businesses often can, for instance, get help setting priorities at facilitated workshops that conduct risk assessment and business impact analyses. If your business has multiple locations, one site can serve as backup for another. In addition, you can upgrade your IT systems maintenance contracts to get replacement hardware in 24 to 48 hours, which can be drop-shipped to a recovery location where data and applications can be loaded from backup stores. The key is planning, training, testing, and regular review of the plan. Do this and you'll improve your chances of surviving any trouble that your business operations might encounter.
Backup and Recovery Strategies
This chapter offers guidelines and considerations for developing an effective backup and recovery strategy. This section includes the following topics:
Data Recovery Strategy Determines Backup Strategy Planning Data Recovery Strategy Planning Backup Strategy Validating Your Data Recovery Strategy
Data Recovery Strategy Determines Backup Strategy
To decide on backup strategies, start with your data recovery requirements and your data recovery strategy. Each type of data recovery will require that you take certain types of backup. Failures can run the gamut from user error, datafile block corruption and media failure to situations like the complete loss of a data center. How quickly you can resume normal operation of your database is a function of what kinds of restore and recovery techniques you include in your planning. Each restore and recovery technique will impose requirements on your backup strategy, including which features of the Oracle database you use to take, store and manage your backups. When thinking about recovery strategies, ask yourself questions like these:
If a disk failed and destroyed some of the database files, such as datafiles or redo logs, how would you recover the lost files? As described in "Planning a Response to Media Failure: Restore and Media Recovery", you should be able to handle the loss of datafiles, control files, and online redo logs.
If a logic error in an application or a user error caused the loss of important data from one or several tables or tablespaces, how could you recover that data, and what would happen to database updates since the error? Could you determine the cause of the error, to prevent it from happening again? As described in "Planning a Response to User Error: Point-in-Time Recovery and Flashback Features", techniques available to you include point-in-time recovery of the whole database or one or more tablespaces, importing data from earlier logical exports with one of the data import utilities, and using the Oracle database's flashback features.
If the instance alert log indicates that one or more tables contain corrupt blocks, how can you repair the corruption? Does the tablespace have to remain available during the repair? As described in "Planning a Response to Datafile Block Corruption: Block Media Recovery", the RMAN BLOCKRECOVER command can help you in this situation. Also, troubleshoot recovery with the SQL*Plus RECOVER ... TEST command.
If the entire data center is destroyed, can you perform disaster recovery? Assume that all you have is an archive tape containing backups. How would you recover the database? How long would that recovery take?
If you were not available to recover your database, could someone else recover it in your absence? Are your recovery procedures sufficiently automated and documented?
With these needs in mind, decide how you can take advantage of features related to backup and recovery, and look at how each feature meets some requirement of your backup strategy. For example:
Using Recovery Manager simplifies most backup and recovery operations compared to user-managed backup and recovery. It automates management of most backup files, including the deletion of backups and archived redo logs from disk or tape when no longer needed to meet recovery goals. It provides detailed reporting on backup activities and can verify that your available backups can be used to recover your database. Finally, RMAN makes possible many recovery techniques not available if you are using user-managed backup and recovery, such as incremental backups.
Flashback Database will help you restore a database to a previous time much faster than media recovery. However, you must decide in advance to keep flashback logs, and keeping flashback logs requires that you configure a flash recovery area.
Block media recovery may be better than datafile media recovery if availability is critical. While block media recovery is possible even if you do not base your backup and recovery strategy on RMAN, RMAN-based block media recovery can be performed more quickly and with less effort.
Once you decide which features to use in your recovery strategy, you can plan your backup strategy, answering the following questions, among others:
How and where will you store your recovery-related files? Will you use a flash recovery area? Will you use an ASM disk group? Will you store backups on tape or other offline storage, or only on disk?
At what intervals will you take scheduled backups? And what form of physical backups will you take in each situation?
What situations require you to take a database backup outside of the regular schedule? Sometimes you must take an unscheduled backup to ensure that you can recover your data, such as after an OPEN RESETLOGS or after changes to your database such as NOLOGGING operations that do not appear in the redo log. You may also have business requirements that require backups for auditing purposes or other reasons not related to database recovery.
How can you validate your backups, to ensure that you can recover your database when necessary?
How do you manage records of your backups?
Do you have detailed recovery plans that cover each type of failure? How can your DBAs execute these plans in a crisis? Can scripts be written to automate execution of these plans in a crisis?
Can you apply Oracle database availability technologies, such as Data Guard or Real Application Clusters, to improve availability during a database failure? How does using these availability technologies affect your backup and recovery strategy?
These are of course only a few of the considerations you should take into account. Available resources (hardware, media, staff, budget, and so on) will also be factors in your decision.
Planning Data Recovery Strategy
Your data recovery strategy should include responses to any number of database failure scenarios. The key to an effective, efficient strategy is envisioning failure modes, matching Oracle database recovery techniques and tools to the failure modes in which they are useful,
and then making sure you incorporate the necessary backup types to support those recovery techniques. To help match failure modes to recovery techniques that can help resolve them, refer to the following sections:
Planning a Response to User Error: Point-in-Time Recovery and Flashback Features Planning a Response to Media Failure: Restore and Media Recovery Planning a Response to Datafile Block Corruption: Block Media Recovery
Planning a Response to User Error: Point-in-Time Recovery and Flashback Features
Your backup and recovery strategy should enable you to handle situations in which a user or application makes unwanted changes to database data, such as deleting the contents of a table or making incorrect updates during a batch run. The goal in such a case will be to restore the affected parts of your database to their state before the user error. Depending on the situation, your appropriate response will be one of the following:
If you have performed a logical backup by exporting the contents of the affected tables, sometimes you can import the data back into the table. This technique presumes that you are regularly exporting logical backups of your data, and that any changes between exports are unimportant. You can perform point-in-time recovery, bringing one tablespace or the whole database back to its state before the time of the error. In either case, you need backups from before the time of the error, plus the redo logs from the time of the backup to the time of the error.
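The requirement stated above for point-in-time recovery (a backup from before the error, plus the redo logs from the time of that backup to the time of the error) can be expressed as a simple check. The following Python sketch is a planning aid only and does not call any Oracle interface. The function name is hypothetical, and it makes the simplifying assumption that redo coverage is given as (start, end) ranges, each of which is continuous.

```python
from datetime import datetime


def choose_backup_for_pitr(backup_times, redo_ranges, error_time):
    """Return the newest usable backup for point-in-time recovery.

    A backup is usable if it was taken before error_time and a single
    continuous redo range covers the span from the backup to error_time.
    Returns None if no backup qualifies.
    """
    candidates = sorted((b for b in backup_times if b < error_time), reverse=True)
    for backup in candidates:
        for start, end in redo_ranges:
            if start <= backup and end >= error_time:
                return backup
    return None


if __name__ == "__main__":
    # Hypothetical dates for illustration.
    backups = [datetime(2024, 1, 1), datetime(2024, 1, 8)]
    redo = [(datetime(2024, 1, 5), datetime(2024, 1, 10))]
    print(choose_backup_for_pitr(backups, redo, datetime(2024, 1, 9)))
```

If the function returns None, neither whole-database nor tablespace point-in-time recovery to that error time is possible with the files on hand, which is a gap worth finding during planning and not during an incident.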
Oracle's Flashback Technology provides faster and less disruptive alternatives to media recovery in many circumstances.
Oracle Flashback Database is a physical-level recovery mechanism similar to media recovery, but generally faster and not requiring the restore of data from backup. Oracle Flashback Table and Oracle Flashback Drop work at the logical level, undoing unwanted changes to tables, including reversing the effects of DROP TABLE statements. Oracle Flashback Query and Oracle Flashback Version Query are useful in viewing past contents of tables and investigating how and when logical corruptions affected your database.
Information about these features is collected in Oracle Database Backup and Recovery Advanced User's Guide. This document will allude to such features where they can be helpful and provide pointers for more information. Familiarize yourself with these features before planning your backup and recovery strategy, because you may find that they can be quite valuable and require limited advance planning.
Planning a Response to Media Failure: Restore and Media Recovery
A media failure occurs when a problem external to the database prevents Oracle from reading from or writing to a file during database operations. Typical media failures include physical failures, such as head crashes, and the overwriting, deletion or corruption of a database file. Media failures are less common than user or application errors, but your backup and recovery strategy should prepare for them. The type of media failure determines the recovery technique to use. For example, the strategy you use to recover from a corrupted datafile is different from the strategy for recovering from the loss of the control file.
Example: Online Redo Log Recovery
The method of recovery from loss of all members of an online log group depends on a number of factors, such as:
The state of the database (open, crashed, closed consistently, and so on) Whether the lost redo log group was current Whether the lost redo log group was archived
If you lose the current group, and the database is not closed consistently (either it is open, or it has crashed), then you will have to restore an old backup and perform point-in-time recovery, followed by OPEN RESETLOGS. You will lose all transactions that were in the lost log. You should take a new full database backup immediately after the OPEN RESETLOGS . Backups from before the OPEN RESETLOGS will not be recoverable because of the lost log. If you lose the current redo log group, and if the database is closed consistently, then you can perform OPEN RESETLOGS with no transaction loss. However, you should take a new full database backup. Backups from before the OPEN RESETLOGS will not be recoverable because of the lost log. If you lose a noncurrent redo log group, then you can use the ALTER DATABASE CLEAR LOGFILE statement to re-create all members in the group. No transactions are lost. If the lost redo log group was archived before it was lost, then nothing further is required. Otherwise, you should immediately take a new full backup of your database. Backups from before the log was lost will not be recoverable because of the lost log.
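The three cases above form a small decision table, which can be easier to review when written as code. This is a sketch in Python that only encodes the rules stated in this example. The function name and the returned dictionary keys are hypothetical, and the action strings are descriptions, not commands to be executed.

```python
def redo_log_loss_response(was_current: bool,
                           closed_consistently: bool,
                           was_archived: bool) -> dict:
    """Map the state of a lost online redo log group to a response."""
    if was_current:
        if closed_consistently:
            # Current group lost, database closed consistently.
            return {
                "actions": ["OPEN RESETLOGS"],
                "transactions_lost": False,
                "new_full_backup": True,
            }
        # Current group lost while the database was open or crashed.
        return {
            "actions": ["restore old backup",
                        "point-in-time recovery",
                        "OPEN RESETLOGS"],
            "transactions_lost": True,
            "new_full_backup": True,
        }
    # Noncurrent group lost: re-create its members.
    return {
        "actions": ["ALTER DATABASE CLEAR LOGFILE"],
        "transactions_lost": False,
        "new_full_backup": not was_archived,
    }


if __name__ == "__main__":
    print(redo_log_loss_response(was_current=False,
                                 closed_consistently=False,
                                 was_archived=True))
```

Writing the cases out this way makes it clear that a new full backup is needed in every case except the loss of a noncurrent group that was already archived.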
Planning a Response to Datafile Block Corruption: Block Media Recovery
If a small number of blocks within one or more datafiles are corrupt, you can perform block media recovery instead of restoring the datafiles from backup and performing complete media recovery of those files. The Recovery Manager BLOCKRECOVER command can be used
to restore and recover specified data blocks while the database is open and the corrupted datafile is online.
See Oracle Database Backup and Recovery Advanced User's Guide to learn how to perform block media recovery with RMAN.
Planning Backup Strategy
Your plans for data recovery strategies are the basis of your plans for backup strategy. This discussion describes general guidelines that can help you decide when to perform database backups, which parts of a database you should back up, what tools Oracle provides for those backups, and how to configure your database to improve its robustness and make backup and recovery easier. Of course, the specifics of your strategy must balance the needs of your restore strategy with questions of cost, resources, personnel and other factors.
Protecting Your Redundancy Set
The set of files needed to recover an Oracle database from the failure of any of its files--a datafile, control file, or online redo log--is called the redundancy set. The redundancy set should contain:
The last backup of the control file and all the datafiles All archived redo logs generated after the last backup was taken Duplicates of the online redo log files, generated by Oracle database multiplexing, operating system mirroring, or both Duplicates of the current control file, generated by Oracle database multiplexing, operating system mirroring, or both Copies of configuration files such as the server parameter file, tnsnames.ora, and
The first rule of protecting your redundancy set is: The set of disks or other media that contain the redundancy set for your database should be separate from the disks that contain the datafiles, online redo logs, and control files. This practice ensures that the failure of a disk that contains a datafile does not also cause the loss of the backups or redo logs needed to recover the datafile. Consequently, a minimal production-level database requires at least two disk drives: one to hold the files in the redundancy set and one to hold the database files. Ideally, separate the redundancy set from the primary files in every way possible: on separate volumes, separate file systems, and separate RAID devices. Keeping the redundancy set separate from the primary files ensures that you will not lose committed transactions in a disk failure. The simplest way to manage your redundancy set is to use a flash recovery area, on a separate device from the working set files. All recovery-related files will be stored in a single location on disk, disk space usage is managed automatically, backups required to meet your data recovery requirements are never deleted from disk while they are still needed, and recovery time is minimized without compromising the completeness of the redundancy set.
Whether or not you use a flash recovery area, Oracle Corporation recommends following these guidelines:
Multiplex the online redo log files and current control file at the database level. (For instance, configure the database to write its online logs to two or more destinations, so that each write is a separate operation carried out by the database, rather than by operating system-level or hardware-level redundancy.) If you multiplex at the database level, then an I/O failure or lost write should only corrupt one of the copies.
Ideally, the multiplexed files should be on different disks mounted under different disk controllers. The flash recovery area is an excellent location for one copy of these files. You can also mirror the online redo logs and current control file at the operating system or hardware level, but this is not a substitute for multiplexing at the database level.
If running in ARCHIVELOG mode, archive the redo logs to multiple locations, ideally on different disks. If you are using a flash recovery area, use it as one of the archiving locations.
Use operating system or hardware mirroring for the control file. All copies of the control file multiplexed at the database level must be available at all times, or the instance will crash. If you use operating system or hardware mirroring for your control file, your database can continue to operate even if one copy of the control file mirrored at the operating system level is unavailable due to a disk failure.
Use operating system or hardware mirroring for the primary datafiles if possible, to avoid having to perform media recovery for simple disk failures.
Keep at least one copy of the entire redundancy set--including the most recent backup--on disk. The flash recovery area is the ideal location for the redundancy set.
If the target database is stored on a RAID device, then store the redundancy set on a set of disks that are not in the same RAID device.
If you store the redundancy set on tape, then maintain at least two copies of the data to protect against the risk of tape failure. Also, if you have more than one copy of the same data, then consider keeping backups from different points in time. In this way, if one backup or split mirror was done when the database was corrupted, then you have an older backup when the database was not corrupted.
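The first rule of protecting the redundancy set, that it should not share disks with the primary database files, lends itself to a quick automated check. The Python sketch below assumes you can build two mappings from file name to the device that holds it. How you obtain those mappings is outside its scope, and the function name, file names, and device labels are hypothetical.

```python
def shared_devices(primary_files: dict, redundancy_files: dict) -> set:
    """Return devices that hold both primary and redundancy-set files.

    Each argument maps a file name to the device (disk, volume, or RAID
    group) on which it is stored. An empty result means the two sets
    are fully separated at the level of detail provided.
    """
    return set(primary_files.values()) & set(redundancy_files.values())


if __name__ == "__main__":
    # Hypothetical layout for illustration.
    primary = {"users01.dbf": "disk1", "control01.ctl": "disk1"}
    redundancy = {"full_backup.bkp": "disk2", "arch_0001.arc": "disk1"}
    overlap = shared_devices(primary, redundancy)
    if overlap:
        print("Redundancy set shares devices with primary files:", sorted(overlap))
```

In this example the archived log on disk1 would be lost together with the datafiles if that disk failed, which is exactly the situation the rule is meant to prevent.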
Deciding Between ARCHIVELOG and NOARCHIVELOG Mode
The redo logs of your database provide a complete record of changes to the datafiles of your database (with a few exceptions, such as direct path loads). You can run your database in one of two modes: ARCHIVELOG mode or NOARCHIVELOG mode. In ARCHIVELOG mode, a used online redo log group must be copied to one or more archive destinations before it can be reused. Archiving the redo log preserves all transactions stored in that log, so that they can be used in recovery operations later. In NOARCHIVELOG mode, the online redo log groups are simply overwritten when the log is reused. All information about transactions recorded in that redo log group is lost.
Implications of Running in NOARCHIVELOG Mode
Running your database in NOARCHIVELOG mode imposes severe limitations on your backup and recovery strategy.
You cannot perform online backups of your database. You must shut your database down cleanly before you can take a backup in NOARCHIVELOG mode.
You cannot use any data recovery techniques that require the archived redo logs. These include complete and point-in-time media recovery, as described in "Forms of Data Recovery", and more advanced recovery techniques such as point-in-time recovery of individual tablespaces and Flashback Database (described in Oracle Database Backup and Recovery Advanced User's Guide).
If you are running in NOARCHIVELOG mode and you must recover from damage to datafiles due to disk failure, you have two main options for recovery:
Drop all objects that have any extents located in the affected files, and then drop the files. The remainder of the database is intact, but all data in the affected files is lost.
Restore the entire database from the most recent backup, and lose all changes to the database since the backup. (Recovering changes since the backup would require performing media recovery, which uses the archived redo logs.)
Implications of Running in ARCHIVELOG Mode
For most applications, running in ARCHIVELOG mode is preferable to running in NOARCHIVELOG mode because you have more flexible recovery options after a data loss. There are, however, associated costs of running in ARCHIVELOG mode:
Space must be set aside for archiving destinations, the locations on disk where the archived redo logs will be stored. These can become quite large in databases with large numbers of updates.
The stored archived redo logs must be managed. To limit the disk space used by archived redo logs, archived redo logs can be moved to tape for longer-term storage, and older logs no longer needed to meet your recoverability goals should be deleted. (RMAN can automate most of the management of archived redo logs, by recording the location and contents of all archived redo logs, making it easy to move archived logs to tape, and identifying and deleting redo logs no longer required to meet your recoverability objectives.)
Some performance overhead is associated with the background processes ARC0 through ARCn, which copy filled online redo logs to the archiving destinations.
When performance requirements are extreme or disk space limitations are severe, it may be preferable to run in NOARCHIVELOG mode in spite of the restrictions imposed.
Deciding Whether to Use a Flash Recovery Area
It is recommended that you take advantage of the flash recovery area to store as many backup and recovery-related files as possible, including disk backups and archived redo logs. Some features of Oracle database backup and recovery, such as Oracle Flashback Database, require the use of a flash recovery area. In such cases, you must create a flash recovery area, though you do not have to use it to store all recovery-related files.
Even when its use is not required, however, the flash recovery area offers a number of advantages over other on-disk backup storage methods. Backups moved to tape from the flash recovery area are retained on disk until space is needed for other required files, reducing the need to restore backups from tape. At the same time, obsolete files no longer needed to meet your recoverability goals and files backed up to tape become eligible for deletion and are deleted when space is needed, eliminating the need for DBA intervention to clear out old files.
See "Setting Up a Flash Recovery Area for RMAN" for more about the uses and benefits of the flash recovery area.
Choosing a Backup Retention Policy
Your backup retention policy is the rule you set regarding which backups must be retained (whether on disk or other backup media) to meet your recovery and other requirements. Backup retention policy can be based on redundancy or a recovery window. In a redundancy-based retention policy, you specify a number n such that you always keep at least n distinct backups of each file in your database. In a recovery window-based retention policy, you specify a time interval in the past (for example, one week, or one month) and keep all backups required to let you perform point-in-time recovery to any point during that window. A backup no longer needed to satisfy the backup retention policy is said to be obsolete.
Implementing Backup Retention Policy with RMAN
RMAN automates the implementation of a backup retention policy, using the following commands:
The CONFIGURE RETENTION POLICY command lets you set the retention policy that will apply to all of your database files by default.
The REPORT OBSOLETE command lets you list backups currently on disk that are obsolete under the retention policy. You can also specify parameters to see which files would be obsolete under different retention policies.
The DELETE OBSOLETE command deletes the files which REPORT OBSOLETE would list as obsolete.
CHANGE... KEEP lets you set a separate retention policy for specific backups, such as long-term backups kept for archival purposes. You can specify that a given backup must be kept until a future time, or even specify that a backup be kept forever.
CHANGE... NOKEEP is used to let the retention policy apply to a backup previously protected by CHANGE... KEEP.
If you use a flash recovery area to store your backups, the database will delete obsolete backups automatically as disk space is needed for newer backups, archived logs and other files. For backups stored on disk outside a flash recovery area and for backups stored on tape, you should periodically run the DELETE OBSOLETE command to remove obsolete backups.
Recovery Window-Based Backup Retention Policy
A recovery window-based retention policy lets you guarantee that you can perform point-in-time recovery to any point in the past, up to a number of days that you specify. The earliest point in time to which you can recover your database under your retention policy is known as the point of recoverability. All backups required for recovery or point-in-time recovery back to that time will be retained. Note that this will generally require that you keep backups older than the beginning of the recovery window. A point-in-time recovery to the beginning of the recovery window would require a restore from this backup, and then applying all changes between the backup time and the point of recoverability. For example, you might configure a recovery window of three days:
RMAN> CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 3 DAYS;
If your last full database backup was six days ago, RMAN will keep the six-day-old backup, and all redo logs required to roll the database forward to the beginning of the recovery window three days ago, in addition to any backups and redo logs needed to recover the database to all points in time within the three-day window. A recovery window-based backup retention policy provides the most certain recoverability for your data. The disadvantage is that more careful disk space planning is required, since it may not be obvious how many backups of datafiles and archived logs must be retained to guarantee the recovery window.
Redundancy-Based Backup Retention Policy
A redundancy-based backup retention policy determines whether a backup is obsolete based on how many backups of a file are currently on disk. You might configure a redundancy level of 3:
RMAN> CONFIGURE RETENTION POLICY TO REDUNDANCY 3;
In this case, RMAN keeps three backups of each database file, and all redo logs required to recover all retained datafile backups to the current time. Any older backups will be considered obsolete. Assume, for instance, that you make backups of a datafile every day, starting on a Monday. On Thursday, you make your fourth backup of the datafile, and the backup from Monday becomes obsolete because you have the backups from Tuesday, Wednesday and Thursday. On Friday, the backup from Tuesday becomes obsolete, because you have the backups from Wednesday, Thursday and Friday.
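The two retention policies described above can be illustrated with a small model. This is only a sketch of the retention rules, not RMAN's actual implementation: it tracks whole backups by date, ignores archived logs, incremental backups and CHANGE... KEEP, and the function names and calendar dates are assumptions chosen for the example.

```python
from datetime import date, timedelta


def obsolete_by_redundancy(backup_dates, n):
    """Backups that are obsolete when the n most recent are kept (n >= 1)."""
    ordered = sorted(backup_dates)
    return ordered[:-n] if len(ordered) > n else []


def obsolete_by_recovery_window(backup_dates, today, window_days):
    """Backups not needed to recover to any point inside the window.

    The newest backup taken at or before the start of the window is kept,
    because recovery to the start of the window restores from it.
    """
    window_start = today - timedelta(days=window_days)
    before_window = sorted(d for d in backup_dates if d <= window_start)
    return before_window[:-1]  # all but the newest pre-window backup


# Daily backups Monday through Thursday with a redundancy level of 3.
monday = date(2024, 1, 1)
daily = [monday + timedelta(days=i) for i in range(4)]
print(obsolete_by_redundancy(daily, 3))  # only Monday's backup

# Backups taken 9, 6 and 1 days ago with a 3-day recovery window.
today = date(2024, 1, 10)
taken = [today - timedelta(days=k) for k in (9, 6, 1)]
print(obsolete_by_recovery_window(taken, today, 3))  # only the 9-day-old backup
```

In the second case the six-day-old backup is retained even though it is older than the window, which matches the recovery window example given earlier.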
Archiving Older Backups
There are several reasons to keep older backups of datafiles and archived logs:
An older backup of datafiles and archived logs is necessary for performing point-in-time recovery to a time before your most recent backup.
If your most recent backup is corrupt, you can still recover your database using an older backup and the complete set of archived logs since that older backup.
You may want to keep a copy of the database for archival purposes.
To perform point-in-time recovery to a given target time earlier than your current point of recoverability, you need a database backup that completed before the target time, as well as all of the archived logs created between the time the backup was started and the target time. For example, if you take full database backups starting at 1:00 AM on February 1 (at SCN 10000) and on February 14 (at SCN 20000), and if you decide on February 28 to use point-in-time recovery to bring your database to its state at 9:00 AM February 7 (SCN 13500), then you must use the February 1 backup, plus all redo logs containing changes from between the beginning of the creation of the backup (SCN 10000) and 9:00 AM February 7 (SCN 13500).
Note that point-in-time recovery to a time between backups is not an option for a database operating in NOARCHIVELOG mode. You can only restore your entire database from a consistent whole database backup, and re-open the database as of the time of that backup. You will lose all changes since the backup was taken.
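The selection rule in the February example can be written out as a short sketch. It assumes each backup is described by the SCN at which it started and the SCN at which it completed; the completion SCNs (10200 and 20200) and all names in the code are hypothetical, since the text gives only the starting SCNs.

```python
from typing import NamedTuple


class Backup(NamedTuple):
    label: str
    start_scn: int  # SCN when the backup began
    end_scn: int    # SCN when the backup completed (hypothetical values below)


def plan_point_in_time_recovery(backups, target_scn):
    """Return (backup, (first_scn, last_scn)) of redo to apply, or None.

    The chosen backup is the newest one that completed before the target.
    Redo is needed from the start of that backup up to the target SCN.
    """
    candidates = [b for b in backups if b.end_scn <= target_scn]
    if not candidates:
        return None  # no usable backup: the target predates every backup
    chosen = max(candidates, key=lambda b: b.end_scn)
    return chosen, (chosen.start_scn, target_scn)


backups = [
    Backup("February 1", 10000, 10200),
    Backup("February 14", 20000, 20200),
]
plan = plan_point_in_time_recovery(backups, 13500)
print(plan)
```

For a target of SCN 13500 the sketch selects the February 1 backup and a redo range of SCN 10000 through 13500, as in the example.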
Determining Backup Frequency
Frequent backups are essential for any recovery scheme. Base the frequency of backups on the rate or frequency of database changes such as:
Addition and deletion of tables
Insertions and deletions of rows in existing tables
Updates to data within tables
The more frequently your database is updated, the more often you should perform database backups. The scenario in "Backup Scripts When Blocks Change Frequently" backs up the database every week. If database updates are relatively infrequent, then you can make whole database backups infrequently and supplement them with incremental backups (which will be relatively small because few blocks have changed). The scenario in "Backup Scripts When Few Data Blocks Change" describes how to develop a backup strategy based on a single whole database backup.
See also "Backing Up to the Flash Recovery Area: Basic Scenarios" and "Backing Up to the Flash Recovery Area and to Tape: Basic Scenarios".
Performing Backups Before and After You Make Structural Changes
There are times when you will need to take a backup of your database independent of your regular backup schedule. If you make any of the following structural changes, then perform a backup of the appropriate portion of your database immediately before and after completing the following changes:
Create or drop a tablespace.
Add or rename a datafile in an existing tablespace.
Add, rename, or drop an online redo log group or member.
The part of the database that you should back up depends on your archiving mode.
In ARCHIVELOG mode, make a control file backup (using RMAN or using the SQL ALTER DATABASE statement with the BACKUP CONTROLFILE option) after a structural alteration. Of course, you can back up other parts of the database as well.
In NOARCHIVELOG mode, make a consistent whole database backup immediately after the modification.
Backing Up Frequently Used Tablespaces
If you run in ARCHIVELOG mode, then you can back up an individual tablespace or even a single datafile. You might want to do this for one or more tablespaces that are updated much more often than the rest of your database, as is sometimes the case for the SYSTEM tablespace and automatic undo tablespaces. More frequent backups of heavily used datafiles can shorten recovery times in some situations. You may have a database where most updates are restricted to a small set of tablespaces. If you take a full database backup each Sunday, then recovery from a media failure affecting the frequently updated tablespaces on Friday requires re-applying large amounts of redo. Daily backups of the frequently updated tablespaces reduce the amount of redo to apply without requiring a daily full database backup.
See Oracle Database Administrator's Guide for information about managing undo tablespaces.
Backing Up after NOLOGGING Operations
When a direct path load is performed to populate a database, no redo data is logged for those database changes. You cannot recover these changes after a restore from backup using conventional media recovery. Likewise, when tables and indexes are created as NOLOGGING, the database does not log redo data for these objects, which means that you cannot recover these objects from existing backups. Therefore, you should back up your datafiles after operations for which no redo data is logged.
You can use either a full backup of your datafiles or an incremental backup. Either one will capture all changed blocks, including blocks changed by unrecoverable operations.
See Oracle Database SQL Reference for information about the UNRECOVERABLE option of the CREATE TABLE ... AS SELECT and CREATE INDEX statements.
Exporting Data for Added Protection and Flexibility
Oracle database import and export utilities are used to export database objects (tables, stored procedures, and so forth) from databases to be stored as files, and to re-import objects from those files. An export provides a logical-level snapshot of the exported objects at the time of the export, as a binary file that can be imported back into the source database or some other database. Consider exporting portions or all of a database for supplemental protection and flexibility in a database's backup strategy. While useful, database exports are not a substitute for whole database backups. They cannot provide the same complete recovery advantages as physical-level backups. For example, you cannot apply archived logs to logical backups in order to recover changes made after the export.
See Oracle Database Utilities for more details about exporting and importing data for logical backup.
Preventing the Backup of Online Redo Logs
Online redo logs, unlike archived logs, should never be backed up. The chief danger associated with having backups of online redo logs is that you may accidentally restore those backups and corrupt your database. Online redo log backups are also not particularly useful, for the following reasons:
If your database is in ARCHIVELOG mode, then the archiver is already archiving the filled redo logs automatically.
If your database is in NOARCHIVELOG mode, then the only type of physical backups that you can perform are closed, consistent, whole database backups. The files in this type of backup are all consistent and do not need recovery, so the online logs are not useful after a restore from backup.
The best method for protecting the online logs against media failure is to multiplex them, with multiple log members in each group, on different disks attached to different disk controllers.
RMAN does not permit you to back up online redo logs. You must archive a redo log before backing it up.
Keeping Records of the Hardware and Software Configuration of the Server
During the stress of a recovery situation, it is important that you have all necessary information at your disposal. This is especially true if for some reason you need to contact Oracle Support because you run into a problem that you do not understand. You should have the following documentation about the hardware configuration:
The name, make, and model of the machine that hosts the database
The version and patch of the operating system
The number of disks and disk controllers
The disk capacity and free space
The names of all datafiles
The name and version of the media management software (if you use a third-party media manager)
You should also keep the following documentation about the software configuration:
The name of the database instance (SID)
The database identifier (DBID)
The version and patch release of the Oracle database server
The version and patch release of the networking software
The method (RMAN or user-managed) and frequency of database backups
The method of restore and recovery (RMAN or user-managed)
You should keep this information both in electronic and hardcopy form. For example, if you save this information in a text file on the network or in an email message, then if the entire system goes down, you may not have access to this data. It is especially important to keep a record of the DBID. If you have to restore and recover your database including the loss of the SPFILE and control file, you will need the DBID during the recovery process. See "Basic Database Restore and Recovery Scenarios" for details on how the DBID is used during recovery.
Validating Your Data Recovery Strategy
Practice backup and recovery techniques in a test environment before and after you move to a production system. In this way, you can measure the thoroughness of your strategies and minimize problems before they occur in a real situation. Performing test recoveries regularly ensures that your archiving, backup, and recovery procedures work. It also helps you stay familiar with recovery procedures, so that you are less likely to make a mistake in a crisis.
If you use RMAN, then run the DUPLICATE command to create a test database using backups of your production database. If you perform user-managed backup and recovery, then you can either create a new database, a standby database, or a copy of an existing database by using a combination of operating system and SQL*Plus commands.
See Oracle Database Backup and Recovery Advanced User's Guide to learn about RMAN testing methods, troubleshooting SQL*Plus recovery, block media recovery, and RMAN disaster recovery.
Validating RMAN Backups: BACKUP VALIDATE and RESTORE VALIDATE
The RMAN BACKUP VALIDATE and RESTORE VALIDATE commands can be a useful part of your recovery plan testing. BACKUP VALIDATE reads all of the specified files but does not produce any output files. All of the data blocks in the input files are validated, exactly as they are when a real backup takes place. RESTORE VALIDATE reads all of the backup files that would be needed to restore the specified objects, but the objects are not actually restored to disk. All of the data blocks in the backup files are validated, exactly as they are when a real restore takes place. Just as in a real restore, RESTORE VALIDATE automatically chooses which backup files to restore from. For example, the command RESTORE VALIDATE DATABASE ensures that, for every file in the database, a valid backup exists, can be read, and contains valid data.
Planning for backup and recovery: the key elements for a successful backup and recovery strategy
Backup and recovery are point-source solutions on the continuum of availability and resiliency. Availability describes operational behavior of a system under adverse conditions (e.g., failure of one-half of a clustered system). Resiliency describes the operational behavior of system restoration after service degradation due to an unplanned event. In their most extreme examples, both availability and resiliency are provided by backup and recovery systems. Backup and recovery systems traditionally address two specific requirements: error correction (e.g., inadvertent deletion of a necessary file or files) and disaster recovery. Many firms developed and currently operate backup and recovery strategies primarily designed to address the operational issue of error recovery. The backup media generated by these error recovery strategies are subsequently taken off site and extended to meet disaster recovery requirements mandated by regulatory bodies or to support limited recovery for only a few weeks or long enough to close the business. Most of these strategies focused on the risks associated with a site- or institution-specific outage or disaster.
Re-Evaluating B&R Strategies
Enterprises of all sizes, from small businesses to large financial services firms, are re-evaluating backup and recovery strategies. This is driven by several highly correlated factors. The terrorist attacks of September 11, 2001, and the Northeast power grid failure of August 14, 2003 demonstrated, firstly, that regional disasters must be considered as likely as localized institutional disasters, which calls into question the validity of over-subscription in the first-come, first-served shared services disaster recovery market served by providers such as IBM, EDS, and SunGard.
Secondly, these events demonstrated that in the age of e-business, same-day settlement, and real-time Internet transactions, computer systems are no longer business facilitation tools, but rather fundamental components of the ability to perform business at all. This calls into question the pervasive belief that long-term recovery strategies can rely on the ability to close the business through access to sufficient books and records. These factors are evident in the Interagency White Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System, which addresses business resiliency in the financial services sector, and was co-authored by the Federal Reserve Board, the Office of the Comptroller of the Currency, and the Securities and Exchange Commission. Internal providers of data center services are increasingly challenged to address a diverse set of changing requirements and subjective interpretations. The key to success in this endeavor, however, is to ensure that the solutions are well mapped to the business requirements. Inasmuch as business requirements are potentially subject to significant change, as indicated above, the first step in the architecture of a functional availability and resiliency model is to perform a Business Impact Analysis or BIA.
Business Impact Analysis
The BIA provides identification of critical business processes required to operate the business. These business processes are then mapped to the underlying applications, which, in turn, allow identification of systems, databases, and files required to operate the business.
Within the context of the BIA, business processes and applications seldom share a common level of criticality, which requires the imposition of tiers of availability; for example, some systems must be operational within 30 minutes of an event, and other systems must be operational within two business days. An important factor for the enterprise to consider during the BIA is the dollar value of the business to be recovered. Backup and recovery systems mitigate risk of loss, but only at the cost of implementation. A well-understood cost model of the business to be protected is critical for valid interpretation of the costs to deploy a robust recovery system.
RTO, RPO and MPO
The definition of these tiers of availability and resiliency provides the foundation for a functional architecture. The key data points that differentiate the tiers of availability are the RTO, RPO, and MPO. The RTO, or Recovery Time Objective, is the maximum time between an event and the time at which a system must be returned to operation. The RPO, or Recovery Point Objective, is the maximum allowable data loss. The MPO, or Maintenance Point Objective, is an additional but less commonly articulated metric that describes the maximum allowable window for the performance of system maintenance. The RTO, RPO and MPO are critical data points to consider during architecture of an availability and resiliency model. As previously discussed, tape-based backup and recovery systems can be considered as point-source solutions on this continuum. Tape-based backup systems, however, do not easily accommodate an RTO of less than four hours, due to the time required to physically retrieve the data from tape, nor do they easily accommodate an RPO requirement of less than 24 hours for similar reasons.
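The tape limits mentioned above (an RTO under four hours or an RPO under 24 hours is hard to meet from tape) can be expressed as a simple screening check over availability tiers. This is a rough sketch only: the tier names and RPO values are hypothetical, two business days is treated as 48 hours, and the check ignores the MPO and any pre-staged restores.

```python
from dataclasses import dataclass

# Thresholds taken from the rule of thumb in the text.
TAPE_MIN_RTO_HOURS = 4
TAPE_MIN_RPO_HOURS = 24


@dataclass
class Tier:
    name: str
    rto_hours: float  # maximum time to return to operation
    rpo_hours: float  # maximum allowable data loss, expressed in hours


def tape_alone_is_plausible(tier: Tier) -> bool:
    """True when neither objective is tighter than tape easily supports."""
    return (tier.rto_hours >= TAPE_MIN_RTO_HOURS
            and tier.rpo_hours >= TAPE_MIN_RPO_HOURS)


tiers = [
    Tier("critical (hypothetical)", rto_hours=0.5, rpo_hours=1),
    Tier("deferred (hypothetical)", rto_hours=48, rpo_hours=24),
]
for tier in tiers:
    print(tier.name, tape_alone_is_plausible(tier))
```

A tier that fails this check is a candidate for replication or snapshot technologies rather than tape alone.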
A multi-terabyte system, however, with a low RTO (less than 10 minutes) and a large RPO (greater than 72 hours) may be accommodated by tape, as the restoration process may take place at preemptive set intervals to prepare for rapid recovery, but such a scenario may fail if generating the backup data sets exceeds the allowable MPO. The RTO, RPO, MPO, and identification of necessary data and compute elements within the enterprise are the primary result of the BIA and provide the business requirements for a backup and recovery solution. Backup and recovery solutions that cannot clearly articulate the provisioned objectives or source of their definition are seldom viewed as successful implementations.
Technical Impact Analysis
Following the BIA is the Technical Impact Analysis or TIA. The TIA is the process by which the business requirements are mapped to technical requirements. The TIA identifies additional technical considerations, such as an application-specific requirement to suspend active transaction processing during backup or replication, or sensitivity to network architecture and latency. A critical component to be considered during the TIA is the arbitrage of diversely distributed elements of enterprise data. For example, a transaction may be captured by a primary system, which maintains state information on the transaction while simultaneously updating another database. Without properly correlating the backup and recovery processes, each database may maintain independent referential integrity, but the correlation between the two may be lost. The collection, collation, and aggregation of these technology-specific elements develop into the technical requirements for a backup and recovery solution.
Differing technologies (such as mass storage array-based synchronous or asynchronous replication, operating system replication, "snap" or BCV copies, tape generation and cloning, and multi-phase dual-site commit databases) all provide points on the continuum of availability and resiliency, and can be combined to offer the necessary RTO, RPO, MPO, economic performance and operational viability required by the enterprise. Armed with the business and technical requirements, in conjunction with the capabilities of the various technologies, the system administrator has the information required to engineer and develop the required infrastructure for backup and recovery. It is important to note that backup and recovery can no longer be considered to be simply the tape backup environment, but rather the whole range of potential solutions for the provisioning of availability and resiliency. Once the TIA is complete and firm technical requirements with potential solutions have been identified, the system administrator must perform a financial analysis of the viable solutions. Initial capital expense, increases in operating costs associated with operational personnel or recurring monthly network access fees, media costs, maintenance costs, and the GAAP accounting treatment of these expenses should be taken into consideration. The financial analysis ensures that the projected solution set will provide the appropriate economic performance for the enterprise.
Backing Up and Restoring Databases
SQL Server 2000
The backup and restore component of Microsoft® SQL Server™ 2000 provides an important safeguard for protecting critical data stored in SQL Server databases. With proper planning, you can recover from many failures, including:
Media failure.
User errors.
Permanent loss of a server.
Additionally, backing up and restoring databases is useful for other purposes, such as copying a database from one server to another. By backing up a database from one computer and restoring the database to another, a copy of a database can be made quickly and easily. This section provides the information necessary to implement a complete backup and recovery plan.
Topic: Description
Designing a Backup and Restore Strategy: Helps you analyze and refine your data availability requirements and choose a recovery model for each database.
Using Recovery Models: Describes each recovery model in detail, as well as appropriate backup and restore strategies. This topic also describes how to switch between recovery models.
Backup and Restore Operations: Describes the various types of backups available and how they are used. This topic also describes point-in-time recovery, restarting a failed backup or restore, recovering to a particular transaction, and recovering part of a database.
Managing Backups: Describes backup devices, the backup format, and removable media terminology. This section also describes password security and media management including formatting, appending, overwriting, listing, and verifying media contents.
Backing Up and Restoring the System Databases: Describes the procedures necessary to protect and recover the system databases.
Handling Large Mission-Critical Environments: Describes features and techniques appropriate for highly available or very large production databases. These include using multiple backup devices, file and filegroup backups, file differential backups, and snapshot backups.
Copying Databases to Other Servers: Describes the use of backup and restore to quickly transport a database to another server.
Designing a Backup and Restore Strategy
You must identify the requirements for the availability of your data in order to choose the appropriate backup and restore strategy. Your overall backup strategy defines the type and frequency of backups and the nature and speed of the hardware required for them. It is strongly recommended that you test your backup and recovery procedures thoroughly. Testing helps to ensure that you have the required backups to recover from various failures, and that your procedures can be executed smoothly and quickly when a real failure occurs. This section includes the following topics.
Topic: Description
Analyzing Availability and Recovery Requirements: Explains the basic requirements for developing a backup and restore plan.
Planning for Disaster Recovery: Explains how to plan for a disaster (for example, the complete loss of a server).
Selecting a Recovery Model: Introduces Microsoft® SQL Server™ 2000 recovery models, which you implement after analyzing your availability requirements.
Analyzing Availability and Recovery Requirements
In order to develop a successful backup and restore plan, you must understand when your data needs to be accessible and the potential impact of data loss on your business. Answering the following questions can help you determine your availability requirements and sensitivity to data loss. Then you can choose the correct Microsoft® SQL Server™ 2000 recovery models for your databases and make the necessary technical and financial tradeoffs.
Here are some basic questions to help you analyze your availability and recovery requirements:
What are your availability requirements? What portion of each day must the database be online?
What is the financial cost of downtime to your business?
If you experience media failure, such as a failing disk drive, what is the acceptable downtime?
In case of a disaster, such as the loss of a server in a fire, what is the acceptable downtime?
How important is it to never lose a change?
How easy would it be to re-create lost data?
Does your organization employ system or database administrators? Who will be responsible for performing backup and recovery operations, and how will they be trained?
Here are some questions to help you choose the tools, techniques, and hardware appropriate for your site:
How large is each database?
How often does the data in each database change?
Are some tables modified more often than others?
What are your critical database production periods?
When does the database experience heavy use, resulting in frequent inserts and updates?
Is transaction log space consumption likely to be a problem due to heavy update activity?
Is your database subject to periodic bulk data loading?
Is your database subject to risky updates or application errors that may not be detected immediately?
Is your database server part of a SQL Server 2000 failover cluster for high availability?
Is your database in a multi-server environment with centralized administration?
When you back up and restore a database, you need to back up the data onto media (for example, tapes and disks). It is recommended that your backup plan include provisions for managing media, such as:
A tracking and management plan for storing and recycling backup sets.
A schedule for overwriting backup media.
In a multi-server environment, a decision to use either centralized or distributed backups.
A means of tracking the useful life of media.
A procedure to minimize the effects of the loss of a backup set or backup media (for example, a tape).
A decision to store backup sets on or offsite, and an analysis of how this will affect recovery time.
Planning for Disaster Recovery
You need to create a disaster recovery plan in order to ensure that all your systems and data can be quickly restored to normal operation in the event of a natural disaster (for example, a fire) or a technical disaster (for example, a two-disk failure in a RAID-5 array). When you create a disaster recovery plan, you prepare all the actions that must occur in response to a catastrophic event. It is recommended that you verify your disaster recovery plan through the simulation of a catastrophic event. Consider disaster recovery planning in light of your own environment and business needs. For example, suppose a fire occurs and wipes out your 24-hour data center. Are you certain you can recover? How long will it take you to recover and have your system available? How much data loss can your users tolerate? Ideally, your disaster recovery plan states how long recovery will take and the final database state the users can expect. For example, you might determine that after the acquisition of specified hardware, recovery will be completed in 48 hours, and data will be guaranteed only up to the end of the previous week. A disaster recovery plan can be structured in many different ways and can contain many types of information, including:
A plan to acquire hardware.
A communication plan.
A list of people to be contacted in the event of a disaster.
Instructions for contacting the people involved in the response to the disaster.
Information on who owns the administration of the plan.
Running a Base Functionality Script
Usually, you include a base functionality script as part of your disaster recovery plan to confirm that everything is working as intended. The base functionality script gives the system administrator or database administrator a dependable way to verify that the database is back in a viable state, without depending on end users for verification. Most commonly, this is an .sql file containing batched SQL statements that are run against the server with osql. For other applications, a .bat file is more appropriate because it can contain bcp and osql commands.
The base functionality script is very application specific, and it can take many different forms. For example, on a decision support/reporting system, the script may merely be a copy of several of your key reporting queries. For an online transaction processing (OLTP) application, the script may execute a batch of stored procedures that execute INSERT, UPDATE, and DELETE statements.
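As a rough illustration of running an .sql file through osql, the following Python sketch assembles the command line. The server name and file names are hypothetical, and the choice of a trusted connection (-E) and of the -b switch (exit with an error level when a batch fails) is an assumption about how a site might want to run its check, not a requirement of this topic.

```python
from typing import List


def build_osql_command(server: str, script_path: str, output_path: str) -> List[str]:
    """Build an osql command line that runs a batch of SQL statements from a file."""
    return [
        "osql",
        "-S", server,        # server to connect to
        "-E",                # trusted connection (assumed; a login could be used instead)
        "-i", script_path,   # input file containing the batched SQL statements
        "-o", output_path,   # file that receives the output
        "-b",                # exit with an error level if a batch fails
    ]


# Hypothetical server and file names, for illustration only.
command = build_osql_command("PRODSQL01", "base_check.sql", "base_check.out")
print(" ".join(command))
```

The argument list can be passed to a process launcher or pasted into a .bat file alongside any bcp commands the check needs.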
Preparing for a Disaster
To prepare for disaster, it is recommended that you periodically perform the following steps:
Perform regular database and transaction log backups to minimize the amount of lost data. It is recommended that both system and user databases be backed up.
Maintain system logs in a secure fashion. Keep records of all service packs installed on Microsoft® Windows NT® 4.0 or Windows® 2000 and Microsoft SQL Server™. Keep records of network libraries used, the security mode, and the sa password.
Maintain a base functionality script for quickly assessing minimal capability.
Assess the steps you need to take to recover from a disaster ahead of time on another server, and amend the steps as necessary to suit your environment.
Recovering from a Disaster
To recover from a disaster, perform the following steps after acquiring suitable replacement hardware:
1. Install Windows NT 4.0 or Windows 2000, and apply the appropriate service pack. Verify that appropriate domain functionality exists.
2. Install SQL Server, and apply the appropriate service pack. Restore the master and msdb database backups. Restart the server after restoring the master database.
3. Reconfigure the server for the appropriate network libraries and security mode.
4. Confirm that SQL Server is running properly by checking SQL Server Service Manager and the Windows application log. If the Windows NT 4.0 or Windows 2000 name was changed, use sp_dropserver and sp_addserver to match it with the SQL Server computer name.
5. Restore and recover each database according to its recovery plan.
6. Verify the availability of the system. Run a base functionality script to ensure correct operation.
7. Allow users to resume normal usage.
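The server rename mentioned in step 4 can be sketched as a pair of generated T-SQL statements. This is a minimal Python sketch: the host names are hypothetical, and passing 'local' as the second argument of sp_addserver is the usual way to mark the new name as the local server, shown here as an assumption about the intended usage.

```python
from typing import List


def quote_literal(value: str) -> str:
    """Quote a value as a T-SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def rename_server_statements(old_name: str, new_name: str) -> List[str]:
    """Return the statements that drop the old server name and register the new one."""
    return [
        "EXEC sp_dropserver " + quote_literal(old_name),
        "EXEC sp_addserver " + quote_literal(new_name) + ", 'local'",
    ]


# Hypothetical computer names, for illustration only.
statements = rename_server_statements("OLDHOST", "NEWHOST")
for statement in statements:
    print(statement)
```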
Selecting a Recovery Model
Microsoft® SQL Server™ provides three recovery models to:
Simplify recovery planning.
Simplify backup and recovery procedures.
Clarify tradeoffs between system operational requirements.
These models each address different needs for performance, disk and tape space, and protection against data loss. For example, when you choose a recovery model, you must consider the tradeoffs between the following business requirements:
Performance of large-scale operations (for example, index creation or bulk loads).
Data loss exposure (for example, the loss of committed transactions).
Transaction log space consumption.
Simplicity of backup and recovery procedures.
Depending on what operations you are performing, more than one model may be appropriate. After you have chosen a recovery model or models, plan the required backup and recovery procedures. This table provides an overview of the benefits and implications of the three recovery models.
Recovery model: Simple
Benefits: Permits high-performance bulk copy operations. Reclaims log space to keep space requirements small.
Work loss exposure: Changes since the most recent database or differential backup must be redone.
Recover to point in time? Can recover to the end of any backup. Then changes must be redone.

Recovery model: Full
Benefits: No work is lost due to a lost or damaged data file. Can recover to an arbitrary point in time (for example, prior to application or user error).
Work loss exposure: Normally none. If the log is damaged, changes since the most recent log backup must be redone.
Recover to point in time? Can recover to any point in time.

Recovery model: Bulk-Logged
Benefits: Permits high-performance bulk copy operations. Minimal log space is used by bulk operations.
Work loss exposure: If the log is damaged, or bulk operations occurred since the most recent log backup, changes since that last backup must be redone. Otherwise, no work is lost.
Recover to point in time? Can recover to the end of any backup. Then changes must be redone.
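The work loss exposure described in the overview above can be restated as a small decision function. This Python sketch only re-encodes the overview; the function name, parameter names, and returned phrases are hypothetical and chosen for illustration.

```python
def work_to_redo(model: str, log_damaged: bool = False,
                 bulk_ops_since_log_backup: bool = False) -> str:
    """Return which changes must be redone after a restore, per recovery model."""
    key = model.strip().lower()
    if key == "simple":
        # Simple Recovery does not use transaction log backups.
        return "changes since the most recent database or differential backup"
    if key == "full":
        if log_damaged:
            return "changes since the most recent log backup"
        return "none"
    if key == "bulk-logged":
        if log_damaged or bulk_ops_since_log_backup:
            return "changes since the most recent log backup"
        return "none"
    raise ValueError("unknown recovery model: " + model)


print(work_to_redo("Full"))
print(work_to_redo("Bulk-Logged", bulk_ops_since_log_backup=True))
```

Writing the rules out this way makes the difference between Full and Bulk-Logged visible: the two differ only when bulk operations have occurred since the most recent log backup.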
When a database is created, it has the same recovery model as the model database. To alter the default recovery model, use ALTER DATABASE to change the recovery model of the model database. You set the recovery model with the RECOVERY clause of the ALTER DATABASE statement. For more information, see ALTER DATABASE.
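A minimal sketch of building the ALTER DATABASE statement follows, written in Python so the mapping from model name to keyword is explicit. The helper names are hypothetical; the generated text uses the SET RECOVERY clause with the keywords SIMPLE, FULL, and BULK_LOGGED.

```python
RECOVERY_KEYWORDS = {
    "simple": "SIMPLE",
    "full": "FULL",
    "bulk-logged": "BULK_LOGGED",
}


def quote_identifier(name: str) -> str:
    """Wrap a database name in brackets, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def set_recovery_statement(database: str, model: str) -> str:
    """Return the ALTER DATABASE statement that sets the recovery model."""
    keyword = RECOVERY_KEYWORDS[model.strip().lower()]
    return "ALTER DATABASE " + quote_identifier(database) + " SET RECOVERY " + keyword


# Changing the model database changes the default for databases created later.
print(set_recovery_statement("model", "Full"))
```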
Simple Recovery
Simple Recovery requires the least administration. In the Simple Recovery model, data is recoverable only to the most recent full database or differential backup. Transaction log backups are not used, and minimal transaction log space is used. After the log space is no longer needed for recovery from server failure, it is reused. The Simple Recovery model is easier to manage than the Full or Bulk-Logged models, but at the expense of higher data loss exposure if a data file is damaged.
Important: Simple Recovery is not an appropriate choice for production systems where loss of recent changes is unacceptable.
When using Simple Recovery, the backup interval should be long enough to keep the backup overhead from affecting production work, yet short enough to prevent the loss of significant amounts of data. For more information, see Simple Recovery.
Full and Bulk-Logged Recovery
The Full Recovery and Bulk-Logged Recovery models provide the greatest protection for data. These models rely on the transaction log to provide full recoverability and to prevent work loss in the broadest range of failure scenarios.
The Full Recovery model provides the most flexibility for recovering databases to an earlier point in time. For more information, see Full Recovery.
The Bulk-Logged model provides higher performance and lower log space consumption for certain large-scale operations (for example, create index or bulk copy). It does this at the expense of some flexibility of point-in-time recovery. For more information, see Bulk-Logged Recovery.
Because many databases undergo periods of bulk loading or index creation, you may want to switch between Bulk-Logged and Full Recovery models.
Using Recovery Models
You can select one of three recovery models for each database in Microsoft® SQL Server™ 2000 to determine how your data is backed up and what your exposure to data loss is. The following recovery models are available:
Simple Recovery allows the database to be recovered to the most recent backup.
Full Recovery allows the database to be recovered to the point of failure.
Bulk-Logged Recovery allows bulk-logged operations.
The recovery model of a new database is inherited from the model database when the new database is created.
Note: The recovery model for a new database in SQL Server 2000 Personal Edition and SQL Server 2000 Desktop Engine (MSDE 2000) defaults to Simple Recovery.
Backup and Restore Operations
Microsoft® SQL Server™ supports various types of backups to be used separately or in combination. The recovery model you choose will determine your overall backup strategy, including the types of backups available to you. For more information, see Designing a Backup and Restore Strategy and Using Recovery Models. The following table illustrates the types of backups that are available for each recovery model.
Backup Type Model
Backups are created on backup devices, such as disk or tape media. With SQL Server, you can decide how you want to create your backups on backup devices. For example, you can overwrite outdated backups, or you can append new backups to the backup media. For more information, see Managing Backups.
Performing a backup operation has minimal effect on running transactions, so backup operations can be run during normal operations.
Note: Creating or deleting database files is not possible when the database or transaction log is being backed up. If you attempt to create or delete a database file while a backup operation is in progress, the create or delete will fail. If you attempt to start a backup operation while a database file is being created or deleted, the backup operation will wait until the create or delete is completed or the backup operation times out.
Managing Backups
Manage your backups carefully to ensure that you can restore your system when needed. Each backup contains the descriptive text you provided when you created the backup, as well as expiration information. This information can be used to:
Identify a backup.
Determine when the backup can be safely overwritten.
Identify all the backups on a backup medium, such as a tape, to determine which backup needs to be restored.
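To make the descriptive text and expiration information concrete, here is a hedged Python sketch that builds a BACKUP DATABASE statement carrying a name, a description, and a retention period. The database name, path, and 14-day retention are hypothetical; NAME, DESCRIPTION, and RETAINDAYS are the BACKUP options assumed to carry this information.

```python
def quote_literal(value: str) -> str:
    """Quote a value as a T-SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_identifier(name: str) -> str:
    """Wrap a database name in brackets, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def backup_database_statement(database: str, disk_path: str, name: str,
                              description: str, retain_days: int) -> str:
    """Return a BACKUP DATABASE statement with descriptive text and a retention period."""
    if retain_days < 0:
        raise ValueError("retain_days must not be negative")
    return (
        "BACKUP DATABASE " + quote_identifier(database)
        + " TO DISK = " + quote_literal(disk_path)
        + " WITH NAME = " + quote_literal(name)
        + ", DESCRIPTION = " + quote_literal(description)
        + ", RETAINDAYS = " + str(retain_days)
    )


# Hypothetical database, path, and retention period, for illustration only.
statement = backup_database_statement(
    "Sales", r"D:\Backups\Sales_full.bak", "Sales full", "Nightly full backup", 14)
print(statement)
```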
Additionally, the msdb database contains a complete history of all backup and restore operations on the server. SQL Server Enterprise Manager uses this information to suggest and execute a restore plan that can be used if a database needs to be restored. For example, if a database backup for a user database is created every night, and transaction log backups are created every hour during the day, this backup history information is stored in the msdb database. If the user database needs to be restored, SQL Server Enterprise Manager can use the history information stored in msdb to apply all the transaction log backups that relate to a specific database backup when the database backup is restored.
Note: If the msdb database needs to be restored, any backup history information saved since the last backup of msdb was created is lost.
When working with backups:
Maintain backups in a secure place, preferably at a site different from the site where the data resides.
Keep older backups for a designated amount of time in case the most recent backup is damaged, destroyed, or lost.
Establish a system for overwriting backups, reusing the oldest backups first.
Use expiration dates on backups to prevent premature overwriting.
Label backup media to prevent overwriting critical backups. This allows for easy identification of the data stored on the backup media or the specific backup set.
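The idea of deriving a restore plan from backup history (a database backup followed by the transaction log backups that relate to it) can be sketched as follows. This Python sketch is a simplification under stated assumptions: it orders backups by completion time only, whereas a real restore plan follows the log chain recorded in msdb, and the Backup type and restore_chain function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    kind: str            # "full", "differential", or "log"
    finished: datetime   # when the backup completed


def restore_chain(history: List[Backup]) -> List[Backup]:
    """Pick the latest full backup, the latest differential after it, then later logs."""
    fulls = [b for b in history if b.kind == "full"]
    if not fulls:
        raise ValueError("no full database backup in history")
    base = max(fulls, key=lambda b: b.finished)
    chain = [base]
    start = base.finished
    diffs = [b for b in history
             if b.kind == "differential" and b.finished > base.finished]
    if diffs:
        latest_diff = max(diffs, key=lambda b: b.finished)
        chain.append(latest_diff)
        start = latest_diff.finished
    logs = [b for b in history if b.kind == "log" and b.finished > start]
    chain.extend(sorted(logs, key=lambda b: b.finished))
    return chain


# A nightly database backup followed by hourly log backups (hypothetical times).
history = [
    Backup("log", datetime(2001, 4, 30, 23, 0)),
    Backup("full", datetime(2001, 5, 1, 0, 0)),
    Backup("log", datetime(2001, 5, 1, 9, 0)),
    Backup("log", datetime(2001, 5, 1, 10, 0)),
]
plan = restore_chain(history)
print([b.kind for b in plan])
```

The log backup taken before the nightly database backup is left out of the plan because it predates the backup being restored.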
Backing Up and Restoring System Databases
The system databases need to be backed up just as user databases are backed up. This allows the system to be rebuilt in the event of system or database failure, for example, if a hard disk fails. It is important to have regular backups of the following system databases:
master
msdb
distribution (when the server is configured as a replication Distributor)
model (if modified)
Note: It is not possible to back up the tempdb system database. tempdb is rebuilt each time an instance of Microsoft® SQL Server™ is started. When an instance of SQL Server is shut down, any data in tempdb is deleted permanently.
Handling Large Mission-Critical Environments
Mission-critical environments often require that databases be available continuously, or for extended periods of time with minimal downtime for maintenance tasks. Therefore, unexpected situations that require databases to be restored, such as a hardware failure, must be kept as short as possible. Additionally, mission-critical databases are often large, requiring longer periods of time to back up and restore. Microsoft® SQL Server™ offers several methods for increasing the speed of backup and restore operations, thereby minimizing the effect on users during both operations. The following practices will help:
Use multiple backup devices simultaneously to allow backups to be written to all devices at the same time. Similarly, the backup can be restored from multiple devices at the same time.
Use a combination of database, differential database, and transaction log backups to minimize the number of backups that need to be applied to bring the database to the point of failure.
Use file and filegroup backups and transaction log backups, which allow only those files that contain the relevant data, rather than the entire database, to be backed up or restored.
Use snapshot backups, which reduce backup and restore time to a minimum. Snapshot backups are supported by third-party vendors. For more information, see Snapshot Backups.
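The first practice above, writing one backup to several devices at once, corresponds to listing more than one device in the TO clause of BACKUP DATABASE. The Python sketch below builds such a statement; the database name and disk paths are hypothetical, and disk devices are used only as an example of a device type.

```python
from typing import List


def quote_literal(value: str) -> str:
    """Quote a value as a T-SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def striped_backup_statement(database: str, disk_paths: List[str]) -> str:
    """Return a BACKUP DATABASE statement that writes to all listed disk devices."""
    if not disk_paths:
        raise ValueError("at least one backup device is required")
    devices = ", ".join("DISK = " + quote_literal(path) for path in disk_paths)
    return "BACKUP DATABASE [" + database.replace("]", "]]") + "] TO " + devices


# Hypothetical database and device paths, for illustration only.
striped = striped_backup_statement(
    "Sales", [r"E:\Backups\Sales_1.bak", r"F:\Backups\Sales_2.bak"])
print(striped)
```

A backup written this way is restored by naming the same set of devices in the restore statement.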
Copying Databases to Other Servers
Creating database backups allows you to copy data from one computer to another. The copied database can be used for testing, checking consistency, developing software, running reports, or possibly making databases available to remote branch operations. By copying a database from one computer to another, it is possible to reduce resource contention because processing is offloaded to other computers. Copied databases restored onto separate computers are often used for read-only operations.
Note: With Microsoft® SQL Server™ 2000, the sort order and code page of the database being copied are no longer a concern. SQL Server now handles multiple collations.
A database can also be copied to another computer to act as a standby server. The database and the transaction logs are copied periodically to another computer, which can be brought online if the primary computer fails. The level of synchronization between the primary computer and the standby server is determined by how often regular backups of the primary computer are created and then applied to the standby server. For more information, see Using Standby Servers.
Note: It is possible to back up and restore databases between computers running an instance of SQL Server on Microsoft Windows NT® 4.0, Microsoft Windows® 2000, and Windows 98.
Other methods for copying data between multiple instances of SQL Server include using:
The Data Transformation Services (DTS) Import/Export Wizard to copy and modify data between any ODBC, OLE DB, or text data source and an instance of SQL Server.
The bcp utility to copy data between an instance of SQL Server and a data file, using native, character, or Unicode mode.
The INSERT statement, which uses a distributed query as the select list to extract data from another data source.
The Copy Database Wizard to copy or move databases and associated metadata between servers.
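For the bcp utility listed above, the three data modes map to command-line switches. The Python sketch below assembles a bcp command line under the assumption that -n, -c, and -w select native, character, and Unicode character mode, and that -T requests a trusted connection; the table, file, and server names are hypothetical.

```python
from typing import List

BCP_MODE_FLAGS = {
    "native": "-n",      # native data types
    "character": "-c",   # character data
    "unicode": "-w",     # Unicode character data
}


def build_bcp_command(table: str, direction: str, data_file: str,
                      server: str, mode: str = "native") -> List[str]:
    """Build a bcp command line that copies a table to or from a data file."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    return [
        "bcp", table, direction, data_file,
        BCP_MODE_FLAGS[mode],
        "-S", server,   # server to connect to
        "-T",           # trusted connection (assumed; a login could be used instead)
    ]


# Hypothetical table, file, and server names, for illustration only.
export_cmd = build_bcp_command("Sales.dbo.Orders", "out", "orders.dat",
                               "PRODSQL01", mode="character")
print(" ".join(export_cmd))
```

Running the same builder with direction "in" against a second server gives the matching import step.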