Backup and Recovery Fundamentals

Copyright ©2007 EMC Corporation. Do not Copy - All Rights Reserved.
©2007 EMC Corporation. All rights reserved.
Backup and Recovery Fundamentals
Welcome to Backup and Recovery Fundamentals
The AUDIO portion of this course is supplemental to the material and is not a replacement for
the student notes accompanying this course.
EMC recommends downloading the Student Resource Guide from the Supporting Materials tab,
and reading the notes in their entirety.
Copyright ©2007 EMC Corporation. All rights reserved.
These materials may not be copied without EMC's written consent.
EMC believes the information in this publication is accurate as of its publication date. The information is subject to
change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO
REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS
PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR
FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any EMC software described in this publication requires an applicable software
license.
EMC², EMC, Symmetrix, CLARiiON, Navisphere, PowerPath, SRDF, TimeFinder, VisualSAN, and where
information lives are registered trademarks, and Access Logix and SnapView are trademarks of EMC Corporation.
All other trademarks used herein are the property of their respective owners.
Course Objectives
Upon completion of this course, you will be able to:
Describe basic backup procedures and terminology
Define the different backup types
Describe generic backup architecture
The objectives for this course are shown here. Please take a moment to read them.
Module 1 – Backup Overview
Upon completion of this module, you will be able to:
Describe basic backup procedures and terminology
Define basic backup types
The objectives for this module are shown here. Please take a moment to read them.
Backup Overview - What is Backup?
Backup is an additional copy of data that can be used for
restore and recovery purposes
Backups are often stored on portable media such as
tape
A ‘backup operation’ refers to the copying of data for the purpose of having an additional copy
of an original source. The copy is typically stored on separate media, such as tape, that is not
located on the server. If the original data is damaged or lost, the data can be copied back from
that backup copy.
The backup copy is usually retained over a period of time, depending on the type of data, and
the type of backup. There are three primary purposes for backup: disaster recovery, archival, and
operational backup. We review them in more detail on the next slide.
Backed-up data may be on such media as disk or tape, depending largely on the purpose of the
backup. For example, backing up to disk may be more efficient than tape in operational backup
environments.
Three Primary Purposes for Backups
Disaster Recovery
– Restores a computer to an operational state following a disaster
Archival
– Consists of files and records that have been selected for permanent
or long-term preservation
Operational Backup
– Restores small numbers of files after they have been accidentally
deleted or corrupted
Disaster recovery addresses the requirement to be able to restore all, or a large part of, an IT
infrastructure in the event of a major disaster. Some organizations use tape-based backup media
for their critical data. This media is stored off-site as part of the disaster recovery plan. Other
organizations use remote replication technology to create disaster-recovery sites. These sites
often replicate whole data centers, and can be brought online in a relatively short period of time.
While replication technologies work very well for disaster recovery, they share one important
(and occasionally undesirable) characteristic. Because they replicate data faithfully from one
place to another, any infected or corrupted file is replicated just as faithfully as a good file.
This makes them valuable for disaster recovery but less suitable for operational backup, where
the goal is to recover a copy from before the corruption occurred.
Archival is a common requirement used to preserve transaction records, email, and other
business work products for regulatory compliance. The regulations could be internal,
governmental, or perhaps derived from specific industry requirements. Archived data is
reference data, not live, operational data.
Operational backup is typically the collection of data for the eventual purpose of restoring, at
some point in the future, data that has become lost or corrupted.
Backup/Recovery Statistics
Reliance on tape alone for data recovery is no longer a
best practice
More than 80% of restore requests are made within 48
hours of the data loss
60–70% of storage management effort is devoted to
Backup/Restore
15% of a storage administrator’s time is spent on
recovery operations
5–20% of Backup/Restore jobs fail nightly
B/R costs are approximately $5,935 per TB of disk
storage per year (META Group, April 1, 2004)
This slide shows some statistics relating to backup and recovery from a study conducted by the
META Group. These statistics can help drive the development of a backup solution. They
emphasize the importance of a backup/recovery solution to companies and how complex the
solution can be, and they can also be used to evaluate metrics such as cost-benefit.
Today, users can choose from a wide array of backup solutions to meet their requirements and
don’t need to rely on tape-based media as their only option for backup. For example,
backup-to-disk offers faster, more predictable backup and recovery, higher service levels, and
more manageable backup windows.
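To make the cost figure concrete, the sketch below turns the META Group rate into a rough annual estimate. It assumes, for illustration only, that B/R cost scales linearly with disk capacity; the constant comes from the 2004 study cited above, and the helper name is hypothetical.

```python
# Backup/recovery cost per TB of disk storage per year, as reported by the
# META Group study cited above (April 1, 2004). Assumption: cost scales linearly.
BR_COST_PER_TB_PER_YEAR = 5_935


def estimate_annual_br_cost(disk_tb: float) -> float:
    """Hypothetical helper: rough yearly B/R cost for `disk_tb` terabytes of disk."""
    if disk_tb < 0:
        raise ValueError("disk capacity cannot be negative")
    return disk_tb * BR_COST_PER_TB_PER_YEAR


# Example: an environment with 40 TB of disk storage.
print(f"${estimate_annual_br_cost(40):,.0f} per year")  # $237,400 per year
```

Because the rate dates from 2004, treat the result as an order-of-magnitude input to a cost-benefit discussion rather than a current price.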
Considerations for the Backup/Restore Process
Business needs determine backup requirements:
– What are the restore requirements – RPO & RTO?
– Where and when will the restores occur?
– What are the most frequent restore requests?
– Which data needs to be backed up?
– How frequently should data be backed up?
hourly, daily, weekly, monthly
– How long will it take to back up?
– How many copies to create?
– How long to retain backup copies?
This slide presents a number of important questions that need to be considered before
implementing a backup/restore solution. Some examples include:
The Recovery Point Objective
The Recovery Time Objective
The media type to be used (disk or tape)
Where and when restore operations occur – especially if an alternate host will be used to
receive the restored data
When to perform backups
The backup granularity – full, incremental, or cumulative
How long to keep the backup – for example, some backups need to be retained for four
years, others just for a month
Is it necessary to take multiple copies of the backup?
The concepts behind many of these questions are discussed in more detail later in this module.
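The link between backup frequency and the RPO can be checked with simple arithmetic. The sketch below rests on a simplifying assumption: each backup captures data as of the moment it starts and backups run at a fixed interval, so the worst-case data loss equals one interval. The function name is hypothetical.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Check a fixed backup schedule against a Recovery Point Objective.

    Simplifying assumption: each backup captures data as of the moment it
    starts, so a failure just before the next backup loses up to one full
    interval of changes. That worst-case loss must not exceed the RPO.
    """
    worst_case_loss = backup_interval
    return worst_case_loss <= rpo


# Nightly backups cannot meet a 4-hour RPO; hourly backups can.
print(meets_rpo(timedelta(days=1), timedelta(hours=4)))   # False
print(meets_rpo(timedelta(hours=1), timedelta(hours=4)))  # True
```

A fuller assessment would also account for backup jobs that fail and for the time a backup takes to complete.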
Backup to Tape Today
Over 70% of all backed up data today goes to tape
Restore from tape is usually a slow process
Disaster recovery may require retrieving tapes stored
offsite
Operational recoveries may require mounting of many
tapes just to restore a single file
Typically, most backed-up data goes to tape (~70%), but this number is going down due to
the adoption of backup-to-disk solutions. Tape is a good storage medium when you consider such
factors as portability and capacity, as well as the ability to take a set of tapes offsite, at a low
cost.
The problem with tapes becomes clear when you need to restore the data. If you are restoring after
a disaster, tapes are manageable, albeit slow: you retrieve the tapes from your offsite location
and start the restore. But disaster recovery accounts for a very small percentage of all the
restore requests in your environment. Most restore requests are operational, not disaster
recovery requests.
Tapes are a problem for operational backup. You may find that the backup software is trying to
mount two, three, sometimes even more tapes, depending on the backup policy, just to restore a
single file. This underscores the need for a new backup solution model. Tapes are not the most
reliable way to store all types of backups, and may not be the best way to perform restores.
Using disk to store some types of backup data improves restore performance and reliability.
For example, suppose you perform a full backup on Sunday and incremental backups on the
remaining weekdays. This model can prove slow in the case of a full restore: you would need to
first restore the full backup and then apply all of the incremental backups, which can take a
long time.
What is Backed up?
Operating Environments
– Servers
– Desktop PCs
– Laptop PCs
Applications
– ERP (e.g., SAP, Oracle Apps, PeopleSoft)
– CRM (e.g., Siebel)
– Databases (Oracle, UDB, MS SQL)
– Messaging (e.g., Microsoft Exchange)
Application data
– For all of the above
Logs and journals
– Application transaction logs, database journals, file system journals
Before defining a backup solution, it is important to examine your backup environment in order
to determine the types of data you need to back up. Different types of data can require
different backup strategies. For example, some applications may need to be placed in a specific
quiescent state, or even closed, before the backup starts. This ensures that the backed-up data
is consistent if it must be restored. In some cases, special backup agents are used to make this
process automatic. If an application must be closed before it can be backed up, that could have
an impact on when and how backups are performed.
In addition to identifying the types of data to back up, it is necessary to decide when and where
to perform the backups. Each type of data has different backup requirements, for example,
frequency, backup media, and retention.
What is Operational Restore?
Most restores are at the file and volume level
– Restore frequency is usually high
Full system restores are rare
Most common restores
– Email
– Files
– Application data
A good general rule to follow when planning a backup solution is to evaluate the restore needs.
For instance, if a particular user has a high requirement for email restores, it is important to plan
a backup on reliable media that also provides highly granular restore capabilities to ensure
rapid data recovery.
Backup Granularity and Levels
Full Backup
Cumulative (Differential)
Incremental
The granularity and backup levels depend on business needs and to some extent on
technological limitations. Some backup strategies define up to ten levels of backup. IT
organizations use a combination of these to fulfill their requirements. Most use some
combination of Full, Cumulative, and Incremental backups.
Full
A full backup is exactly what the name implies: a backup of all data on the target volumes,
regardless of any changes made to the data itself. Another possible scheme is a
“synthetic” or “constructed” full backup. In a synthetic full backup, data is taken from an
existing full backup and the subsequent incremental backups to create a new full backup. This
allows a full backup to be created offline, so the network continues to function without any
performance degradation or disruption to network users. Synthetic full backups are used when the
backup window is too small for a conventional full backup.
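The core idea of a synthetic full backup can be sketched as a merge of backup images that already exist, so the production host is not read again. This is a simplified model, assuming each backup is a mapping from file path to contents and ignoring file deletions, which real products must also track; the function name is hypothetical.

```python
def synthesize_full(full: dict[str, bytes],
                    incrementals: list[dict[str, bytes]]) -> dict[str, bytes]:
    """Build a new full backup from the last full backup plus the
    incremental backups taken after it (oldest first).

    Each backup is modeled as {file_path: contents}. File deletions are
    not tracked in this simplified model.
    """
    synthetic = dict(full)  # copy, so the original full backup is untouched
    for incremental in incrementals:
        synthetic.update(incremental)  # newer versions replace older ones
    return synthetic


full = {"a.txt": b"v1", "b.txt": b"v1"}
incrementals = [{"b.txt": b"v2"}, {"c.txt": b"v1"}]
print(synthesize_full(full, incrementals))
# {'a.txt': b'v1', 'b.txt': b'v2', 'c.txt': b'v1'}
```

Applying the incrementals oldest first matters: if the same file changed twice, the later version must win.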
Cumulative (Differential)
A cumulative backup is a kind of incremental backup that contains changes since the last full
backup.
Incremental
An incremental backup contains the changes since the last incremental backup or the last full
backup, whichever was most recent.
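The three levels differ only in the reference point used to decide which files have changed. The sketch below assumes change detection by file modification time (many products use other mechanisms, such as archive bits or change journals) and represents times as plain numbers; the function and parameter names are hypothetical.

```python
from typing import Optional


def files_to_back_up(mtimes: dict[str, float], level: str, last_full: float,
                     last_incremental: Optional[float] = None) -> list[str]:
    """Select files for a backup level.

    mtimes: {path: last modification time}
    last_full: time of the most recent full backup
    last_incremental: time of the most recent incremental backup, if any
    """
    if level == "full":
        return sorted(mtimes)  # everything, changed or not
    if level == "cumulative":
        since = last_full  # all changes since the last full backup
    elif level == "incremental":
        # changes since the last incremental or full backup, whichever is more recent
        since = last_full if last_incremental is None else max(last_full, last_incremental)
    else:
        raise ValueError(f"unknown backup level: {level!r}")
    return sorted(path for path, mtime in mtimes.items() if mtime > since)


# Full backup at t=1, incremental at t=2; file4 changed at 1.5, file5 at 2.5.
mtimes = {"file1": 0, "file2": 0, "file3": 0, "file4": 1.5, "file5": 2.5}
print(files_to_back_up(mtimes, "cumulative", last_full=1))                      # ['file4', 'file5']
print(files_to_back_up(mtimes, "incremental", last_full=1, last_incremental=2))  # ['file5']
```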
Restoring a Cumulative Backup
Key Features
– More files to be backed up, therefore it takes more time to back up and uses
more storage space
– Much faster restore because only the last full and the last cumulative
backup must be applied
Figure: Cumulative backup example
Monday: Full backup (Files 1, 2, 3)
Tuesday: Cumulative backup (File 4)
Wednesday: Cumulative backup (Files 4, 5)
Thursday: Cumulative backup (Files 4, 5, 6)
Production: Files 1, 2, 3, 4, 5, 6
In this example, a full backup is taken on Monday. For the remaining weekdays, a cumulative
backup is taken. These cumulative backups back up ALL FILES that have changed since the
LAST FULL BACKUP.
On Tuesday, File 4 is added. Since File 4 is a new file that has been added since the last full
backup, it is backed up that evening (Tuesday).
On Wednesday, File 5 is added. Now, since both File 4 and File 5 are files that have been added
or changed since the last full backup, both files will be backed up that evening (Wednesday).
On Thursday, File 6 is added. Again, since File 4, File 5, and File 6 are files that have been
added or changed since the last full backup, all three files will be backed up that evening
(Thursday).
On Friday morning, the data is corrupted, so it must be restored. The first step is to restore
the full backup from Monday evening. Then, only the backup from Thursday evening is restored,
because it contains all the new or changed files from Tuesday, Wednesday, and Thursday.
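The Monday-to-Friday cumulative example can be replayed in a few lines to show why the restore needs only two backup sets. Each backup is modeled as a set of file names, a simplification that ignores file contents and metadata.

```python
# Files created or changed on each weekday in the example; Monday is the full backup.
full_backup = {"File 1", "File 2", "File 3"}
daily_changes = {"Tuesday": {"File 4"}, "Wednesday": {"File 5"}, "Thursday": {"File 6"}}

cumulative_backups = {}
changed_since_full = set()
for day, changed in daily_changes.items():
    changed_since_full |= changed  # keeps growing until the next full backup
    cumulative_backups[day] = set(changed_since_full)
    print(day, sorted(cumulative_backups[day]))

# Friday restore: the full backup plus only the most recent cumulative backup.
restore_sets = [full_backup, cumulative_backups["Thursday"]]
restored = set().union(*restore_sets)
print(len(restore_sets), sorted(restored))
```

Each cumulative backup is larger than the one before it, which is the storage cost paid for the two-step restore.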
Restoring an Incremental Backup
Key Features
– Files that have changed since the last full or incremental backup are backed
up
– Fewest files to be backed up, therefore faster backup and less
storage space
– Longer restore because last full and all subsequent incremental backups
must be applied
Figure: Incremental backup example
Monday: Full backup (Files 1, 2, 3)
Tuesday: Incremental backup (File 4)
Wednesday: Incremental backup (File 3)
Thursday: Incremental backup (File 5)
Production: Files 1, 2, 3, 4, 5
In this example, a full backup is taken on Monday. For the remaining weekdays, an incremental
backup is taken. These incremental backups only back up files that are new or that have changed
since the last full or incremental backup.
On Tuesday, a new file is added, File 4. No other files have been changed. Since File 4 is a new
file that has been added after the previous backup on Monday evening, it is backed up that
evening (Tuesday).
On Wednesday, no new files are added, but File 3 has changed. Since File 3 has changed after the
previous evening’s backup (Tuesday), it will be backed up that evening (Wednesday).
On Thursday, no files have changed, but a new file, File 5, has been added. Since File 5 was
added after the previous evening’s backup, it will be backed up that evening (Thursday).
On Friday morning, the data is corrupted, so it must be restored. The first step is to
restore the full backup from Monday evening. Then, every incremental backup that was done
since the last full backup must be applied, which, in this example, means the Tuesday,
Wednesday, and Thursday incremental backups.
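Replaying the incremental example shows both the longer restore chain and why the incrementals must be applied in order: File 3 is in the Monday full backup and again, changed, in the Wednesday incremental. Each backup is modeled as a dictionary from file name to the day its version was written, a simplification for illustration.

```python
# Each backup maps a file name to the day its backed-up version was written.
full_backup = {"File 1": "Mon", "File 2": "Mon", "File 3": "Mon"}
incrementals = [
    ("Tuesday", {"File 4": "Tue"}),    # new file
    ("Wednesday", {"File 3": "Wed"}),  # changed file
    ("Thursday", {"File 5": "Thu"}),   # new file
]

# Friday restore: the full backup first, then every incremental in order.
restored = dict(full_backup)
applied = ["Monday (full)"]
for day, backup in incrementals:
    restored.update(backup)  # a later version overwrites an earlier one
    applied.append(f"{day} (incremental)")

print(len(applied), "backup sets applied")  # 4 backup sets applied
print(restored["File 3"])                   # Wed
```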
Backup and Restore Concepts
Backup Metadata
Backup Server
Backup Software
Backup Window
Catalog
Expiration Date
Full backup
Hot Backup
Here are some useful terms when discussing backup technology:
Backup Metadata: Information about the backup data, such as file names, time of backup, size, permissions,
ownership, and most importantly, tracking information to allow locating the data to be restored. The tracking
information is stored in the backup catalog.
Backup Server: The central point of administration and management. It maintains the Backup Metadata.
Backup Software: Software running on the backup server and backup clients that manages the flow of backup data
from backup clients to the backup media. This software also manages the restoration of previously backed up data.
Backup Window: The period of time that a system is available to perform a backup procedure, traditionally 6-8
hours in the evening or on weekends, but it could occur at any time. Due to the accelerating rate of data growth, backup
windows for many applications are shrinking and, in some cases, nonexistent.
Catalog: A metadata database maintained by the backup server.
Expiration Date: The date on which the contents of a tape cartridge can be overwritten (see Retention Period).
Full backup: A backup that includes all data, usually done weekly.
Hot Backup: A backup performed while the application (e.g., Exchange, Oracle, SQL) is still running and
providing services to end users. Application performance may be somewhat degraded during this operation.
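The catalog’s job of turning a restore request into a media location can be sketched with a small in-memory structure. The field names, paths, and media IDs below are hypothetical, and a real catalog is a database maintained by the backup server that holds more metadata (for example, permissions) than shown here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One backed-up version of a file, as recorded by the backup server."""
    path: str
    backup_time: datetime
    size_bytes: int
    owner: str
    media_id: str  # tracking information: which tape or disk volume holds the copy


def locate(catalog: list[CatalogEntry], path: str,
           as_of: datetime) -> Optional[CatalogEntry]:
    """Return the newest entry for `path` backed up at or before `as_of`."""
    candidates = [e for e in catalog if e.path == path and e.backup_time <= as_of]
    return max(candidates, key=lambda e: e.backup_time, default=None)


catalog = [
    CatalogEntry("/home/ann/report.doc", datetime(2007, 3, 5, 22, 0), 40_960, "ann", "TAPE0001"),
    CatalogEntry("/home/ann/report.doc", datetime(2007, 3, 7, 22, 0), 45_056, "ann", "TAPE0003"),
]
entry = locate(catalog, "/home/ann/report.doc", datetime(2007, 3, 6, 9, 0))
print(entry.media_id if entry else "not found")  # TAPE0001
```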
Backup and Restore Concepts
Recovery Point Objective (RPO)
Recovery Time Objective (RTO)
Restore
Retention Period
Rotation Period
Some more backup and recovery concepts are listed on this slide.
Recovery Point Objective (RPO): The point in time to which application data must be recovered
in order to resume business transactions. It defines the maximum acceptable amount of data loss.
Recovery Time Objective (RTO): Maximum allowable time to bring the application back
online.
Restore (Operational): The movement of a file or a group of files from a previous backup back
to a primary storage device. The backup copy of this data was created and retained for the sole
purpose of recovering deleted, broken, or corrupted data on the primary disk. Usually kept for a
short period of time. This backed-up data may be on disk or tape. Depending on a company’s
policies, some or all of this data may be moved to tape (if not already on tape) for off-site
storage to be used for Disaster Recovery.
Retention Period: The length of time that the backup software prevents the overwriting of a
tape. This concept is tied to expiration date (mentioned previously).
Rotation Period: The length of time that a particular backup set is retained on tape before it is
overwritten by a new backup set.
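The relationship between retention period and expiration date is simple date arithmetic. The sketch below assumes a retention period counted in days from the date the backup was written and that the media becomes reusable on the expiration date itself; the names are hypothetical, and real backup software may apply further rules, for example when one cartridge holds several backup sets with different retention periods.

```python
from datetime import date, timedelta


def expiration_date(backup_date: date, retention: timedelta) -> date:
    """Expiration date = date the backup was written + retention period."""
    return backup_date + retention


def can_overwrite(backup_date: date, retention: timedelta, today: date) -> bool:
    """Assumption: the media becomes reusable on the expiration date itself."""
    return today >= expiration_date(backup_date, retention)


written = date(2007, 1, 15)
retention = timedelta(days=30)
print(expiration_date(written, retention))                   # 2007-02-14
print(can_overwrite(written, retention, date(2007, 2, 13)))  # False
print(can_overwrite(written, retention, date(2007, 2, 14)))  # True
```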
Data Storage Methods
Online storage
Near-line storage
Offline storage
Off-site vault
Different storage methods offer different levels of accessibility, security and cost. In most cases,
a mix of all four storage methods can be the most effective storage strategy.
Online storage: Sometimes called secondary storage, online storage is typically the most
accessible type of data storage. A good example would be a large disk array. This type of
storage is very convenient and speedy, but is relatively expensive and vulnerable to being
deleted or overwritten, either by accident, or in the wake of a data-deleting virus payload.
Near-line storage: Sometimes called tertiary storage, near-line storage is typically less
accessible and less expensive than online storage. A good example would be an automatic tape
library. Near-line storage is used for archival of rarely accessed information, since it is much
slower than secondary storage.
Offline storage: Offline storage requires a human operator to load the medium before a computer
can access the information stored on it. An example is a media library system that uses off-line
storage media, as opposed to near-line storage, where the handling of media is automatic.
Off-site vault: To protect against a disaster or other site-specific problem, many people choose
to send backup media to an off-site vault. The vault can be as simple as the system
administrator's home office or as sophisticated as a disaster-hardened, temperature-controlled,
high-security bunker that has facilities for backup media storage.
Developing the Success Criteria…
Requires understanding of:
– Application capacity to address
– Each application’s criticality to the business
– Recovery point objectives
Ties to backup frequency and retention timelines
– Recovery time objectives
Ties to service level requirements
Choice of connectivity
– SAN, LAN, or combination
In understanding how to develop an effective backup architecture, you need to first look at the
total amount of capacity that has to be backed up, and then look at the types of applications
involved. With that in mind, you need to make choices as to what needs to be backed up, how
often it’s backed up, and how fast recovery needs to be if required. Finally, the connectivity
needs to be determined; whether it’s SAN based, LAN based, or some combination of the two.
Talking specifically about backing up to disk, one of the biggest challenges that a user faces is
figuring out how to size the solution. Implementing backup-to-disk may require changing the
current backup retention and backup frequency in order to gain the best value from the solution.
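A common back-of-the-envelope model for sizing backup-to-disk adds up the retained full backups and the retained incrementals. The sketch below is that rough model, not a sizing method from this course; it assumes every incremental is a fixed fraction of the full size and ignores compression, deduplication, and data growth. Names are hypothetical.

```python
def disk_capacity_tb(full_size_tb: float, daily_change_rate: float,
                     fulls_retained: int, incrementals_retained: int) -> float:
    """Rough disk capacity needed for a backup-to-disk rotation.

    Assumptions (illustrative only): every full backup is full_size_tb,
    every incremental holds daily_change_rate * full_size_tb, and there is
    no compression, deduplication, or data growth.
    """
    if not 0.0 <= daily_change_rate <= 1.0:
        raise ValueError("daily_change_rate must be a fraction between 0 and 1")
    fulls = fulls_retained * full_size_tb
    incrementals = incrementals_retained * daily_change_rate * full_size_tb
    return fulls + incrementals


# 10 TB of data, 5% daily change, two weekly fulls and six incrementals per week kept.
print(f"{disk_capacity_tb(10, 0.05, fulls_retained=2, incrementals_retained=12):.1f} TB")  # 26.0 TB
```

Retention and frequency show up directly as the last two arguments, which is why changing either one changes the disk required.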
The following slides cover these ideas in more detail.
Application Mix Example
(Mission criticality decreases from the top row to the bottom row.)
Application                         Uptime          RTO               RPO                             Backup Window
Tier 1 applications                 24x7x365        Seconds           Last transaction                None
E-mail                              24x7x365        Minutes           Full restore                    Minutes
Tier 2 applications                 Business hours  Minutes to hours  Minimal loss                    Minutes to hours
File servers                        Business hours  Minutes to hours  Minimal loss                    Hours
Business records and archived data  Business hours  Hours to days     Best effort (unless regulated)  Days
Here is a typical mix of applications. In order to have a successful backup implementation, it is
important to understand the operational characteristics of each of the applications in the
environment.
For example, Tier 1 applications may need to be recovered within a matter of seconds, or
revenues could be impacted. This is particularly true in businesses that have revenues tied to
system uptime. Note that the RTO is measured in seconds, and the RPO goal is the very last
transaction. Also, in this particular case, there is no window of time during which backups can
occur, so leveraging online backups to a point-in-time copy makes sense.
E-mail in this situation is similar to Tier 1, with minor differences in the recovery objectives as
well as the backup times.
The other applications have much less stringent requirements. As a consequence, their backup and
recovery strategies will be architected differently from those for the Tier 1 and e-mail
applications.
Creating this kind of spreadsheet gives clarity to the requirements of a backup implementation.
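Keeping the spreadsheet in a machine-readable form makes its rules easy to apply consistently. The sketch below encodes the table above and applies one rule taken from the text: an application with no backup window is a candidate for online backups to a point-in-time copy. The data layout and function name are illustrative assumptions.

```python
# The application mix table as data: (application, uptime, RTO, RPO, backup window).
APP_MIX = [
    ("Tier 1 applications", "24x7x365", "Seconds", "Last transaction", "None"),
    ("E-mail", "24x7x365", "Minutes", "Full restore", "Minutes"),
    ("Tier 2 applications", "Business hours", "Minutes to hours", "Minimal loss", "Minutes to hours"),
    ("File servers", "Business hours", "Minutes to hours", "Minimal loss", "Hours"),
    ("Business records and archived data", "Business hours", "Hours to days",
     "Best effort (unless regulated)", "Days"),
]


def needs_online_copy(rows):
    """Applications with no backup window cannot be paused for a backup, so
    online backups to a point-in-time copy are the natural fit."""
    return [app for app, _uptime, _rto, _rpo, window in rows if window == "None"]


print(needs_online_copy(APP_MIX))  # ['Tier 1 applications']
```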
Inventory and Gather Data
Backup content
– How much is backed up?
– How often is it backed up?
– How long is it retained for?
Clean house!
– Stale data, duplicate data
– Non-corporate data
MP3s
– Extinct user data
Removing the inactive data…
– Accelerates backups
– Accelerates restores
This slide illustrates the use of Storage Resource Management tools that provide a general look
at the content currently stored within the enterprise. Of particular interest in the effort to
re-architect backup solutions are stale data, multiple copies of the same data, data that doesn’t
belong in a corporate backup, and data associated with employees who are no longer employed
at the company. This content should be stripped from the current backup process.
For example, many organizations do not get around to removing old users’ content from file
servers and continue to back up data that hasn’t changed in years. The same applies to e-mails
and application data.
This effort also has financial benefits. Removing inactive data shrinks the backup size, enabling
faster, more reliable backups and recoveries, and reduces the volume of data being backed up.
That, in turn, yields significant savings in the number of cartridges and drives required.
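Storage Resource Management tools perform this analysis at scale; the idea behind the simplest check, finding files that have not changed in a long time, can be sketched with the Python standard library. This is a hypothetical helper, not an SRM product’s interface, and the files it reports are candidates for review rather than automatic removal.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86_400


def find_stale_files(root: str, max_age_days: float) -> list[str]:
    """List files under `root` not modified within the last `max_age_days` days."""
    cutoff = time.time() - max_age_days * SECONDS_PER_DAY
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(stale)


# Demo: one fresh file and one file back-dated by two years.
with tempfile.TemporaryDirectory() as demo_root:
    for name in ("fresh.txt", "old.txt"):
        with open(os.path.join(demo_root, name), "w") as f:
            f.write("x")
    two_years_ago = time.time() - 2 * 365 * SECONDS_PER_DAY
    os.utime(os.path.join(demo_root, "old.txt"), (two_years_ago, two_years_ago))
    found = [os.path.basename(p) for p in find_stale_files(demo_root, max_age_days=365)]

print(found)  # ['old.txt']
```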
EMC Internal Case Study
Figure: EMC IS: E-mail Restore Requests Since Backup (cumulative percent of restore requests)
Same day: 27%; 1–2 days: 77%; 3–6 days: 92%; 7–14 days: 100%; 15–29 days: 100%; >30 days: 100%
Sizing the Requirement—Backup Capacity
This slide addresses an important question: How much backup data is really enough to protect
my business? One company’s internal IT department evaluated their backup and recovery
process and the amount of backup data they were storing on tape. The analysis looked at restore
requests, due to inadvertent e-mail deletions by users, over a 12-month period for their large e-
mail infrastructure.
The chart shows the cumulative percentage of restore requests over time, starting with the actual
receipt of an e-mail. It indicates that, for e-mail data, nearly all restore requests had already
been made within 14 days. Yet, the internal IT department had a policy to store e-mail backups on
tape for over 60 days.
By eliminating retention of unnecessary backups, significant cost savings were achieved.
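The chart can be read as a lookup table for choosing retention: find the shortest measured period that covers a target share of restore requests. The sketch below uses the percentages from the chart; treating the “Same day” bucket as 1 day and the function name are assumptions, and the numbers describe one company’s e-mail environment, so other environments should use their own restore statistics.

```python
# Cumulative share of e-mail restore requests, read from the EMC IS chart:
# (upper bound of the bucket in days, cumulative fraction of requests).
# Assumption: the "Same day" bucket is treated as 1 day.
CUMULATIVE_RESTORES = [(1, 0.27), (2, 0.77), (6, 0.92), (14, 1.00), (29, 1.00)]


def min_retention_days(target_coverage: float):
    """Smallest measured retention (in days) that covers the target share of requests."""
    for upper_days, covered in CUMULATIVE_RESTORES:
        if covered >= target_coverage:
            return upper_days
    return None  # target not reached within the measured buckets


print(min_retention_days(0.90))  # 6
print(min_retention_days(1.00))  # 14
```

In this data set, 14 days of retention would have covered every request, compared with the more than 60 days actually retained on tape.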
Module Summary
Key points covered in this module:
Backup basics
Backup concepts
Backup types
– Full
– Incremental
– Cumulative
Backup planning
These are the key points covered in this module. Please take a moment to review them.
Module 2 – Backup Architecture
Upon completion of this module, you will be able to:
Describe Generic Backup Architecture
– Client, Server, Storage Node
Identify Backup Topologies
– Direct Attached Backup, LAN Backup, SAN Backup
Discuss Backup Granularity
The objectives for this module are shown here. Please take a moment to read them.
Backup Architecture – How It Works
Client/Server Relationship
Server
– Directs Operation
– Maintains Catalog
Client
– Gathers Data for Backup
Storage Node
Backup products vary, but some share common characteristics. The basic architecture of a
backup software system is the client-server relationship, with a backup server and some number
of backup clients or agents. The backup server directs the operations and owns the backup
catalog (the information about the backup). The catalog contains the table-of-contents for the
backup image. It also contains information about the backup session itself.
The backup server depends on the backup client to gather the data to be backed up. The backup
client can be local, or it can reside on another system, presumably to back up the data visible to
that system.
There is another component called a storage node. Different vendors know it by other names
(Tivoli: Storage Agent; Veritas: Media Server; CommVault: Media Agent), but “storage node” is the
Storage Networking Industry Association (SNIA) term. The storage node is the entity responsible
for writing the backup image to the backup device. Typically, a storage node is packaged with the
backup server, and the backup device is attached directly to the backup server’s host platform.
Storage nodes play an important role in backup planning because they can be used to consolidate
backup servers.
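The division of labor among server, client, and storage node can be sketched as three small classes. The class and method names are hypothetical and do not correspond to any vendor’s API; the backup device is just an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Writes backup images to its backup device (an in-memory list here)."""
    name: str
    device: list = field(default_factory=list)

    def write(self, image: dict) -> tuple:
        self.device.append(image)
        return (self.name, len(self.device) - 1)  # where the image landed


@dataclass
class BackupClient:
    """Gathers the data visible to its host."""
    host: str
    files: dict

    def gather(self) -> dict:
        return dict(self.files)


@dataclass
class BackupServer:
    """Directs the operation and owns the catalog (metadata only, no file data)."""
    catalog: list = field(default_factory=list)

    def run_backup(self, client: BackupClient, node: StorageNode) -> None:
        image = client.gather()       # 1. client gathers the data
        location = node.write(image)  # 2. storage node writes the backup image
        self.catalog.append({         # 3. server records where it went
            "host": client.host,
            "files": sorted(image),
            "location": location,
        })


server = BackupServer()
node = StorageNode("node-a")
server.run_backup(BackupClient("app01", {"/etc/hosts": b"127.0.0.1 localhost"}), node)
print(server.catalog)
# [{'host': 'app01', 'files': ['/etc/hosts'], 'location': ('node-a', 0)}]
```

Because the server only holds metadata, several storage nodes can report to one server and catalog, which mirrors the consolidation point made above.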
Backup Architecture – Backup Topologies
There are three basic backup topologies:
– Direct Attached Backup
– LAN Backup
– SAN Backup
This slide reviews the most common backup topologies:
Direct Attached Backup – The backup data flows directly from the host to be backed up to
the tape, without utilizing the LAN. In this model, there is no centralized management and it
is difficult to grow the environment.
LAN Backup – In this model, the backup data flows from the host to be backed up to the
tape through the LAN. Management is centralized, but LAN utilization becomes a problem
because all backup data goes through the LAN.
SAN Backup – The backup data goes through the SAN, and the LAN is used only to move
metadata. This model offers good backup performance and simplified management, at the
added expense of an additional infrastructure.
Direct-Attached Backups
Backups are performed directly from the backup client’s
disk to the backup client’s tape devices.
Advantages
– High Speed
– Tape devices dedicated to the host
Disadvantages
– Impacts the host and application performance
– Distance restrictions
Advantages
The key advantage of direct-attached backups is speed. The tape devices can operate at the
speed of the channels. Direct-attached backups optimize backup and restore speed since the tape
devices are close to the data source and dedicated to the host.
Disadvantages
Direct-attached backups impact the host and application performance since backups consume
host I/O bandwidth, memory, and CPU resources. Direct-attached backups potentially have
distance restrictions if short-distance connections such as SCSI are used.
Direct-Attached Backups
[Figure: diagram labels: Backup Server, Catalog, Metadata, LAN, Storage Node, Data, Media]
This is an example of a Direct Attached Backup environment. Notice some of the features of
this backup:
A tape drive is attached directly to the client.
Only metadata goes to the backup server, relieving pressure on the LAN. However, because every
client needs its own tape drive, this approach can become a management nightmare, and the cost
can be prohibitive.
A solution is to share the tape units.
In this example, the client is a Storage Node, which is the entity responsible for writing the
backup image to the backup device.
LAN-Based Backups
The Backup Server is the central control point for all
backups
The metadata and backup policies reside in the Backup
Server
Storage Nodes control backup devices and are
controlled by the Backup Server
Advantages
LAN backups enable an organization to centralize backups and pool tape resources. The
centralization and pooling can enable standardization of processes, tools, and backup media.
Centralization of tapes can also improve operational efficiency.
Disadvantages
The backup process has an impact on production systems, the client network, and the
applications. It consumes CPU, I/O bandwidth, LAN bandwidth, and memory. To obtain
consistent backup points, applications might have to be halted and databases shut down.
LAN Backup Data Flow
[Figure: diagram labels: Backup Server, Storage Node, LAN, Mail Server, File Server, Database Server, Data, Metadata]
This is an example of how LAN-based backups work.
Let’s start with the simplest example of a traditional LAN backup. All systems are LAN-
connected and all storage is direct-attached. The tape is locally-attached to the backup server.
Backup data has to make its way from the backup client (the source) to the backup device (the
destination). It should do so with the least possible impact to the production network. There are
a number of ways to minimize this impact. These include configuring separate networks for
backup, and installing dedicated storage nodes on some application servers. Even when utilizing
these types of measures, it is possible for even a high-speed network to be overwhelmed by two
cached disk-array connections and two to six tape libraries operating in full streaming mode.
Also worth considering is that backup data, streaming across the LAN, affects the network
performance of all systems connected to the same network segment as the backup server.
Environments that back up many logical disks to many tape libraries will be constrained by even
the fastest network technologies.
The critical performance path is the network connection between the backup client and the
LAN. This path is critical since it ultimately determines how much data can be backed up or
restored within time constraints.
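The saturation argument above is simple arithmetic. The Python sketch below assumes a 70% usable-throughput factor for the link and a hypothetical 80 MB/s streaming rate per tape drive (neither figure comes from this course) and estimates how many drives a given link can keep streaming and how long 1 TB takes to cross it.

```python
def usable_mb_per_s(link_gbit_per_s: float, efficiency: float = 0.7) -> float:
    """Usable link throughput in MB/s (decimal units).

    `efficiency` is an assumed allowance for protocol overhead and contention.
    """
    return link_gbit_per_s * 1000 / 8 * efficiency


def drives_kept_streaming(link_mb_per_s: float, drive_mb_per_s: float) -> int:
    """Number of tape drives the link can feed at their full streaming rate."""
    return int(link_mb_per_s // drive_mb_per_s)


def hours_to_move(data_gb: float, link_mb_per_s: float) -> float:
    """Hours needed to push data_gb across the link at link_mb_per_s."""
    return data_gb * 1000 / link_mb_per_s / 3600


if __name__ == "__main__":
    drive = 80.0  # hypothetical per-drive streaming rate, MB/s
    for gbit in (1, 10):
        link = usable_mb_per_s(gbit)
        print(f"{gbit} Gbit/s -> {link:.1f} MB/s usable, "
              f"feeds {drives_kept_streaming(link, drive)} drive(s), "
              f"1000 GB takes {hours_to_move(1000, link):.1f} h")
```

Under these assumptions a single 1 Gbit/s client connection cannot keep even two such drives streaming, which is why that connection is the critical path.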
SAN Based Backups
LAN-free backups use storage area networks (SANs) to
move backup data rapidly and reliably. The SAN is
usually used in conjunction with backup software that
supports tape device sharing
Metadata is still moved over the LAN to the backup
server
Backup metadata contains information about what has been backed up, such as file names, time
of backup, size, permissions, ownership, and, most importantly, tracking information for rapid
location and restore. It also records where the data has been stored, for example, on which tape.
Data, the contents of files, databases, etc., is the primary information source to be backed up.
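A catalog built from this metadata is what lets the backup software find a file at restore time. The following Python sketch models one catalog record per backed-up file and looks up the newest copy taken at or before a given time; the field names, tape labels, and the `locate` helper are hypothetical and do not reflect any vendor's catalog format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    path: str
    backup_time: datetime
    size_bytes: int
    permissions: str
    owner: str
    volume: str    # which tape (or other medium) holds this copy
    position: int  # tracking information for fast location, e.g. a file number on the volume


def locate(catalog: List[CatalogEntry], path: str, as_of: datetime) -> Optional[CatalogEntry]:
    """Return the newest copy of `path` backed up at or before `as_of`, if any."""
    candidates = [e for e in catalog if e.path == path and e.backup_time <= as_of]
    return max(candidates, key=lambda e: e.backup_time, default=None)


# Hypothetical catalog: two backups of the same file on different tapes.
CATALOG = [
    CatalogEntry("/data/report.doc", datetime(2007, 3, 1, 22, 0), 52_000,
                 "rw-r--r--", "alice", "TAPE0001", 14),
    CatalogEntry("/data/report.doc", datetime(2007, 3, 2, 22, 0), 53_500,
                 "rw-r--r--", "alice", "TAPE0002", 3),
]

if __name__ == "__main__":
    hit = locate(CATALOG, "/data/report.doc", datetime(2007, 3, 2, 9, 0))
    print(hit.volume if hit else "not found")
```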
A SAN-enabled backup infrastructure introduces these advantages to the backup process:
Provides Fibre Channel performance, reliability, and distance.
Requires fewer processes and reduced overhead.
Does not use the LAN to move backup data.
Eliminates or reduces dedicated backup servers.
Improves backup and restore performance.
SAN Backup Data Flow
[Figure: diagram labels: Backup Server, Storage Node, Mail Server, LAN, SAN, Data, Metadata]
The SAN is valuable if you want to share a Tape Library Unit (TLU). Attach the TLU and
clients to the SAN, and all clients can share a single TLU.
During backup, the clients read the data from the SAN and write it to the SAN-attached tape. The
data never leaves the SAN environment. Only metadata travels over the LAN, and its volume is
small compared with the backup data.
The emergence of ATA as a backup medium brings us to the next step in the evolution. You can
add a CLARiiON/ATA box to the SAN and have your immediate backup go to disk. Later, the
backup server moves the backup data from disk to tape so that the tape can be shipped off-site
for disaster recovery and long-term retention.
Best Level of Granularity
Total Volume of Data
Volume of Changed Data
What Type of Data is Backed Up
– Is Compression an Option
Hardware or Software compression
Backup Window
– Staggering Jobs
– Rush to daylight
Deciding which level of backup to schedule is not as easy as it may seem. Granularity levels
hinge on several considerations. First, what is the aggregate weekly data change rate? If the
change rate is close to or greater than 100% per week (a daily change of about 20%), it makes
little sense to consider an incremental backup, because of the overhead of deciding which files
need to be backed up. In that case, a full backup could actually take less time than the
incremental, even though the incremental backs up less physical data.
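The weekly figure is the daily rate accumulated over the working week (5 × 20% ≈ 100%). This minimal Python sketch makes that calculation and applies an illustrative cutoff; the 0.9 threshold standing in for "close to 100%" and the five-day week are assumptions.

```python
def weekly_change(daily_rate: float, days_per_week: int = 5) -> float:
    """Aggregate weekly change as a fraction of the data set.

    Simple sum of daily rates: this overstates the true figure when the
    same data changes on several days.
    """
    return daily_rate * days_per_week


def suggested_level(daily_rate: float, threshold: float = 0.9) -> str:
    """'full' when weekly change is close to or above 100%, else 'incremental'.

    The 0.9 cutoff is an illustrative stand-in for "close to 100%".
    """
    return "full" if weekly_change(daily_rate) >= threshold else "incremental"


if __name__ == "__main__":
    for daily in (0.20, 0.05):
        print(f"{daily:.0%} daily -> {weekly_change(daily):.0%} weekly -> "
              f"{suggested_level(daily)}")
```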
If the rate of data change is considerably less than 100% per week, then an alternate model
might be more appropriate: a rotation scheme that still includes monthly full backups and daily
incrementals but, instead of performing full backups every week, performs weekly cumulative
incremental backups. Given a modest data change rate, this can save both time and storage
resources.
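A rotation of this kind can be sketched in Python as follows. It assumes a hypothetical 28-day cycle in place of a calendar month, a full backup on day 0, a cumulative incremental every seventh day, and incrementals on the remaining days, then lists which backups a restore would have to read. With the weekly cumulatives, restoring day 27 needs 8 backups rather than the full plus 27 incrementals.

```python
from typing import List, Tuple

CYCLE_DAYS = 28  # hypothetical stand-in for "monthly"
WEEK_DAYS = 7


def level_for_day(day: int) -> str:
    """Backup level run on a given day (day 0 = start of the cycle)."""
    if day % CYCLE_DAYS == 0:
        return "full"
    if day % WEEK_DAYS == 0:
        return "cumulative"   # everything changed since the last full
    return "incremental"      # everything changed since the previous backup


def restore_chain(day: int) -> List[Tuple[int, str]]:
    """Backups that must be read, oldest first, to restore day `day`'s state."""
    chain: List[Tuple[int, str]] = []
    d = day
    while True:
        level = level_for_day(d)
        chain.append((d, level))
        if level == "full":
            break
        if level == "cumulative":
            d -= d % CYCLE_DAYS  # jump straight back to the last full
        else:
            d -= 1               # walk back to the previous backup
    return list(reversed(chain))


if __name__ == "__main__":
    for day in (3, 10, 27):
        print(day, [f"{lvl}@{d}" for d, lvl in restore_chain(day)])
```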
When devising a backup strategy, it is critical to understand the nature of the data and the
nature of changes to the data. Some applications use larger files than others. An environment
with such applications tends to have a larger data change rate, because even a small change to
the data results in the whole file being changed. The larger the average file size, the greater the
percentage of the data set that changes. Other applications, such as software development, use
many smaller files. The rate of change in these environments can be much lower, and the more
mature the data set, the lower the change rate. Another factor to consider is the properties of
the files in your backup set. For instance, are they natively compressible, or will the negative
impact that compression has on performance make it less desirable?
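The effect of file size on a file-level incremental can be shown with a small Python sketch. It assumes the backup copies any modified file in its entirety, and compares two hypothetical 10 GB data sets in which a single file is edited: one made of a single large file, the other of 10,000 files of 1 MB each.

```python
from typing import Dict, Set

MB = 1_000_000


def copied_fraction(files: Dict[str, int], modified: Set[str]) -> float:
    """Fraction of the data set a file-level incremental must copy.

    Any modified file is copied in full, however small the change inside it.
    """
    total = sum(files.values())
    copied = sum(size for name, size in files.items() if name in modified)
    return copied / total


# Two hypothetical 10 GB data sets, each with a small edit to one file.
one_large_file = {"db.dat": 10_000 * MB}
many_small_files = {f"src/file{i}.c": 1 * MB for i in range(10_000)}

if __name__ == "__main__":
    large = copied_fraction(one_large_file, {"db.dat"})
    small = copied_fraction(many_small_files, {"src/file0.c"})
    print(f"{large:.2%} vs {small:.2%}")  # 100.00% vs 0.01%
```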
Module Summary
Key points covered in this module:
Generic Backup Architecture
– Direct Attached Backup (LAN Free Backup)
– LAN Backup
– SAN Backup
Backup Granularity and levels
These are the key points covered in this module. Please take a moment to review them.
Module 3 – Backup Terminology & Considerations
Upon completion of this module, you will be able to:
Define RTO and RPO
Define Backup Data and Business Data
The objectives for this module are shown here. Please take a moment to read them.
RPO and RTO
RPOs must match the user’s needs as closely as
possible
– Finer granularity means lower cost to resume
– Longer retention periods support requests farther into the past
– Longer retention means higher storage costs
RTOs must be as close to immediate as possible
– Shorter restore times minimize impact of data loss
We previously discussed what RPO and RTO are; now let’s look at their impact on the backup
solution.
The RPOs must match the user’s needs as closely as possible, so your backup policy must
account for them with appropriate retention periods and granularity. At the same time, you have
to weigh other factors, such as storage costs: longer retention means higher storage costs.
RTOs must be as short as possible to minimize the impact of data loss. Of course, every type of
data has a different value, which must be assigned and defined by company policy. For example,
restoring data from production databases can be more important than restoring file server data.
In this case, the RTO for the database will be shorter than the RTO for the file server data.
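One way to check a backup policy against these objectives is sketched below in Python. It assumes the worst-case data loss equals the interval between backups and that restore time is data size divided by a sustained restore rate; the `Policy` fields and all figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Policy:
    name: str
    rpo_hours: float             # most data loss the business will tolerate
    rto_hours: float             # longest restore the business will tolerate
    backup_interval_hours: float
    data_gb: float
    restore_mb_per_s: float      # assumed sustained restore throughput


def worst_case_loss_hours(p: Policy) -> float:
    # A failure just before the next scheduled backup loses a whole interval.
    return p.backup_interval_hours


def restore_hours(p: Policy) -> float:
    return p.data_gb * 1000 / p.restore_mb_per_s / 3600


def meets_objectives(p: Policy) -> Tuple[bool, bool]:
    """(RPO met, RTO met) for this policy."""
    return (worst_case_loss_hours(p) <= p.rpo_hours,
            restore_hours(p) <= p.rto_hours)


# Hypothetical figures: the database is more critical than the file server.
DATABASE = Policy("production database", rpo_hours=4, rto_hours=2,
                  backup_interval_hours=24, data_gb=200, restore_mb_per_s=100)
FILE_SERVER = Policy("file server", rpo_hours=24, rto_hours=8,
                     backup_interval_hours=24, data_gb=1000, restore_mb_per_s=50)

if __name__ == "__main__":
    for p in (DATABASE, FILE_SERVER):
        rpo_ok, rto_ok = meets_objectives(p)
        print(f"{p.name}: RPO {'met' if rpo_ok else 'missed'}, "
              f"RTO {'met' if rto_ok else 'missed'} ({restore_hours(p):.1f} h restore)")
```

Here the nightly schedule satisfies the file server but misses the database's tighter RPO, which would call for more frequent backups of that data.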
Backup Data Set Properties
Backup Data on Tape
– Used to recover deleted, broken, or corrupted data on disk
– Backup data is NOT archive data
Business Data on Tape
– Data created and retained as a result of business activity
– Transactions, records, files, objects, reports, etc.
– Business data on tape kept for a long period of time is Archive Data
The definition of the data type to be backed up is one of the most important factors in the
development of a backup solution. If the backup administrator has the information regarding the
type of data to be backed up, then it’s possible to define the correct retention period for each
type of data, the correct media type, etc. Another thing to analyze is the number of tapes
involved in a given backup solution. After doing an inventory of the environment, it may be
found that a lot of unnecessary backups are being retained for long periods. In order to minimize
this, you can categorize data into two general types:
Operational Backup Data
Data created and retained for the sole purpose of recovering deleted, broken or corrupted
data on disk.
Usually kept for a short period of time - Backup data is NOT archive data.
Archived Business Data
Transactions, records, files, objects, reports, etc. that are created and retained as a result of
business activity with customers, suppliers, and partners.
Business data on tape kept for a long period of time is Archive data.
Retained for long periods of time to meet legal and regulatory requirements.
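The inventory step described above can be sketched in Python: tag each tape as operational or archive and flag operational tapes kept longer than their short-term purpose requires. The 35-day limit, tape labels, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tape:
    label: str
    category: str  # "operational" (recovery copy) or "archive" (business record)
    age_days: int


def overretained(tapes: List[Tape], max_operational_days: int = 35) -> List[str]:
    """Operational backup tapes held longer than their short-term purpose needs.

    Archive tapes are skipped: long retention is the point of keeping them.
    """
    return [t.label for t in tapes
            if t.category == "operational" and t.age_days > max_operational_days]


INVENTORY = [
    Tape("OP0001", "operational", 12),
    Tape("OP0002", "operational", 400),
    Tape("AR0001", "archive", 2000),
]

if __name__ == "__main__":
    print(overretained(INVENTORY))  # ['OP0002']
```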
Data Considerations
Files
File sizes and the number of files
Data compression
Retention periods and data management
Many organizations have dozens of heterogeneous platforms that support a complex application.
Consider a data warehouse where data from many sources is fed into the warehouse. When this
scenario is viewed as “The Data Warehouse Application”, it easily fits this model. Capacity
planning, backup, restore, and recovery for these complex applications can easily involve hundreds or
thousands of files scattered across dozens of heterogeneous systems. These systems may not be
in a single physical location. Portions of the application may have differing backup schedules.
Managing business continuance for such an application is a big challenge for the application
owner, but consider that a storage administrator may have to manage hundreds of these complex
applications.
The key issues are:
How the backups for subsets of the data are synchronized
How these applications are restored
How these applications are recovered
File sizes and the number of files
The size and number of files in a data set can have a large impact on backup, restore, and
recovery performance.
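A rough Python model shows why the number of files matters as much as total size. It adds an assumed fixed per-file cost (5 ms, covering work such as opening the file and updating the catalog) to the bulk transfer time at an assumed 100 MB/s, and treats the two as non-overlapping; real products overlap some of this work, so treat the numbers as illustrative.

```python
def backup_hours(total_gb: float, file_count: int,
                 stream_mb_per_s: float = 100.0, per_file_ms: float = 5.0) -> float:
    """Estimated backup time: bulk transfer plus a fixed cost per file.

    Both defaults are illustrative assumptions, and the model assumes the
    per-file work does not overlap the data transfer.
    """
    transfer_s = total_gb * 1000 / stream_mb_per_s
    per_file_s = file_count * per_file_ms / 1000
    return (transfer_s + per_file_s) / 3600


if __name__ == "__main__":
    # The same 100 GB held in 10 large files versus 10 million small ones.
    print(f"{backup_hours(100, 10):.2f} h")          # 0.28 h
    print(f"{backup_hours(100, 10_000_000):.2f} h")  # 14.17 h
```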
Compression
Compression rate depends on the type of data
– Application binaries
– Text
– JPEG/ZIP files
Some types of data compress well
Other types of data are already compressed, such as
JPEG and ZIP files
Many tape devices have built-in hardware compression technologies. To effectively use these
technologies, it is important to understand the characteristics of the data. Some data, such as
application binaries, do not compress well. Data such as text can compress very well, while
other data like JPEG and ZIP files are already compressed. Files that are already compressed
have a tendency to get larger when they are compressed again.
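Tape drives compress in hardware, but the behavior is easy to observe in software. This Python sketch (3.9 or later, for `random.Random.randbytes`) uses the standard-library `zlib` module as a stand-in for drive compression and random bytes as a stand-in for already-compressed JPEG or ZIP content; it illustrates the principle and does not model any particular drive's algorithm.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size; above 1.0 means the data grew."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive text compresses well.
TEXT = b"The quick brown fox jumps over the lazy dog. " * 2000
# Random bytes stand in for already-compressed content such as JPEG or ZIP data.
ALREADY_COMPRESSED = random.Random(42).randbytes(len(TEXT))

if __name__ == "__main__":
    for label, data in (("text", TEXT), ("already compressed", ALREADY_COMPRESSED)):
        print(f"{label}: {len(data)} -> {len(zlib.compress(data))} bytes")
```

The text shrinks dramatically, while the random data comes out slightly larger than it went in, because the compressor adds framing overhead without finding any redundancy to remove.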
Retention Periods
Vaulting
Cloning
– Twinning
Rotation
Retention Periods are the length of time that a particular version of a dataset is available to be
restored. The rate of change to the data ties in with restore requirements as well. The faster the
data changes, the finer the granularity needed to support precise RPOs. By understanding the
probability of restore requests over time, along with the nature and criticality of restores, it is
possible to determine the optimal strategy for backup granularity and retention periods.
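Retention can be expressed as a simple date filter. The Python sketch below assumes hypothetical daily backups kept for 30 days and lists which versions can still be restored on a given day; the dates and the `restorable` helper are illustrative only.

```python
from datetime import date, timedelta
from typing import List


def restorable(backup_dates: List[date], retention_days: int, today: date) -> List[date]:
    """Backup versions still inside their retention period on `today`."""
    keep = timedelta(days=retention_days)
    return [d for d in backup_dates if d <= today and today - d <= keep]


# Hypothetical: one backup per day for 60 days, each kept for 30 days.
DAILY = [date(2007, 1, 1) + timedelta(days=i) for i in range(60)]

if __name__ == "__main__":
    versions = restorable(DAILY, retention_days=30, today=date(2007, 3, 1))
    print(len(versions), versions[0], versions[-1])  # 31 2007-01-30 2007-03-01
```

Daily backups give day-level granularity; lengthening `retention_days` reaches further into the past at the cost of keeping more media.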
Cloning and vaulting of data sets go hand in hand with retention, as this is essentially what
turns an “operational” backup into an “archival” one. Cloning (also called twinning when done
during the backup process) is used to produce duplicate sets of tapes, which can then be vaulted
either in a secure location onsite or shipped offsite. This shifts their use to a more “disaster
recovery” focus.
Offsite vaulting generally serves two purposes: first, to provide business continuance
capabilities. If the primary location suffers a disaster, then the copies of the data from the offsite
vault are used for recovery. Second, to maintain an archive of data requiring extended retention
for legal, governmental, and other business requirements.
One advantage of the cloning approach (compared to rotation) is that restores requiring older
versions of the backup do not have to be obtained from the offsite location (unless there is a
media failure), reducing the time to get the data back in operation.
Retention Periods
Once a company has decided to vault its data, it needs to determine how long the data
needs to be retained. There is no magic number that can be applied to all environments. Key
factors that influence the number of backup copies that a location maintains, and how long they
are kept, are legal requirements, government requirements, and business requirements.
Legal Requirements
A corporate legal counsel may suggest that certain data be kept for specific periods in case this
data is needed for legal proceedings. Examples might be engineering records that could be
useful in litigation involving protection of intellectual property or protection against liability
suits.
Government Regulations
Government regulations require that some information be available for a specific number of
years. An example is corporate financial data. Some governments require or suggest that this
data be kept accessible for a set period. For instance, in the United States, seven-year retention
of key financial records is common.
Business Requirements
Some businesses may require extended retention of data to maintain the business. One example
is the medical industry, such as hospitals, where patient histories are used to aid patient
treatment.
Backup and Recovery Capacity and Performance Considerations
Data movement
CPU
Memory
Paths and I/O bandwidth
Network bandwidth
In considering performance of a backup/recovery solution there are several points to keep in
mind:
Data Movement
If the length of the window and the amount of data that must be moved is known, then the
required data movement rate can be estimated. Knowing the required rate in gigabytes per hour
is useful, but since most devices and transport mechanisms are rated in megabytes per second,
both scales should be considered.
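The required rate follows directly from the data volume and the window. This Python sketch returns it in both scales, assuming decimal units (1 GB = 1000 MB); the 1000 GB / 8 hour example is hypothetical.

```python
from typing import Tuple


def required_rate(data_gb: float, window_hours: float) -> Tuple[float, float]:
    """Rate needed to move data_gb within the window, as (GB/h, MB/s).

    Uses decimal units: 1 GB = 1000 MB.
    """
    gb_per_hour = data_gb / window_hours
    mb_per_s = data_gb * 1000 / (window_hours * 3600)
    return gb_per_hour, mb_per_s


if __name__ == "__main__":
    gb_h, mb_s = required_rate(1000, 8)
    print(f"{gb_h:.0f} GB/h = {mb_s:.1f} MB/s")  # 125 GB/h = 34.7 MB/s
```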
CPU
Backups can require significant CPU resources. For example, one server vendor suggests a rule
of thumb of 5 MHz of processor power for every megabyte per second of data that needs to be
moved. A direct-attached backup requires two data movements on the server being backed up. A
LAN backup requires four data movements: two on the backup client and two on the backup server.
Direct-attached backups therefore require 10 MHz per MB/s on the backup client, while LAN backups
require 10 MHz per MB/s on the backup client and 10 MHz per MB/s on the backup server. All of
these merit consideration when deciding on a solution.
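The rule of thumb can be turned into a small calculator. This Python sketch reads it as 5 MHz per MB/s for each data movement, which reproduces the 10 MHz per MB/s figures above; the 40 MB/s rate in the example is hypothetical.

```python
from typing import Dict

MHZ_PER_MB_S = 5.0  # rule of thumb quoted above, per data movement

# Data movements per host for each topology, as described above.
MOVEMENTS: Dict[str, Dict[str, int]] = {
    "direct-attached": {"backup client": 2},
    "LAN": {"backup client": 2, "backup server": 2},
}


def cpu_mhz(topology: str, rate_mb_s: float) -> Dict[str, float]:
    """Estimated CPU (MHz) needed on each host to sustain rate_mb_s."""
    return {host: moves * MHZ_PER_MB_S * rate_mb_s
            for host, moves in MOVEMENTS[topology].items()}


if __name__ == "__main__":
    for topo in MOVEMENTS:
        print(topo, cpu_mhz(topo, 40))
```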
Staging
Writes backup to disk cache
Improves the performance of backups
Shortens the backup window
The staging process is driven by:
– An automatic process
– An event-driven process
– An administrator-initiated process
Staging is a process of transferring data from one storage medium to another. Staging reduces
the time it takes to complete a backup by directing the initial backup to a high-performance file
type device. The data can then be staged to another storage medium, such as tape, freeing up the
disk space. For example, when staging a backup, administrators first copy the target data onto the
disk cache and, later, move the backup image to tape according to the established disk staging
schedule. Disk staging enables administrators to complete backups faster, shortening the backup
window and thereby affecting business applications less than a direct backup-to-tape method.
Different backup software vendors implement different features in the staging process. Usually,
the staging process is started by one of the following conditions:
An automatic process, such as keeping the save set for 30 days on the staging device before
staging the data to the next device.
An event-driven process, such as when available space in the staging pool drops below a set
threshold. When this happens, the oldest save sets are moved first, until available space
reaches the upper threshold that has been set.
An administrator-initiated process, such as allowing the administrator to either reset the
threshold and kick off staging or manually select save sets to stage.
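The automatic and event-driven triggers described above can be sketched in Python. The pool capacity, the 10% and 30% free-space thresholds, and the class and method names are hypothetical; the sketch only removes save sets from the staging pool, whereas real software would first copy them to the next device.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SaveSet:
    name: str
    size_gb: float
    age_days: int


@dataclass
class StagingPool:
    capacity_gb: float
    low_free_pct: float = 10.0   # event trigger: free space drops below this
    high_free_pct: float = 30.0  # stop once free space reaches this
    save_sets: List[SaveSet] = field(default_factory=list)

    def free_pct(self) -> float:
        used = sum(s.size_gb for s in self.save_sets)
        return 100.0 * (self.capacity_gb - used) / self.capacity_gb

    def stage_by_age(self, keep_days: int = 30) -> List[str]:
        """Automatic trigger: stage save sets kept on disk for keep_days or more."""
        moved = [s.name for s in self.save_sets if s.age_days >= keep_days]
        self.save_sets = [s for s in self.save_sets if s.age_days < keep_days]
        return moved

    def stage_by_threshold(self) -> List[str]:
        """Event-driven trigger: oldest first until free space reaches the upper threshold."""
        moved: List[str] = []
        if self.free_pct() >= self.low_free_pct:
            return moved  # threshold not crossed, nothing to do
        for s in sorted(self.save_sets, key=lambda x: x.age_days, reverse=True):
            if self.free_pct() >= self.high_free_pct:
                break
            self.save_sets.remove(s)  # a real product copies to tape before freeing disk
            moved.append(s.name)
        return moved


if __name__ == "__main__":
    pool = StagingPool(100, save_sets=[SaveSet("a", 40, 10),
                                       SaveSet("b", 30, 5),
                                       SaveSet("c", 25, 2)])
    print(pool.stage_by_threshold(), f"{pool.free_pct():.0f}% free")  # ['a'] 45% free
```

An administrator-initiated run would simply call either method on demand, or change the thresholds first.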
NDMP
NDMP is an open network protocol that defines common
functional interfaces used for backup data flows
NDMP meets the strategic need to:
– Centrally manage
– Control distributed data
– Minimize network traffic
NDMP separates the data path and the control path, so
network data can be backed up locally, yet managed from
a central location
Network Data Management Protocol, NDMP, is a protocol pioneered by Intelliguard and
Network Appliance that defines a common architecture for the way heterogeneous file servers
on a network are backed up.
The protocol allows the creation of a common agent used by the central backup application to
back up different file servers running different platforms and platform versions.
With NDMP, network congestion is minimized because the data path and control path are
separated. Backup can occur locally, from file servers direct to tape drives, while management
can occur from a central location.
NDMP is an open standard protocol promoted and supported by server vendors, backup software
vendors, and backup device vendors.
Cross-Vendor Terms Chart
Common
Terms
EMC
NetWorker
Veritas
NetBackup
Veritas
BackupExec
IBM TSM
HP Data
Protector
(OmniBack)
Backup Server Server Master Server
Media Server /
BackupExec
Engine
Server
Client
Storage Agent
File Index
Media Database
File Space
Migration
Cloning Cloning
Duplicate or
Inline Copy
N/A Reclamation
Object Copy
Session
Cell Manager
Backup Client Client Client
Workstation /
Server Agent
Client System
Storage Node Storage Node
Media
Server
N/A Media Agent
Client File Index Catalog
Media Database
Volume
Database
Data Set Save Set Backup Image Backup Set Backup Session
Staging Staging Disk Staging N/A Disk Staging
Catalog Internal Database Backup Catalog
The chart shown relates the backup terms used across several vendors. Please take a moment to
review them.
Module Summary
Key points covered in this module:
Recovery Time Objective (RTO)
Recovery Point Objective (RPO)
Backup Data and Business Data
Compression
Retention Periods
Network Data Management Protocol (NDMP)
These are the key points covered in this module. Please take a moment to review them.
Course Summary
Key points covered in this course:
Basic Backup procedures and terminology
Backup types
Generic Backup Architecture
Backup Granularity and levels
These are the key points covered in this training. Please take a moment to review them.
This concludes the training. In order to receive credit for this course, please proceed to the
Course Completion slide to update your transcript and access the assessment.
