This post is based on personal experience engineering the deployment and operation of an enterprise backup solution using Microsoft System Center Data Protection Manager 2012 (DPM). The migration took place from July 2012 until November 2012.
The project consisted of replacing the HP Data Protector v6.x Enterprise Backup application with System Center DPM 2012 with little or no interruption in daily backup operations.
DPM Product Overview
System Center DPM 2012 is the backup solution incorporated into System Center 2012. Prior to 2012, the components (Orchestrator, Configuration Manager, Service Manager, Operations Manager, Virtual Machine Manager and Data Protection Manager) were each separately licensed. The 2012 System Center suite bundles all components together, and the user is left to implement any or all of them within their organization. See the Microsoft Licensing web site for details.
The DPM product consists of the backup management software and Management Console installed on a Windows Server 2008 R2 platform. Remote management is possible through the System Center Operations Manager console. System Center DPM requires a SQL Server 2008 / 2008 R2 / 2012 database. The database may be installed locally on the DPM server using a built-in SQL license, or DPM may connect to an external SQL server, in which case the SQL license is the one used on the external database server. The System Center DPM agent software for servers and client computers is installed on each machine and is the means by which backup data is transferred to the DPM server. In order to store data from client machines, the DPM server must be directly connected to online disk and/or tape storage. Generally speaking, disk storage is used for short-term retention while tape is intended for long-term storage.
In this deployment, System Center DPM uses a SQL Server 2008 R2 database located on the C: drive along with the product binaries. DPM communicates with servers using a local agent to transfer data to the DPM server, where DPM writes the stream to tape. The agent is cluster-aware, so it automatically follows the active node. The agent installation program detects whether a server is part of a cluster and offers to install the agent on all nodes in the same operation.
Backups are organized around Protection Groups. A Protection Group contains an entry consisting of a server and volume(s). Protection Groups use Short and Long term recovery strategies to determine how long a tape is kept active before it is overwritten.
You can get more information from the Microsoft System Center web site and/or ask Mr. Bing and Mr. Google.
HP Data Protector (OmniBack) Product Overview
HP Data Protector is based on the OmniBack Network Backup System, which HP acquired when it purchased Apollo Computer in 1989. OmniBack is Unix-based and uses agents installed on client computers, which is similar to the System Center DPM model. OmniBack was ported to Windows in order to have a broader appeal in the market and to provide a GUI, but under the covers it is a Unix application running under a set of Windows services. As with any backup solution, the details of each backup (file name, size, dates, media used, etc.) and the product configuration (media, drives, schedule, etc.) have to be recorded somewhere. OmniBack maintains them in a series of text files that comprise the “database.” Any corruption in the text files results in a loss of referential integrity between the “database” and the media on which the backups are written. OmniBack offers to back up the database but then again you need OmniBack to restore it.
I’ve run into a corrupt database situation and was fortunate enough to retain the topology of the environment while losing the parts of the database that recorded the details of the backups. If you have no record of a backup, the software can’t find the appropriate tape to retrieve it from. It is possible to catalog tapes and, in the process, OmniBack will update the database with the details found while examining the tapes. This is extremely slow and, with two 96-slot libraries, proved unsuccessful. Goodbye recovery.
Warning. When using any tape-based storage system, both System Center DPM and OmniBack use all the tapes that are available to write to concurrently. Backups are piled on backups until tapes are filled up. You can see that if the tape gets messed up or the database is corrupted, the backup chain is lost.
The Backup Infrastructure
Both OmniBack and System Center DPM use the same hardware:
- HP Blade server, ProLiant BL460c G1
- 2 Intel Xeon E450 3 GHz 4-core processors
- 16 GB memory
- HP MSL8096 4 LTO-4 3280 FC Tape Library – 96 slots
- HP MSL8096 4 LTO-5 3280 FC Tape Library – 96 slots
The Migration Plan
The objective, again, is to transition the backup application from OmniBack to System Center DPM running parallel systems on two blade servers each one connected to one of the tape libraries. Two types of backups are involved using OmniBack: data files residing on clustered or distributed file system published shares and local direct attached storage on individual servers along with the System State.
The plan is to deploy an additional HP blade server and attach it to the LTO-5 tape library. The LTO-4 library remains on the OmniBack server. The jobs are split between the two systems. OmniBack will continue backing up files and the server attached storage. The retention period will be shortened to accommodate all backups on the single library.
Protection Groups are created on the DPM server to back up data and individual server direct attached storage to the LTO-5 tape library, again with a short retention period. Each server will run its respective backup jobs on alternating days of the week.
When sufficient testing of the restore operation using DPM is complete, the LTO-4 library will be connected to the DPM server, the tapes re-formatted, and the jobs re-configured to use LTO-5 for data file backup and LTO-4 for server direct attached storage.
After ascertaining the tape storage requirements for a full set of traditional weekly full backups and daily incremental backups, the number of retention points is adjusted accordingly. Neither off-site tape rotation nor replication to a DR site is used. Instead, tapes are continually overwritten and remain on site.
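The sizing step described above is simple arithmetic, and a rough sketch of it follows. It assumes uncompressed data, tapes filled end to end, and six incrementals per weekly cycle. The data sizes in the example are hypothetical and are not figures from this project. The cartridge capacities are the published native LTO-4 and LTO-5 values.

```python
import math

# Native (uncompressed) cartridge capacity in GB for each LTO generation.
LTO_NATIVE_GB = {"LTO-4": 800, "LTO-5": 1500}


def weekly_cycle_gb(full_gb, daily_incremental_gb, incrementals_per_week=6):
    """Data written in one weekly cycle: one full plus the daily incrementals."""
    return full_gb + incrementals_per_week * daily_incremental_gb


def tapes_needed(full_gb, daily_incremental_gb, weeks_retained, generation):
    """Cartridges required to hold `weeks_retained` weekly cycles."""
    total_gb = weekly_cycle_gb(full_gb, daily_incremental_gb) * weeks_retained
    return math.ceil(total_gb / LTO_NATIVE_GB[generation])


def max_weeks_retained(full_gb, daily_incremental_gb, slots, generation):
    """Largest number of weekly cycles that fits in a library with `slots` tapes."""
    library_gb = slots * LTO_NATIVE_GB[generation]
    return library_gb // weekly_cycle_gb(full_gb, daily_incremental_gb)


if __name__ == "__main__":
    # Hypothetical sizes: 10 TB weekly full, 500 GB changed per day.
    print(tapes_needed(10_000, 500, 4, "LTO-5"))
    print(max_weeks_retained(10_000, 500, 96, "LTO-5"))
```

In practice the usable slot count is lower than 96 if a cleaning cartridge occupies a slot, and hardware compression changes the numbers, so treat the result as a starting point for choosing retention points rather than a guarantee.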
Backups are centered around the concept of a Protection Group (PG). DPM uses a wizard to create and maintain a PG. A Protection Group consists of a number of jobs. A job comprises a unit of storage (drive, share, folder) attached to a computer with a DPM agent installed. So the backup of a C: drive on server-A is a job; likewise, a backup of cluster-a\share-A is also a job. A PG is executed on a defined schedule and uses the resources (tape drives or disk Storage Pools) assigned to it. Retention points are specified depending on whether the strategy is short-term or long-term. Long-term storage using a full weekly backup and daily incremental backup set is used in this instance.
A word about Bare Metal Recovery. New to Windows Server 2008 R2 is the concept of Bare Metal Recovery (BMR). BMR uses Windows Backup to create a VHD file of the entire active partition of the server. The idea behind BMR is to recover a server intact and ready to run from the point-in-time backup. This is similar to using a product targeted toward PCs, like Ghost or Acronis, to restore a system. With BMR, you basically boot from DVD and choose the BMR option. The details are quite interesting but outside the scope of this post.
Protection Groups created for BMR backups of Windows 2008 servers use DPM Disk Pools for on-line storage. The Disk Pool is a 2TB LUN presented to the server. The LUN must be neither formatted nor initialized. When creating a Storage Pool, DPM makes all of the settings, using GUIDs to represent the files in a job stored in the Disk Pool. As the name implies, multiple LUNs or drives can be added to a Disk Pool.
The running schedule consists of Protection Groups that back up file storage, Windows 2003 computer drives (C:, D:, etc.) and System State to tape, and Windows 2008 servers using BMR to Disk Pools.
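To make the relationship between a Protection Group and its jobs concrete, here is a small illustrative model. It is not DPM's API or schema. The class names, fields, and example values are all made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Job:
    """One unit of storage on one computer that has the DPM agent installed."""
    computer: str  # server or cluster name
    source: str    # drive, share, or folder


@dataclass
class ProtectionGroup:
    """A named set of jobs sharing a schedule, a target, and a retention range."""
    name: str
    target: str            # "tape" or "disk" (illustrative values)
    schedule_days: tuple   # days on which the group runs
    retention_weeks: int
    jobs: list = field(default_factory=list)

    def add_job(self, computer, source):
        """Add a job once; a repeated computer/source pair is ignored."""
        job = Job(computer, source)
        if job not in self.jobs:
            self.jobs.append(job)
        return job


if __name__ == "__main__":
    pg = ProtectionGroup("File Servers", "tape", ("Mon", "Wed", "Fri"), 4)
    pg.add_job("server-A", "C:")
    pg.add_job("cluster-a", r"\\cluster-a\share-A")
    print(len(pg.jobs))
```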
Recovery operations are the true test of a backup solution. System Center DPM uses a calendar view to select objects to be restored. The restored object can replace the original object or be restored to an alternate location. File security can be preserved or the content can inherit the security of the target location. Not much new there.
Reporting is through SQL Server Reporting Services using a rudimentary set of available reports. The objective is really to see where your tapes are when using off-site storage, or whether a protection group is available for a particular day. If one knows the DPM DB schema, one can create custom reports. Ask Mr. Bing and Mr. Google for help in this area, as there are commercially available reporting packs. The alternative is to locate some PowerShell scripts on the System Center DPM team blog.
Who protects DPM?
The interesting question comes up – “How do you back up your System Center DPM installation?” I used a combination of a SQL Server backup command and Windows Backup. Here’s how to do it.
- Use SQL Server Management Studio to script a database backup.
- Create a batch file that uses the SQL command shell to execute the script you created.
- Use the Windows Backup command to create a BMR backup.
- Add these steps to the batch file.
- Use Task Scheduler to schedule the recurring task.
Make sure that the batch file outputs (SQL Server backup and Windows Backup) are directed to an external network share.
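The steps above can be sketched as follows. I used a batch file; this is a Python stand-in that builds the same three command lines, and it is a sketch under assumptions rather than the script from the project. The instance name, database name, share path, task name, and script path are all hypothetical, so substitute your own. The commands themselves (sqlcmd, wbadmin, schtasks) are the standard Windows tools.

```python
import subprocess

# Hypothetical names, for illustration only; replace with your own.
SQL_INSTANCE = r"DPMSERVER\MSDPM2012"
DPM_DATABASE = "DPMDB"
BACKUP_SHARE = r"\\fileserver\dpm-backup"


def sql_backup_command(instance, database, share):
    """sqlcmd call that backs the database up to a .bak file on the share."""
    query = (f"BACKUP DATABASE [{database}] "
             f"TO DISK = N'{share}\\{database}.bak' WITH INIT")
    # -S instance, -E Windows authentication, -b exit with an error code
    # on failure, -Q run the query and exit.
    return ["sqlcmd", "-S", instance, "-E", "-b", "-Q", query]


def bmr_backup_command(share):
    """wbadmin call that backs up all critical volumes (the BMR set)."""
    return ["wbadmin", "start", "backup",
            f"-backupTarget:{share}", "-allCritical", "-quiet"]


def schedule_command(task_name, script_path, start_time="02:00"):
    """schtasks call that runs the backup script once a day."""
    return ["schtasks", "/Create", "/TN", task_name, "/TR", script_path,
            "/SC", "DAILY", "/ST", start_time]


def run_backup():
    """Run the SQL backup, then the BMR backup; stop at the first failure."""
    for cmd in (sql_backup_command(SQL_INSTANCE, DPM_DATABASE, BACKUP_SHARE),
                bmr_backup_command(BACKUP_SHARE)):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_backup()
```

Both the SQL Server service account and the account running the scheduled task need write access to the share, otherwise the backup fails at the first step.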
When things break, use the Windows backup to restore the server. If you use a local database, then it’s on the BMR backup. If not, here’s where the SQL backup comes into play. Of course you can always use the SQL backup to restore a corrupted database.
This post focused on describing a migration from OmniBack to System Center DPM 2012. During the course of the migration, numerous little problems arose which required a “do over” and searches through the DPM blog and postings on the Internet. Once the environment stabilizes, it tends to run on automatic pilot. The largest source of interruptions is agents losing communication with the DPM server because of network problems or client machine reboots. Those failures are usually remedied by examining the DPM Alert and rerunning the operation. If it’s tape, the job will mount an existing tape set and continue writing to it.
Once you have your own experiences under your belt, check into the forums and see what people have gotten themselves into – amazing.
This posting is provided “as is” with no warranties, guaranties or any rights whatsoever. All content is based on the author’s experiences and opinions and is not intended to influence the actions of the reader.