Backup and Archiving... Ideas

Curious to hear what others use...
I am supposed to find some kind of external HD to back up onto, along with archiving to tape.
Currently we're using an old version of Retrospect with Travan tapes. Our daily backup is spilling onto two tapes, and I can't get one whole, reliable backup for when our ship goes down.

Any help appreciated

April
 
Re: Backup and Archiving... Ideas

Archive:
Retrospect to VXA X6 tapes for long-term storage in OS X. One tape holds about 40GB uncompressed, and it takes about an hour and a half to write/verify. I write two tapes in case one of them is bad. Take a look at how they test tapes here:

http://exabyte.com/technology/tested/index.cfm
I wouldn't trust any DDS, DLT, AIT or LTO tapes, as I've had bad, unreadable tapes before. Since I switched to VXA three years ago, I've never had a bad tape.

Backup:
For immediate backup I have an exact clone of my file server running OS X Server that synchronizes itself with my main server each night, also using Retrospect. Each server has six 320GB FireWire HDs for job data and a 500GB FW drive for ripped pages.

Have fun
 
Re: Backup and Archiving... Ideas

If looking at an external hard drive, I would look at the Seagate FreeAgent drives. I have two, and they are the only external hard drives I've found with a 5-year warranty.

Then it's just a matter of moving jobs to the backup external hard drive. Every time I finish a job, I copy it to an external hard drive and to a server share that gets archived to tape. I don't depend on the tapes myself; I rather like knowing I have my own external hard drive if the server and its shares go down.

Don
 
Re: Backup and Archiving... Ideas

I've just come off using Retrospect myself, and believe me, you really can't take for granted that everything was backed up and is restorable. Restore onto your new Seagate external HD just so all those old archives aren't stuck on a single backup medium.

As for the drives I mentioned: I have two, so I can take one off-site if I choose, then bring it back on-site once a week to synchronize with the first external HD.

Don
 
Re: Backup and Archiving... Ideas

We do long-term backup to two sets of LTO2 tapes that are moved off-site. These tapes are never recycled/reused.

We do short-term backup to two different sets of hard drives (500 GB SATA2) in an eSATA enclosure. The sets alternate every other week and are recycled/reused every other week. This gives us a minimum of one week's worth of backups that are rapidly accessible.
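
If you ever want to script which set is due, the alternation is just a week-number check; a trivial sketch in Python (the set names are hypothetical):

import datetime

def active_drive_set(today=None):
    """Pick the backup drive set for this ISO week (alternating weekly)."""
    today = today or datetime.date.today()
    week = today.isocalendar()[1]
    return "Set A" if week % 2 == 0 else "Set B"

print(active_drive_set())  # e.g. "Set B" during odd-numbered weeks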

We archive to DVD-R discs, burning two copies of each disc. One copy is taken off-site for long-term storage. The other copy is placed in DVD jukeboxes attached to a dedicated server. The jukeboxes are network accessible and appear to the end user like one large hard drive/volume; each DVD is simply a folder in the volume. The volumes can be mounted from Mac OS X or Windows, and they are re-cataloged whenever new DVD-R discs are added.

We will switch from DVD-R to Blu-ray or HD-DVD as soon as either format wins.

Abe Hayhurst
Director of Color and Technology
We Do Graphics, Inc.
 
Re: Backup and Archiving... Ideas

Abe, what brand of DVD jukebox are you using?

Thanks,

--tom
 
Re: Backup and Archiving... Ideas

How many DVDs do you already have in the jukebox, and how many GB does each DVD hold?
 
I would like to be able to archive to Blu-ray using some kind of catalog system AND be able to put that disc in another computer, see the files, and copy them to the desktop and use them (if I had to). Is there anything like this? We have Retrospect plus 80 Travan archive tapes, all cataloged. The problem is I can't find another compatible Travan drive for the Mac, and I fear the one we have is snapping our tapes. I would like to be able to retrieve these files often. I don't want to keep using these catalog-type programs if I can't keep up with the technology every 3-5 years. What if I can't find another reliable Blu-ray reader by then? Some suggestions were CatFinder and CD Finder. Any experience here?

Thanks
 
CD Finder WORKS!!

We have been using CD Finder for about 8 years and have cataloged all of our jobs burned to CD. CD Finder is really great for this, as it works like the Find on your Mac: punch in a job number or name and it will tell you the CD it was burned on. When a job is finished it goes into an "archive" folder and is burned to a CD (or a DVD now); then we tell CD Finder to catalog the disc, which takes a few seconds, and you're done. We number all the CDs when we burn them and file them away on a bookshelf in numerical order. I've got every customer file from the last 8 years. We should start converting all of these discs to DVDs now and continue this system, making a spare backup of each for storage off-site (which we never did before). It's simple and works well for us. :eek:
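
For anyone curious what a catalog like this boils down to, here is a rough sketch of the concept in Python (this is not CD Finder itself; the database name, disc label, and paths are all hypothetical): index every file on a mounted disc, then search the index by job number or name.

import os
import sqlite3

def catalog_disc(db_path, disc_label, mount_point):
    """Record every file path on a mounted disc under its label."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS files (disc TEXT, path TEXT)")
    for root, dirs, files in os.walk(mount_point):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), mount_point)
            con.execute("INSERT INTO files VALUES (?, ?)", (disc_label, rel))
    con.commit()
    con.close()

def find_job(db_path, term):
    """Return (disc, path) rows whose path contains the search term."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT disc, path FROM files WHERE path LIKE ?", (f"%{term}%",)
    ).fetchall()
    con.close()
    return rows

# Catalog a disc right after burning it, then search years later:
# catalog_disc("catalog.db", "DVD-0412", "/Volumes/DVD_0412")
# print(find_job("catalog.db", "10234"))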
 
After using tapes for years and years, I sacked them off after a tape failed on me when I needed it.

Now I use Retrospect and back up my entire networked servers to 2x 2TB external HDs. These are connected via FireWire 800 - and mannn, it's fast.... It's also VERY fast to retrieve, even when the files are huge... I back up every night and swap the two disks - one is kept in the fire safe.

Archiving - we archive to DVDs, two copies, one on-site, one off-site.... Catalogued in CD Finder - a nice, easy app... Works for us....

Although - Abe - how many DVDs does your jukebox hold???

Cheers

Pete
 
April,
I think you're on the right path with external HD storage... forget tape and Retrospect (more on that later).
Check out this simple storage device that's reliable, simple, and uses inexpensive SATA drives - you choose the size of the disks and you can constantly upgrade the RAID:
Data Robotics, Inc.

I don't believe in tape for a few reasons: it's slow, and you place too many eggs in one basket when all your data has to come back from "one" possible disk - they fail. A striped RAID with parity is intended to combat this... Drobo is the best tool. Also, purchasing large tape drives can be costly.

Retrospect is OK, but archiving the data into proprietary "catalogs" is annoying...
I suggest using BRU Server:
BRU Server Product Information - TOLIS Group, Inc. - The Backup and Recovery Experts
It's able to back up your server, other clients, active machine email, etc.
http://www.tolisgroup.com/docs/whitepapers/TheBRUAdvantage.pdf

good luck
 
I just finished setting up a RAID file server. I got a 16-port RAID controller for less than $700, a case with 20 hot-swap drive bays (with trays) for less than $400, and a cheap motherboard, CPU, and power supply. After doing the math, hard drives were cheaper than any kind of optical or tape media. The RAID controller is a Promise Supertrak EX16350, which supports RAID6. There are currently 7 drives in use - one is a "hot spare." The 6 active drives store as much data as 4, with the ability to lose any two drives simultaneously. If one of the drives fails at 2 AM on a Sunday, the array will be automatically restored using the spare in a few hours, and the data is still safe even if another drive fails during the rebuild.
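
For anyone curious how parity buys that kind of redundancy, here is a toy illustration in Python (made-up byte strings, and single parity only - RAID6 adds a second, differently computed parity so it can survive two simultaneous losses):

from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

# Data stripes on three "drives" (contents are made up).
stripes = [b"JOB10234", b"PAGE0001", b"PAGE0002"]

# A fourth "drive" holds the parity: the XOR of all data stripes.
parity = reduce(xor_bytes, stripes)

# Simulate losing drive 1, then rebuild it from the survivors plus parity.
lost = stripes[1]
survivors = [s for i, s in enumerate(stripes) if i != 1]
rebuilt = reduce(xor_bytes, survivors, parity)

assert rebuilt == lost
print("rebuilt:", rebuilt)  # b'PAGE0001'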

The advantage of this over tapes/DVDs is that all of the files are always available, and there is no backup procedure. When we run out of room, we just get another 1-terabyte drive and add it to the array, which can expand automatically.

The disadvantage is that you can accidentally delete files. I set it up with OpenSUSE (Linux) as the OS, which has better (and simpler) controls for disallowing deletion/modification of inactive files. I can also just grow the single large volume instead of creating new partitions, which I'm not sure you can do with Windows.
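
As a sketch of the sort of control I mean (not my exact setup - the path is hypothetical, and this is plain Python rather than the distribution's own tools): strip the write bits from everything under the archive tree, so the files can't be modified and nothing can be deleted from those directories.

import os
import stat

ARCHIVE_ROOT = "/data/archive"  # hypothetical mount point

WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH

def make_read_only(root):
    """Remove write permission from every file and directory under root.
    On Unix, a read-only directory also blocks deleting its entries."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            p = os.path.join(dirpath, name)
            os.chmod(p, os.stat(p).st_mode & ~WRITE_BITS)
        os.chmod(dirpath, os.stat(dirpath).st_mode & ~WRITE_BITS)

# make_read_only(ARCHIVE_ROOT)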

I'll be setting up a clone of this one at our other site, and I plan to use an automated process to synchronize the data, so we'll be safe even in the event of fire or theft.
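
For the synchronization itself, a nightly rsync over SSH is the obvious candidate; here is a minimal sketch wrapped in Python (the hostname and paths are hypothetical - I haven't settled on the exact tool yet):

import subprocess

SRC = "/data/"                    # local array (hypothetical path)
DEST = "backup@othersite:/data/"  # clone at the other facility (hypothetical)

def sync():
    """Mirror the local array to the off-site clone. -a preserves
    permissions and times; --delete keeps the clone an exact mirror."""
    subprocess.run(["rsync", "-a", "--delete", "-e", "ssh", SRC, DEST],
                   check=True)

if __name__ == "__main__":
    sync()  # run nightly from cron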
 
....I just finished setting up a RAID file server.... I'll be setting up a clone of this one at our other site, and I plan to use an automated process to synchronize the data, so we'll be safe even in the event of fire or theft....

It sounds like you have things pretty well handled, Kyle ;)

The only concern I might have going forward is search times increasing commensurate with the growth of the file system?

I am cognizant of the possibility that retrieving old job data at your particular location may be a highly frequent request and necessitate maintaining a larger pool of on-line data.

I bet that you could write a cron-tab script that would migrate your least-requested data to a parallel set of hardware implemented along lines similar to what you've described above, thereby managing your file system's size and content while still keeping all your data well in hand. Your operators would only search for job data against the "archive/secondary server" when a search against the primary server came up empty.
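
Something along these lines, perhaps (a rough, untested sketch; the paths, the idle threshold, and the "last requested" heuristic are all hypothetical):

import os
import shutil
import time

JOBS_ROOT = "/data/jobs"            # primary pool (hypothetical)
SECONDARY = "/mnt/secondary/jobs"   # archive server mount (hypothetical)
MAX_IDLE_DAYS = 365                 # migrate jobs untouched for a year

def migrate_idle_jobs():
    """Move job folders whose newest file is older than the cutoff."""
    os.makedirs(SECONDARY, exist_ok=True)
    cutoff = time.time() - MAX_IDLE_DAYS * 86400
    for entry in os.scandir(JOBS_ROOT):
        if not entry.is_dir():
            continue
        newest = max(
            (os.stat(os.path.join(dp, f)).st_mtime
             for dp, _, fs in os.walk(entry.path) for f in fs),
            default=entry.stat().st_mtime,
        )
        if newest < cutoff:
            shutil.move(entry.path, os.path.join(SECONDARY, entry.name))

# Run nightly from cron, e.g.: 0 3 * * * /usr/bin/python3 migrate.py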

What do you think?

Regards
Otherthoughts
 
....The only concern I might have going forward is search times increasing commensurate with the growth of the file system?.... What do you think?....

Because of the way we have our file system set up, we don't really ever need to perform file searches. All of a job's files are nested within a single directory named with the job number and customer name. So we just need to navigate to the proper directory. The only rare exception is when we can't find something and suspect someone mistyped the job number. Our production management database can quickly tell us all of a customer's past job numbers, so we don't have to search the file system to find them.

In-progress and recent files will be stored within one directory, and everything older in another (since we have tens of thousands of archived jobs, the archives will be divided by range in an extra directory level). The older stuff will be read-only, so nothing can be deleted, added or changed accidentally.
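
The range logic itself is trivial; a sketch of the idea in Python (the paths, job number, and customer name here are made up):

def archive_path(job_number: int, customer: str) -> str:
    """Map a job to its archive folder, bucketing 1,000 jobs per directory."""
    low = (job_number // 1000) * 1000
    bucket = f"{low:05d}-{low + 999:05d}"
    return f"/data/archive/{bucket}/{job_number:05d}_{customer}"

# archive_path(48213, "AcmePrinting")
# -> '/data/archive/48000-48999/48213_AcmePrinting'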

I am much more confident with this system than the tapes we have used in the past, because if one of the drives fails, we will know immediately, and need only replace it before two more fail. With tapes, we don't know one of them is screwed up until we try to get data from it, and even if we knew, there is no redundancy so it wouldn't matter.

If you're referring to hard drive seek time and not file system search time, I expect to see that improve as the array grows, because more physical drives means that a given amount of data is spread out across more drives and is therefore "wider" and less "deep."
 
....Because of the way we have our file system set up, we don't really ever need to perform file searches.... I expect to see that improve as the array grows, because more physical drives means that a given amount of data is spread out across more drives and is therefore "wider" and less "deep."....

Kyle,

Perhaps what I meant to suggest is that a single gigantic file system housing both in-progress and older archived data might become cumbersome as the file system grows, despite your knowing the exact job number and customer name to navigate to?

I am not quite sure I am following you accurately with respect to archived jobs. If I am following you correctly, the only differences between archived and in-progress jobs are the archives' read-only status and their relocation to a chronologically stratified, altogether separate folder hierarchy - but still on the same file system(s)? I guess what I really want to be clear on is whether both the archives and the in-progress jobs exist on the same file systems?

Regarding seek times and RAID arrays: the data is indeed written across all your drives "wider and less deep," as you suggested. Even with the parity data created to ensure you can tolerate losing two drives simultaneously, this increases the server's maximum sustained read and write speeds. This, along with the disk caches aboard each disk in the RAID array, lets your server assemble and respond to a file-system/directory-structure request from a client ever faster as you add more disks to the array. So far, so good - I believe we both agree?

A directory structure/file-system index for a given amount of data, however, does not change in length whether the data lives on one individual disk or on a RAID array with six disks acting as one. From a client's point of view, the list of the data housed on the file system is the same in either case.

So over time, the amount of data passed over your network will grow for every directory-structure/file-system-index request from your client computers. In other words, the larger your directory structure/file-system index is, the larger the burden you place on your network and client computers, no matter how robust your server is.

In summary, I like what you have implemented, Kyle! I just wanted to share my thoughts about very large file systems. Obviously things are working fine for you today with tens of thousands of jobs - well done! I guess this is mostly food for thought ;)

Regarding tape, I'm with you: tape is troublesome and slow!

Best Regards
Otherthoughts
 
After a job has shipped out the door, we archive the job folder onto DVD (it used to be CDs before that). We burn two copies and keep one set off-site. We have over 5,000 discs, CDs and DVDs, and I can't imagine having all of them on a RAID, as that would be massive - not to mention you wouldn't have an off-site backup.
 
otherthoughts:

The active and archived jobs are indeed stored on the same file system. The large logical volume (currently 4 terabytes, using hard drive manufacturers' definition of a terabyte) that the RAID controller presents to the operating system has a single partition occupying the entire volume. This partition has, at the root of its file system, a directory for active jobs and a directory for archives. From a network client, a single Samba share is mounted, and because the root of the file system is shared, the client sees a "jobs" directory, an "archive" directory, and several others.

On most file systems, each directory has its own index, which lists files and directories inside it and points to their locations. When you are navigating the directory tree, each directory points to the next. Some file systems may have a master index, but it is the linked directory indexes that are sequentially accessed as the directory hierarchy is traversed.

As an analogy, consider a printed volume of the US Internal Revenue Code. But instead of the clauses being logically arranged in order by chapter, then part, section, etc., they are arranged in a seemingly random order, with every clause appended to the end in the order it was created, and divided among multiple pages when amended with more words. This is how file systems typically grow, because an insertion that left everything neatly ordered would require a prohibitively expensive shift of all the data that comes after the insertion. (This isn't a perfect analogy, because with file systems there is such a thing as "deletion.")

If you had an enormous table of contents at the beginning that listed all of the clauses in that apparently random order, with references to what page each was on, then in the worst case you would have to read the entire table to find a single clause, even when you knew what chapter, part, etc. it was in. Instead, imagine that the first page is a table of contents listing only the chapters with page numbers. Knowing the full path of the clause in question, you would quickly find the chapter even though the list is not in any logical order. You would then go to that page, where you would find a table for all of the parts of that chapter, then on to the page for the part, then section, sub-section, etc., until you found the page with the clause you're looking for.

This is how most common file systems work, and it is the reason many file systems limit the number of items nested within a single directory (a fixed amount of space is allocated for the directory's index). It also means you can square the number of files and directories (e.g., go from 1,000 to 1,000,000) while only doubling the time it takes the operating system to find a single file or directory; in a well-divided system this could mean the difference between one hundredth of a second and two hundredths of a second. The operating system, network clients, etc. never have a clue what's in a given directory unless they specifically list its contents - it may as well be unused space.
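
To put toy numbers on that (a made-up model that just counts directory entries scanned, assuming 1,000 entries per directory):

# Toy model: how many directory entries must be scanned to find one file?
ENTRIES_PER_DIR = 1000

def entries_scanned(levels):
    """Return (total files reachable, worst-case entries scanned)."""
    total_files = ENTRIES_PER_DIR ** levels
    worst_case = ENTRIES_PER_DIR * levels  # one linear scan per level
    return total_files, worst_case

for levels in (1, 2, 3):
    files, scanned = entries_scanned(levels)
    print(f"{levels} level(s): {files:>13,} files, "
          f"scan at most {scanned:,} entries")

# 1 level(s):         1,000 files, scan at most 1,000 entries
# 2 level(s):     1,000,000 files, scan at most 2,000 entries
# 3 level(s): 1,000,000,000 files, scan at most 3,000 entries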

If you have OS X, you can see evidence of this behavior by comparing the time it takes to do each of the following:

1. Navigate to /Library/Printers/PPDs/Contents/Resources/en.lproj.
2. Open a Finder window and go to the root of the system drive. It should list the directories Applications, Library, System and Users. Change the view mode to list view if it isn't already (Command-2). Option-click the disclosure arrow to the left of the Library directory. It should now list the entire contents of the Library directory and all of its nested directories.

If there were a single master index that had to be read in its entirety for every directory listing, the second operation would be quicker than the first (one read instead of many). In fact the first operation should be faster, because you are taking one focused path through the directory tree, while the second has to bounce up and down, touching every branch at every level of the tree.


pcmodem:

I couldn't imagine having several thousand optical discs - that sounds massive! There will be an off-site backup as soon as the clone is installed at our other facility, so our only single point of failure will be a comet.
 
Great job explaining things, Kyle!

As long as neither the root directory nor the subdirectories a network client might navigate to thereafter contain a colossal folder and/or file list, your network would never be burdened by a client's directory-listing request, and of course the network client would not be burdened either. My bad!

Thanks for reminding me about inodes, MFTs, FAT32, HFS and other things I haven't pondered in a while;)

My hat's off to you! The only question I still ponder is how you structure a directory hierarchy containing tens of thousands of jobs so that you avoid such bulky directories?

I suspect you'll have an equally well-thought-out design, comparable to what you have shared with us so far ;)

Thanks for taking the time out to respond!
Otherthoughts
 
Whatever your backup method, just make sure you have multiple redundant copies of your data on reliable media - backup media will fail at times. One primary thing to back up is who owes you money. I back up all computers (employee-created data only) to a backup server (any network-accessible computer with a big HD), and from there to an external USB drive every night; I have 15 drives currently in use. Each USB drive is a full backup as of that night. All backups are incremental - that is, only data that is new or has changed is copied at each stage. This backup system is cumulative, so you will have to purge it of unneeded data occasionally. I buy new USB drives (now only 2.5'') periodically when they are on sale at my favorite discount store. A couple thousand dollars invested here for your business is well worth it - you will hug and kiss those drives profusely when you need them. ;-)
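
If you're curious what the incremental stage amounts to, here is a rough sketch in Python (the paths and stamp-file scheme are hypothetical; real backup tools also handle deletions, errors, etc.):

import os
import shutil
import time

SOURCE = "/srv/backup"      # the staging backup server's copy (hypothetical)
TARGET = "/mnt/usb_drive"   # tonight's USB drive (hypothetical)
STAMP = os.path.join(TARGET, ".last_backup")

def incremental_copy():
    """Copy only files modified since the last run to the USB drive."""
    last_run = os.path.getmtime(STAMP) if os.path.exists(STAMP) else 0
    for dirpath, dirnames, filenames in os.walk(SOURCE):
        rel = os.path.relpath(dirpath, SOURCE)
        dest_dir = os.path.join(TARGET, rel)
        os.makedirs(dest_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.getmtime(src) > last_run:
                shutil.copy2(src, os.path.join(dest_dir, name))
    with open(STAMP, "w") as f:   # record this run's time
        f.write(time.ctime())

if __name__ == "__main__":
    incremental_copy()
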
I also keep a primary business server all set up and ready to go, just not plugged in.
Best.
 
