
Need to recover data from corrupt backup file


Hello Everyone,

Here's a tale of woe with a moral attached to it...

I have two PCs running XP Pro SP3, each with perhaps 200GB of data (including five years' worth of keen digital photography for my partner and me!). I always backed up to tape in the past, but my backups were spanning several DDS cartridges and this was a bit laborious to do by hand each week. So, in January 2008 I invested in Acronis True Image Echo Workstation (ATIEW) and purchased a Netgear ReadyNAS Duo (great piece of kit, by the way) with mirrored 500GB drives. I set ATIEW to perform a full backup of each system weekly, with nightly incremental backups. Since space is limited, I allowed each full backup to overwrite its predecessor.

All was going well (I checked the logs now and again, and every backup reported success) until, following a hard drive failure on one machine, I lost the partition containing a load of work. No worries, I've got everything backed up... Wrong!

As you'll have guessed, when I tried to restore the partition, the backup was reported to be corrupt. I then ran "verify" against the file, and it was reported corrupt again. I copied the backup to the local hard drive of my other PC and tried again. Still reported corrupt. I updated ATIEW to the current build. No difference. I then tried to verify from the Acronis boot media. Still reported corrupt.

After reading a few forum threads, I attempted to mount the backup. This did work!
I can read the directory structure and can copy (most) individual files. However, if I try to copy whole directories, the copy is abandoned at the first file found to be corrupt. (I get a message about the faulty files being too large; presumably the problem is causing their size to be misreported to Windows.) I really don't want to copy each file one at a time, as there are several thousand of them. So, is there a recovery tool available, please? Even if some files are lost forever, I suspect many (most, even?) might still be OK, judging by the copies I have tried.
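A minimal sketch of such a tool, assuming the archive is mounted as an ordinary drive letter and Python is available: it walks the tree, copies one file at a time, and records any file that raises an error instead of abandoning the whole job. The function name `salvage_copy` is hypothetical; this is not an Acronis utility.

```python
import contextlib
import os
import shutil


def salvage_copy(src_root, dst_root):
    """Copy every readable file under src_root into dst_root.

    A failure on one file is recorded and skipped rather than aborting the
    whole job. Returns (copied_count, failures), where failures is a list of
    (path, error message) pairs.
    """
    copied = 0
    failures = []

    def note_walk_error(exc):
        # os.walk calls this when a directory itself cannot be listed.
        failures.append((exc.filename, str(exc)))

    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=note_walk_error):
        rel_dir = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel_dir)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)
                copied += 1
            except OSError as exc:
                failures.append((src, str(exc)))
                # Delete any partial output so it is not mistaken for a good copy.
                with contextlib.suppress(OSError):
                    os.remove(dst)
    return copied, failures
```

For example, `salvage_copy("Z:\\", r"D:\recovered")` (hypothetical paths) returns the number of files copied and a list of failures that can be written to a log. Because the mounted image may misreport sizes, a file that copies without error is still worth spot-checking.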

It then occurred to me to validate the backups made for my other PC. Some of these were corrupt too: of the eight or so .tib files on the NAS, four were corrupt. I find it very hard to believe that a very popular mirrored disk server would corrupt files, and both my PCs pass CPU and memory tests, so I suspect ATIEW is actually storing garbage at backup time. This might be due to partition problems (though if so, I don't see why ATIEW doesn't validate each partition it backs up on the fly; this would be trivial compared with, say, the processor cost of compression). If ATIEW *is* sending correct data, doesn't Windows do read-after-write verification? Or is this not the case for network drives?

Anyway, the moral of this tale is: you *really* don't have a backup solution until you've done a trial restore... or at the very least a validation, however much you spend on kit. Oh, and never directly overwrite the preceding backup with a new one: if the new one fails, you've lost both. And here's a third... For neatness I scheduled each backup run to include all the partitions on my PCs (I have Windows and apps in one, user stuff in another, and software distributions/downloads in a third). In future I'll separate them so that a failure in one partition doesn't stop me restoring the others.
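The "never overwrite the previous backup" rule can be automated if the backup program is set to write each run to a fresh file. A minimal sketch, assuming that setup: the new file is only moved into place after a validation callback approves it, and older generations are shifted to numbered names rather than deleted. `install_backup`, the `.1`/`.2` naming and the `validate` callback are all hypothetical; `validate` would wrap whatever check you trust.

```python
import os


def install_backup(new_file, final_path, validate, keep=3):
    """Move new_file to final_path, keeping up to `keep` older generations.

    Older copies become final_path.1 (newest) ... final_path.<keep> (oldest).
    If validate(new_file) is false, nothing existing is touched.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    if not validate(new_file):
        raise RuntimeError(f"{new_file} failed validation; previous backups kept")
    oldest = f"{final_path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Shift .N-1 -> .N, ..., .1 -> .2, working from the oldest down.
    for n in range(keep - 1, 0, -1):
        older = f"{final_path}.{n}"
        if os.path.exists(older):
            os.replace(older, f"{final_path}.{n + 1}")
    if os.path.exists(final_path):
        os.replace(final_path, f"{final_path}.1")
    os.replace(new_file, final_path)
```

This costs extra disk space, which is exactly what was short in the original setup; `keep=1` is the smallest setting that still leaves one previous good backup in place while the new one is written and checked.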

Thanks for staying with this story to the end.

Allan


Sorry about your problem and I can't help you with any solution to make your recovery easier.

Never assume that a PC does any error-checking on local data operations. Any data checking is done when you use the data (the CRC on the HD), or else it jumps into la-la land because of bad RAM contents. I was under the impression that networking does do checking, at least at the packet level when data arrives at the destination device, but I also know from personal experience that it is not foolproof all the way to the very end of the process: the storage on the destination HD.

Did you do any TI Validations when creating your archive?

Your statement about having a backup solution bears repeating: Until you've done a trial restore you don't have a backup. This applies to every backup program ever written, not just TI.

For my photos and other data files, I don't wish to stuff them into a proprietary container file, for the reason you've described. I use SyncBackSE (there are others, like Karen's Replicator) to store my files in their native format and file structure on the backup device. IMO, this makes it much less likely that a problem will cause massive loss of files, and makes recovery easier since it is just a Windows folder structure. These programs can also do compression if necessary, incrementals, and a read-check compare after writing.
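A read-check compare after writing can be sketched in a few lines. This is a minimal illustration, not how SyncBackSE or Karen's Replicator actually work: copy the file, ask the OS to flush it to the device, then compare both files byte for byte. `copy_and_verify` is a hypothetical name. One caveat: the read-back may be served from the operating system's cache rather than the disk, so this catches errors in the copy path more reliably than errors on the media.

```python
import filecmp
import os
import shutil


def copy_and_verify(src, dst, chunk_size=1 << 20):
    """Copy src to dst, flush dst to the device, then compare contents.

    Returns True only if the two files are byte-for-byte identical.
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
        fout.flush()
        # Ask the OS to push buffered data for dst out to the device.
        os.fsync(fout.fileno())
    # shallow=False forces a full content comparison, not just a stat check.
    return filecmp.cmp(src, dst, shallow=False)
```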

I'm with Seekforever in backing up data files in their native format (in my case with Karen's Rep.). I use True Image *ONLY* for whole drive backups. I know it requires extra drive space, but large hard drives are inexpensive enough nowadays.

Thank you Seekforever and DwnNDrty for your advice. I guess I'd sort of imagined the technology for secure backup was pretty mature by now! I'll look into the utilities each of you mention...

Sadly I'd never tried the verification tool until too late. Oddly enough, I don't recall any direction to do so in the Acronis documentation; I'll reread that tonight!

While thinking about Seekforever's response "Never assume that a PC does any error-checking on local data operations" I recalled that we used to have a VERIFY command in MS-DOS which enabled read-after-write verification of all disk writes. Although this command is still accepted at the XP command line it doesn't actually do anything - Microsoft included it only for backwards compatibility. So surely there must be something in XP that performs this function? I cannot imagine why ATIEW apparently continues a backup job (and reports success in the logfile) should this process fail! Or doesn't it? In any case I'd have expected disk write fails to be listed in the Windows Event Log at the time of the backup.

Thinking further about the possibility that ATIEW backs up a corrupt partition without a murmur: it would at least be a start if we could reconstruct the corrupt partition. Then we could run standard partition repair tools on the restored backup.

Googling "corrupt .tib files" I found a data recovery company who will recover from .tib archives. I wonder how they do this? Wouldn't they need a recovery tool? And if so, maybe the Acronis .tib format is specified somewhere? I'm a database programmer but I'd willingly dust off my "C" to write an open source recovery tool...

Allan.

No, they just tell you it can do a validation and that's about it. What's worse, they don't stress that validation of an image should be done using the TI rescue CD, which is Linux-based. People validate in Windows, it works, and they think life is beautiful. However, when the active partition needs to be restored, the TI Linux CD must be used, and a weakness of Linux is a lack of drivers for every PC hardware configuration. Many people in dire need of a restore have come unstuck on this point. Once you have successfully validated using the TI CD, you can safely validate using Windows unless you change the hardware.

Yes, there are applications and possibly a command here or there that will do verify on data transfers but there is nothing inherent in the PC or basic OS that does it. A read-check on every write would slow a PC considerably and the likelihood of a failure in normal operation is really quite low unless a HD, for example, is getting quite old. Unless you are running servers you don't see RAM with ECC or even a single parity bit these days while parity was fairly common 15-20 yrs ago.

The only disk write failures reported by the OS are the infamous "delayed write" failures, which I think have more to do with unavailable drives and data still in a cache somewhere, or a drive going off-line or getting full, than with an actual failure of the data on the platters. Data on bad sectors is reported on the read via the CRC mechanism, not when it was written. I have seen code in disk drivers (not PC) for turning on a read-check after writing, but it was purely there for diagnostic purposes. Note that this concerns regular PCs running regular PC apps. Somebody could write an app that goes out and verifies its writes, and this of course is what happens with CD/DVD burning programs, data backup programs that offer the feature, etc.

The TI image corrupt message really means that TI cannot open the archive file, read the contents into RAM and successfully recreate every one of the 4000 checksums per gigabyte. If only 1 checksum is bad, the entire archive is declared corrupt. When files are extracted from the archive, I believe that only the checksums pertaining to the blocks the files are located in are checked, which is why it is possible to recover files from a "corrupt" image.
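To illustrate the idea (this is not the real .tib layout, which is not documented here): checksum an archive in fixed-size blocks, and a single bad block fails validation of the whole archive, yet any file whose bytes lie entirely in good blocks can still be recovered. The 256 KiB block size is an assumption chosen to give roughly 4,000 checksums per gigabyte (4,096 per GiB), and CRC-32 via `zlib.crc32` stands in for whatever checksum TI actually uses.

```python
import zlib

BLOCK_SIZE = 256 * 1024  # assumption: about 4,096 blocks per GiB


def block_checksums(stream, block_size=BLOCK_SIZE):
    """Return a CRC-32 for each fixed-size block read from a binary stream."""
    sums = []
    while True:
        block = stream.read(block_size)
        if not block:
            break
        sums.append(zlib.crc32(block))
    return sums


def bad_blocks(stream, expected, block_size=BLOCK_SIZE):
    """Indices of blocks whose checksum no longer matches (or that are missing)."""
    actual = block_checksums(stream, block_size)
    return {i for i, e in enumerate(expected) if i >= len(actual) or actual[i] != e}


def archive_ok(bad):
    # Whole-archive validation: a single bad block fails everything.
    return not bad


def file_recoverable(offset, length, bad, block_size=BLOCK_SIZE):
    # Per-file extraction: only the blocks holding this file's bytes matter.
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return not any(b in bad for b in range(first, last + 1))
```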

It might be interesting to take a tib file and validate it on the PC. Then do a Windows copy to your NAS and then revalidate it. I would do this with a large file since transferring smaller ones is not the same thing. If you don't want to use a tib file you can take any file and using a free checksum calculator do the same thing. Get checksum, copy file to NAS, redo checksum, copy file back to PC and redo checksum again.
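The checksum round trip described above could be scripted as follows. It is a sketch: the three paths are hypothetical (for example a local file, a path on the NAS share, and a second local path), and SHA-256 from Python's `hashlib` plays the part of the free checksum calculator. The NAS copy is read back straight after writing, so part of it may come from the PC's cache; repeating the check later is a stronger test.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def round_trip_check(local_file, nas_copy, returned_copy):
    """Checksum, copy to the NAS, re-checksum, copy back, checksum again."""
    original = sha256_of(local_file)
    shutil.copyfile(local_file, nas_copy)
    on_nas = sha256_of(nas_copy)
    shutil.copyfile(nas_copy, returned_copy)
    returned = sha256_of(returned_copy)
    return {
        "original": original,
        "on_nas": on_nas,
        "returned": returned,
        "match": original == on_nas == returned,
    }
```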

As Seekforever mentions, what might work for your image is to transfer it to an internal drive in the PC somehow; it may then become possible to restore it from there.

If the network gets overloaded due to throughput bandwidth, TI sometimes just gives up or the network does. This often causes TI to come up with the corrupt tib file message.

From the rescue disk side of things this is even more likely with the Linux drivers used by Acronis.

Thanks to both of you for your comments.

I like Seekforever's idea of rechecking checksums after copying a (known good) archive in each direction between the NAS and a local drive. I will try this tonight. There are definite XP issues with copying large files (the dreaded "out of resources" error, which is described in various forums), and when I initially tried to move the archive onto a local drive (as per bodgy's idea) I ran into this. I think I ended up using robocopy (or one of its GUI derivatives), which avoided it. (I still cannot believe that Microsoft hasn't fixed this in a Service Pack; beta testing must have revealed it?!)

Unfortunately bodgy's idea of copying the .tib file from the NAS to a local drive did not allow me to successfully validate it.

I've given some thought to the design of a "quick and dirty" tool to recover what I can from a mounted archive. I'll probably use a shell script running on Cygwin, and I'll post again once it is working in case it is of any use to anyone else.

Who would have thought that backup technologies bring with them so many problems?

Regards,

Allan.

re: Acronis True Image v8.0

While trying to restore a fairly large directory tree from a "plugged-in" backup image, the copy process craps out with:

"Cannot copy [filename]: Access is denied. Make sure the disk is not full or write-protected and that the file is not currently in use."

The error is presumably due to a bad sector on the backup disk, because none of those conditions obtains. But the ultimate problem seems to be that, when using "Explore Image" rather than "Restore Image", True Image relies on Microsoft's copy facility in Windows Explorer, which famously has little or no error recovery logic. (Whenever Windows Explorer hits a snag, it simply aborts the entire copy and emits this useless error message. One has no way of knowing which files have been copied and which have not, because Microsoft copies files in no discernible order. Obviously it must be using *some* algorithm, but I've never figured it out.)
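Since Explorer leaves no record of what it managed to copy, one workaround is to compare the two trees afterwards. A minimal sketch with a hypothetical function name: list every file under the source that is missing from the destination or has a different size. Size is only a rough signal (a damaged image may misreport it), but it quickly narrows down which files still need attention.

```python
import os


def missing_or_short(src_root, dst_root):
    """List (relative path, reason) for files absent from dst_root or differing in size."""
    problems = []
    for dirpath, _dirs, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            rel = os.path.normpath(os.path.join(rel_dir, name))
            src = os.path.join(src_root, rel)
            dst = os.path.join(dst_root, rel)
            try:
                src_size = os.path.getsize(src)
            except OSError as exc:
                problems.append((rel, f"unreadable in source: {exc}"))
                continue
            if not os.path.exists(dst):
                problems.append((rel, "missing"))
            elif os.path.getsize(dst) != src_size:
                problems.append((rel, "size differs"))
    return sorted(problems)
```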

The whole reason I bought True Image was to _AVOID_ Microsoft's fragile copy logic. I'd be happy to ignore the occasional bad file if I could get the rest of the good ones copied over.

Any suggestions for how to get around this?