Recovery Failure - Questions
I've had 2 recoveries fail.
In one instance we lost power entirely to an NFS datastore and the server. I assume some corruption took place; there were no backups running at the time. All the VMs came back up except one, whose operating system was corrupt. While I wasn't handling it at the time, one of my co-workers may have run fsck and tried other things; we figured reverting to the previous night's backup would be sufficient.
My understanding (which could be incorrect) is that the Acronis system copies the original disk image and then only copies snapshots over from that point forward. When you go to restore, it copies the snapshot back onto the host and mounts the original vmdk disk with the snapshot.
What happened was that the exact same corruption was there. So I assumed the main vmdk was somehow corrupt, which doesn't make sense, because once a snapshot exists, writes generally only go to the snapshot from that point forward. Maybe it reconsolidates after that point; I don't know.
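My mental model of why the base disk should have stayed clean is the copy-on-write behavior of snapshots. This is purely my own illustrative sketch (not VMware's or Acronis' actual code): once a snapshot exists, writes land in a delta, and the base image is never modified.

```python
# Hypothetical sketch of copy-on-write snapshot semantics (illustration only,
# not VMware's implementation): writes after the snapshot go to a delta map;
# the base disk contents are never modified.

class SnapshottedDisk:
    def __init__(self, base_blocks):
        self.base = list(base_blocks)  # base vmdk contents, read-only after snapshot
        self.delta = {}                # block index -> new data (the snapshot redo log)

    def write(self, index, data):
        # All writes after the snapshot land in the delta, not the base.
        self.delta[index] = data

    def read(self, index):
        # Reads prefer the delta; otherwise fall through to the unchanged base.
        return self.delta.get(index, self.base[index])

    def revert(self):
        # Discarding the delta returns the disk to its snapshot-time state.
        self.delta = {}

disk = SnapshottedDisk(["a", "b", "c"])
disk.write(1, "B")           # change (or corruption) after the snapshot
assert disk.read(1) == "B"
disk.revert()
assert disk.read(1) == "b"   # base was never touched
```

Under this model, corruption that happens after a snapshot should be confined to the delta, which is why finding it in the base restore was so confusing.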
So I wanted to do a FULL recovery, meaning it would copy the original drive and the snapshots back over and overwrite all of it. This didn't work either: the exact same corruption. Very odd.
So I decided to mount the backup from the backup location. Same corruption.
So I'm scratching my head trying to get around how this was possible. Can someone explain to me the actual procedure and how this possibly could have happened?
(we were able to pull file level backups from another method we used to reconstruct the server quickly)
--
The other question is: if we manually do our own snapshots and consolidations, will this impact the Acronis backups in any way by fundamentally changing the structure of the original vmdk?


Thanks for your reply. Just so you know, both systems that were corrupted were Linux (CentOS based), not Windows.
I understand the corruption was inside the VM. But that corruption should only have occurred at the time of the power outage and only affected the current state of the machine; it should not have affected the backups. I understand you think the corruption had to have been there beforehand, and I suspected the same thing, but I'm positive this isn't the case for either occurrence. If we are recovering from older backups, there should not be any corruption in them at all. I tried going back weeks, using both incremental and full restores, without luck.
Ok so to make sure I understand the backup process:
1) Acronis tells VMware to snapshot the VM. When that completes, it copies the current base disk over into Acronis' backup file/directory on its own storage.
2) It then tells VMware to consolidate and delete the snapshot.
3) In the backup file on Acronis' drive it keeps a record of all the recovery points (the original files and the snapshots) in its own format.
So on Acronis' disk it has one copy of the main (original) vmdk drive plus the snapshots?
So for a full recovery, your system writes the main disk plus the changes up to the desired recovery date to a brand-new VM. That would mean the corruption had to have been in the backup as well, which doesn't make sense, because I went back weeks and the VMs were running fine all that time. They had several reboots since the earliest backups and no corruption.
You described how it recovers into a new VM, but can you explain how it recovers to an existing VM?
I'm going to need to start scheduling some validations. Something fishy is going on.
So taking our own snapshots manually in VMware and consolidating them will not impact the backup set in any way (as long as backups are not running at the time)? Sometimes before I do OS updates I take a snapshot to make sure the updates don't break any of the software.

Hi Steve,
Thank you for the additional details. The data is kept inside Acronis backups in the form of volume maps, i.e. there is metadata describing the backed-up volume properties (file system type, size, cluster size, offset, etc.) plus the actual volume contents. There is no notion of .vmdk files in the archive.
Recovery into an existing VM tries to automatically map the volumes from the backup onto the existing volumes in the target machine. If the volumes match, the existing volumes are overwritten with the data without touching the volume structure. If there is a difference in layout (for example, we detect that a volume's size in the backup differs from its size in the target system), then the target volume is deleted and a new one is created. After that, boot loader patching is performed (the MBR is modified to match the new volume structure) to finalize the recovery.
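The decision flow Vasily describes can be sketched roughly like this. This is my own hypothetical illustration of the logic (not Acronis code); the volume names and sizes are made up:

```python
# Hypothetical sketch of the recover-to-existing-VM decision described above
# (illustration only, not Acronis' implementation): matching volumes are
# overwritten in place; mismatched ones are deleted and recreated; finally
# the boot loader (MBR) is patched for the resulting layout.

def plan_recovery(backup_volumes, target_volumes):
    actions = []
    for name, size_gb in backup_volumes.items():
        if target_volumes.get(name) == size_gb:
            actions.append(f"overwrite {name} in place")
        else:
            actions.append(f"delete and recreate {name} ({size_gb} GB)")
    actions.append("patch boot loader (MBR) for the final layout")
    return actions

backup = {"sda1": 1, "sda2": 99}
target = {"sda1": 1, "sda2": 50}   # sda2 size differs from the backup

plan = plan_recovery(backup, target)
assert plan[0] == "overwrite sda1 in place"
assert plan[1].startswith("delete and recreate sda2")
assert plan[-1].startswith("patch boot loader")
```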
The above describes the case where the file system and volume structure can be recognized at backup time, which is not the case with LVM volumes (which I assume applies to you): such VMs are backed up in sector-by-sector mode, except for non-LVM volumes located on separate disks. In this case the VM is recovered "as is", placing all sectors at their original locations, with no possibility of resizing volumes during recovery (which is usually possible for recognized file systems). At the last stage of the recovery the boot loader (GRUB/LILO) is patched as well.
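The mode selection above can be summarized in a small sketch. Again, this is my own hypothetical illustration (the recognized-file-system list is an assumption, not Acronis' actual list):

```python
# Hypothetical sketch of the backup-mode selection described above (my
# illustration): recognized file systems are backed up file-system-aware and
# can be resized on recovery; LVM members fall back to sector-by-sector mode.

RECOGNIZED_FS = {"ext3", "ext4", "ntfs", "xfs"}   # assumed list, for illustration

def backup_mode(volume):
    if volume.get("lvm_member"):
        return "sector-by-sector"      # recovered "as is", no resize possible
    if volume.get("fs") in RECOGNIZED_FS:
        return "file-system-aware"     # resizable on recovery
    return "sector-by-sector"          # unrecognized file system: raw copy

assert backup_mode({"fs": "LVM2_member", "lvm_member": True}) == "sector-by-sector"
assert backup_mode({"fs": "ext4", "lvm_member": False}) == "file-system-aware"
```

This is why a standard CentOS install (root on LVM) ends up backed up sector-by-sector even though the file systems inside the logical volumes are ext4.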
Note that the Run VM from Backup functionality is different: it does not operate on volumes or patch the boot loader, but instead mounts the VM completely "as is" (not the same as a recovery). I'd definitely recommend trying to mount the VM from backup via the Run VM from Backup option in the Acronis Backup for VMware web interface and seeing whether the live mounted VM shows the same problem.
Based on the above, I think your problem is related to the specifics of the CentOS VM's internal partition layout (something in the LVM arrangement), so it needs to be analyzed by our support team to figure out what might be wrong. Please attach the Linux system report generated from the CentOS VM with the attached utility to your support request: see the instructions in https://kb.acronis.com/content/1508 - the last section.
Thank you.
--
Best regards,
Vasily
Acronis Virtualization Program Manager
Attachment: 263111-119275.zip (3.27 KB)

Thanks for the reply. Yes, we were using the standard LVM layout with CentOS 6.6, 64-bit. Unfortunately this happened a while ago and our guy already reconstructed the one server from file-level backups on a fresh VM install. I'll check whether he kept the VM and attach the files.
Before that I had to do the same, but that was a couple of months ago and the corrupt system is long gone. This is why I don't rely solely on full VM backups and do file-level backups as well, just in case!
We did indeed already try Run VM from Backup and saw the same corruption - that is why I was concerned and thought something might be wrong.
If this happens again I'll definitely retain the information. I was mainly making sure our current practice of snapshotting and consolidating between Acronis backups wasn't doing something that would affect the integrity of the backups, and trying to understand a little better how your system works so I can make better decisions when I need to recover in the future. You answered those questions. Thanks!


Yes, and I forgot to answer your last question: existing snapshots on VMs do not affect Acronis backup processes. The exception is a VM running on a snapshot that was captured with _memory_ included - that is a known issue (the backup will fail in that case).