Deduplication across VMs

I am wondering about the scope of deduplication in VMP. I am currently running a separate backup job for each VM on my ESXi host, so clearly there is no deduplication across the VMs. If I backed up all Windows Server VMs in the same job, would the resulting backup be smaller than the sum of the separate backups? Is the scope of deduplication at the job level or the VM level? Is there a downside to putting multiple VMs in the same backup job?
Thx

Jim

I am backing up two Windows Server 2008 R2 VMs and one Windows 7 VM into a single archive. I have about 600GB of data in total, and it consumes about 250GB backed up. If I check the recovery points, one of the points is about 200GB and all other points are a few GB each. So essentially, after the first VM is backed up, everything else is just incremental from that point on. Since Windows 2008 R2 and Windows 7 are essentially the same, the majority of the system files can be deduplicated, which is probably a 20-30GB saving per VM.
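To illustrate why a shared archive can end up smaller than the sum of separate ones, here is a minimal Python sketch of block-level deduplication. It is not how vmProtect stores data internally; the block size, the `archive_size` helper and the toy "VM images" are all assumptions made up for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this example


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut an image into fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def archive_size(vm_images, block_size=BLOCK_SIZE):
    """Bytes stored when all given images share one deduplicated archive."""
    store = {}  # block hash -> block contents, each unique block kept once
    for image in vm_images:
        for block in split_blocks(image, block_size):
            digest = hashlib.sha256(block).digest()
            store.setdefault(digest, block)
    return sum(len(block) for block in store.values())


def make_block(tag):
    """Build one distinct 4096-byte block from a label (toy data only)."""
    return hashlib.sha256(tag.encode()).digest() * (BLOCK_SIZE // 32)


# Two toy VMs that share 10 "system file" blocks and have 3 unique blocks each.
system_files = b"".join(make_block(f"os-{i}") for i in range(10))
vm_a = system_files + b"".join(make_block(f"a-{i}") for i in range(3))
vm_b = system_files + b"".join(make_block(f"b-{i}") for i in range(3))

# One archive per VM: the shared blocks are stored twice.
separate = archive_size([vm_a]) + archive_size([vm_b])
# One archive for both VMs: the shared blocks are stored once.
combined = archive_size([vm_a, vm_b])
```

With these toy images, `separate` counts 26 blocks and `combined` counts 16, because the 10 shared blocks are only stored once in the single archive. Real savings depend on how much the VMs actually have in common.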

The main risk I see is that there is a greater chance of losing more data if the archive somehow becomes damaged or corrupt (or is accidentally deleted, etc.). However, the deduplication and the simplicity of having fewer backup tasks are a valuable tradeoff. Occasionally I have experienced corrupted archive files (I am backing up to 5 separate USB drives), so I just format the troublesome disk and life goes on.

I would strongly advise against it. Given how often archive files get corrupted in vmProtect, you will be asking for trouble.

Hi!

The main benefit of saving backups from multiple machines into one archive is, as correctly mentioned, the deduplication functionality, which is applied at the archive level and therefore saves quite a lot of space. The archive corruption concern raised here is in fact not as bad as it may seem. Let me explain in more detail:

The probability that corruption in 2 separate archives (1 VM in each) leaves you unable to recover from one of those archives is the same as when you save backups of these 2 VMs into a single archive. Archive corruption does _not_ necessarily mean that you cannot recover from _any_ recovery point. It means that there is _some_ recovery point which cannot be used for recovery. In most cases, corruption affects only particular recovery points rather than the entire archive.

Thank you.
--
Best regards,
Vasily
Acronis vmProtect Program Manager

So, would the recovery point be corrupt for both VMs or possibly only one? Since I am blissfully unaware of corrupted archives, I am unsure whether combining multiple VMs into one archive increases the risk of not being able to recover any particular file or VM. Is it possible to detect this corruption before you need to recover a file or VM?

Jim

One other question: if I choose to keep each VM in a separate archive, does it make sense to turn off deduplication? I'm guessing there is not that much duplication within a single VM and that the backup would run faster if deduplication were turned off.

Thanks. Jim

Hi Jim,

Concerning your first question: yes, it is possible to detect the corruption, for example by enabling automatic validation of the archive after creation (the option on the 4th step of the Backup wizard). This validation verifies the integrity of the archive, and a successful result means that you can recover from the recovery point(s) created by that particular backup task run. In other words, it not only checks the data which was transferred last time (which could be small if it was an incremental backup), but also goes through the dependency chain and verifies the actual ability to recover. You can also run validation manually from the respective wizard to check the entire archive or particular recovery points. As mentioned before, the risk of archive corruption is practically the same whether you use a single archive for all backups or multiple archives.
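The idea of validating through the dependency chain can be sketched roughly as follows. This is only an illustration in Python and not the actual vmProtect archive format or validation code; the `RecoveryPoint` structure, its fields and the `is_recoverable` function are hypothetical names invented for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RecoveryPoint:
    """Hypothetical recovery point: its data, a stored checksum, and its parent."""
    name: str
    data: bytes
    checksum: str
    parent: Optional[str] = None  # None for a full backup


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def is_recoverable(points: Dict[str, RecoveryPoint], name: str) -> bool:
    """Check the named point and every point it depends on, back to the full backup."""
    current: Optional[str] = name
    while current is not None:
        point = points.get(current)
        if point is None or sha256_hex(point.data) != point.checksum:
            return False  # a missing or damaged link breaks this chain
        current = point.parent
    return True


def make_point(name: str, data: bytes, parent: Optional[str] = None) -> RecoveryPoint:
    return RecoveryPoint(name, data, sha256_hex(data), parent)


# Toy archive: one full backup followed by two incrementals.
points = {
    "full": make_point("full", b"base image"),
    "inc1": make_point("inc1", b"changes day 1", parent="full"),
    "inc2": make_point("inc2", b"changes day 2", parent="inc1"),
}

# Simulate damage to the first incremental only.
points["inc1"].data = b"changes day X"

results = {name: is_recoverable(points, name) for name in points}
```

In this toy archive, `results` marks `full` as recoverable and `inc1` and `inc2` as not recoverable, since `inc2` depends on the damaged `inc1`. That matches the point above: corruption makes some recovery points unusable, not necessarily the whole archive.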

Concerning the 2nd question: it doesn't really matter whether deduplication is turned on or off in your case, since you are backing up each machine into its own dedicated archive. If you disable it now, you will see neither a performance boost (deduplication itself is relatively lightweight and does not consume many resources) nor noticeable additional space consumption.

Thank you.
--
Best regards,
Vasily
Acronis vmProtect Program Manager