
Have you used Deduplication on Acronis B & R Advanced Server 11?


Hi Folks:

We are looking at several software backup packages.

Have any of you used the deduplication component with Acronis B & R 11? My understanding is that the deduplication is 'block' level and not file level. My supervisor is afraid that there is a possibility that two blocks might have the same hash but actually be different (resulting in corruption).

Thank you for any "Real" world experience...

Matthew


I finally got this product going yesterday. We use deduplication, and I was almost shocked at what I got. See the attached file.

I am limited as to what I have available for servers in this particular location, so the dedup database AND the vault are on the same disk set. It is a RAID-6 (4+2) array using 2 TB 7200 rpm SATA disks.

I installed the dedup option on the clients as well. I can see that the client is sending very little data across the wire back to the Acronis server/Vault.

I cannot speak to restoring from our deduped backups yet. We'll be running a test restore at some point.

Attachment: 89825-99226.png (46.53 KB)

Acronis Deduplication is block level (with 4k blocks AFAIK).
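To make the mechanics concrete, here is a minimal sketch (my own illustration, not Acronis code) of fixed-size block deduplication with a hash index. The 4 KB block size matches what was mentioned above; SHA-256 and the function names are just assumptions for the example:

```python
# Rough sketch of block-level dedup with a hash index (illustrative only).
import hashlib

BLOCK_SIZE = 4 * 1024  # assumed fixed 4 KB blocks

def dedup_store(path: str, store: dict[bytes, bytes]) -> list[bytes]:
    """Split a file into 4 KB blocks, keep one copy per unique block hash,
    and return the list of hashes needed to reconstruct the file."""
    recipe = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            digest = hashlib.sha256(block).digest()
            # If two *different* blocks ever produced the same digest (a hash
            # collision), the second block would silently be replaced by the
            # first on restore -- the corruption scenario asked about above.
            store.setdefault(digest, block)
            recipe.append(digest)
    return recipe

def restore(recipe: list[bytes], store: dict[bytes, bytes]) -> bytes:
    """Rebuild the original data from the stored unique blocks."""
    return b"".join(store[digest] for digest in recipe)
```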

The deduplication component caused severe data loss for us (a complete depot was lost) due to corruption in the block database. The support case took a few months, and in the meantime a second depot got corrupted. It is not yet clear whether the second incident is related to deduplication (or whether it is data loss at all - at the moment the depot is simply not accessible, which in practice amounts to the same thing).

I don't know how ABR handles hash collisions - the odds are very low, though. Let me quote from this thread:


We are going to compare the probability of an unchecked and undetected hash collision in a dedup file system such as lessfs with the probability of an undetected CRC error associated with a disk I/O operation.

If a crc32 fails to detect a disk error, then it means that the data in the disk block that was just read is not the same as the data that was written to that block, but the CRC still checks out good. What is the probability of this happening?

If a hash collision in a dedup filesystem causes the wrong data to be stored for a block, meaning the block will be read as the data that first generated that same hash, and not the data that caused the collision, then we have a collision error. What is the probability of this happening?

We may view the CRC as a type of hash for the purposes of this discussion, as it is a many-to-one mapping, as is a hash; the difference is primarily the number of bits involved. Thus we consider the crc32 as a 32 bit hash, whereas the Tiger hash used in lessfs is a 192 bit hash.

If we consider a disk block of d bits in length, and a hash of h bits in length, then we desire to find the probability of a hash collision where two distinct data blocks generate the same hash code. We assume that each hash code is equally likely, as is each data block.

There are 2^d different data blocks and 2^h different hash values. This means, assuming a uniform distribution, that there are 2^d/2^h = 2^(d-h) data blocks that produce the same hash code, i.e., blocks that all collide with one another.

Since the number of collisions with a given block is 2^(d-h), and the total number of blocks is 2^d, then the probability of a collision is:

2^(d-h)/2^d = 2^(d-h-d) = 2^-h = 1/2^h

So the probability of a collision is independent of the block size. It depends only on the number of bits in the hash.
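(As an aside, not part of the quoted post: that result is easy to sanity-check by brute force with a toy hash. Here SHA-256 truncated to h bits stands in for an h-bit hash, and the measured collision rate comes out near 1/2^h regardless of the block size d:)

```python
# Toy brute-force check that collision probability ~ 1/2^h, independent of d.
import hashlib
import random

def toy_hash(block: bytes, h_bits: int) -> int:
    """Truncate SHA-256 to h_bits to emulate an h-bit hash."""
    digest = int.from_bytes(hashlib.sha256(block).digest(), "big")
    return digest & ((1 << h_bits) - 1)

def collision_probability(d_bits: int, h_bits: int) -> float:
    """Fraction of all d-bit blocks whose hash matches that of one reference block."""
    num_bytes = (d_bits + 7) // 8
    reference = random.getrandbits(d_bits).to_bytes(num_bytes, "big")
    target = toy_hash(reference, h_bits)
    total = 1 << d_bits
    collisions = sum(
        1
        for value in range(total)
        if (block := value.to_bytes(num_bytes, "big")) != reference
        and toy_hash(block, h_bits) == target
    )
    return collisions / total

for d in (12, 16):  # tiny "block" sizes so the exhaustive loop stays cheap
    print(f"d={d}, h=8: measured {collision_probability(d, 8):.5f}, "
          f"predicted {1 / 2**8:.5f}")
```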

Therefore, the chance of an undetectable read error occurring because of a CRC collision is 1/2^32, or about 2.3e-10, whereas the probability of a hash collision in the dedup algorithm causing an error is 1/2^192, which is astronomically smaller (by a factor of 2^160) than the chance of the CRC letting an error slip through.
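(For concreteness - again my own aside, not from the quoted post - those figures can be checked with a few lines of Python:)

```python
# Plug in the numbers from the comparison above.
from fractions import Fraction

crc32_miss = Fraction(1, 2**32)        # undetected CRC-32 error: 1/2^32
hash_collision = Fraction(1, 2**192)   # 192-bit (e.g. Tiger) collision: 1/2^192

print(f"CRC-32 miss probability : {float(crc32_miss):.3e}")      # ~2.3e-10
print(f"192-bit collision prob. : {float(hash_collision):.3e}")  # ~1.6e-58

ratio = crc32_miss / hash_collision                              # = 2^160
print(f"CRC miss is 2^{ratio.numerator.bit_length() - 1} times more likely")
```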

Conclusion: Given a suitably long enough hash code, we do not need to worry about hash collisions causing errors in our data. Therefore, lessfs (and other dedup file systems) are justified in not performing a bit-for-bit compare when a hash matches a previously stored value. That is, the chance of an undetected crc disk read error occurring is much greater than the chance of a hash collision in the dedup algorithm. The dedup algorithm is many orders of magnitude more reliable than the disk drives it is running on.