Suggestions for backing up Hyper-V cluster with CSV
TL;DR: I'm hoping some community members have a working Windows Hyper-V CSV backup using Acronis Cyber Backup 12.5 and would be willing to share how they have things configured.
I'm looking for recommendations on the best way to back up VMs on a Hyper-V cluster. I have about 160 VMs (mostly Windows but some Linux) spread out across 5 Hyper-V hosts. There are two CSV volumes on a Dell ME4048 iSCSI SAN connected by 10 Gb fiber (dedicated network for iSCSI).
The major problem is random errors creating snapshots for VMs. On the surface, it appears to be a performance issue on the SAN, but I find that difficult to believe when it's running all enterprise SAS-based SSDs. I have been working to balance VMs onto the 2nd CSV volume and I'm cautiously hopeful this may help; however, I think this may be part of a larger problem. The snapshot errors are random. The VMs that have issues are not the same from backup to backup.
The other major issue is that when backups are running, the entire VM infrastructure suffers terrible performance. Users can't log in to VMs, VPN authentication times out, and more. This happens even if we create a test plan with just a single VM, and it affects all nodes of the cluster, even though the test plan is only backing up from one node.
Note: No performance issues are present during the day when backups aren't running and when all the staff is actually working. Just during the backup window. Kill the backups, performance goes back to optimal.
Another problem is that the cluster nodes in Acronis and the dynamic groups I use to schedule backups are not updating as VMs move around the cluster. So if VM1 was on Host1 last night, but we move it to Host2 today, Acronis won't try to back it up when Host2's plan kicks off later today, because it still sees the VM as belonging to Host1.
This also occurred when I recently live migrated a VM from HQ to a remote site. In Acronis, the new host sees the VM, but the dynamic group does not. So I told Acronis to back up the host itself, which does see the VM. When the plan ran, it 'skipped' the VM: it reported that it backed it up in 5 seconds, but there is no backup file in the backup location. I also renamed a VM, and Acronis is still trying to back it up under the old name and failing (the error is essentially: can't find it, go figure).
The reason I'm using dynamic groups for VMs (HostID="<host GUID>") is that we have 1 big server that collects event logs. I don't really need to be backing it up, as it is a large VM and does not hold critical data. So I use the dynamic group to exclude this VM regardless of which host it is currently running on.
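To make the intent concrete, here is a minimal Python sketch of the selection logic I'm after. This is not Acronis syntax; the VM names, host names, and function names are all made up for illustration. The point is that the exclusion is keyed on the VM itself, and the per-host list is derived from where each VM is running right now.

```python
# Hypothetical inventory: VM name -> host it is currently running on.
current_placement = {
    "vm-app01": "Host1",
    "vm-db01": "Host2",
    "vm-eventlogs": "Host2",  # the big log collector I don't want to back up
    "vm-web01": "Host1",
}

# Hypothetical exclusion list, keyed by VM name rather than by host.
EXCLUDED_VMS = {"vm-eventlogs"}


def vms_to_back_up(placement, excluded):
    """Return every VM in the cluster except the excluded ones."""
    return sorted(name for name in placement if name not in excluded)


def vms_on_host(placement, host, excluded):
    """Return the non-excluded VMs currently running on one host."""
    return sorted(
        name
        for name, current_host in placement.items()
        if current_host == host and name not in excluded
    )


# If vm-app01 is moved from Host1 to Host2, only the placement map changes.
current_placement["vm-app01"] = "Host2"
print(vms_on_host(current_placement, "Host2", EXCLUDED_VMS))
```

In other words, a VM that moves should show up in the new host's list on the next run, and the log collector should never show up anywhere.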
Because of the snapshot errors, I have to run a backup against a single host node each night. The backups don't complete (or have many errors) if I try to back up the whole cluster in one plan. I back up Host1 on Monday, Host2 on Tuesday, etc. The snapshot timeout is 30 minutes, so 20 machines with errors add 10 hours to the backup duration.
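For anyone checking the math, here is the arithmetic as a quick Python sketch. It assumes each failing VM waits out the full timeout and that the failures are hit one after another rather than in parallel; the function name is just for illustration.

```python
def extra_backup_hours(failed_vms, timeout_minutes=30):
    """Worst-case hours added when each failed snapshot waits out the full timeout in sequence."""
    return failed_vms * timeout_minutes / 60


print(extra_backup_hours(20))  # 20 VMs x 30 min = 600 min = 10.0 hours
```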
What is strange is that when a Plan starts against a group of VMs on Host1, it actually kicks off against all 5 Hosts, but quickly 'completes' on the other 4 Hosts.
What I'm looking for is any recommendations on the Best Practices to make things work smoother and easier, and preferably with no errors. Hopefully some community members have a setup that is working for them and would be willing to share.
Yes, I have a ticket open, but after three months, there has been no progress or improvement. Acronis says it's the SAN. The SAN vendor says it's Acronis. And I'm stuck in the middle.
My next test is to bring in a test iSCSI NAS to see if anything changes. I may even try SMBv3 clustering.
