What is VSS timeout and can it be tweaked? Does it prevent backups from success?
I'm having an issue 2 nights in a row now that our SQL 2008 R2 server did not backup via vmProtect 9. Usually this resolves itself on Server 2008 R2 machines, but now twice in a row tells me it's not going to resolve by itself.
I have a different experience with Server 2003 and Exchange 2007, where the VSS service just will be frozen in a waiting state and backups will 100% fail everytime until the VM is rebooted. However this seems different.
We have e-mail logs for each job and this job did e-mail me at 4:01 AM with this log:
Backup has failed. Please check the log for additional information.
1 Information 10/15/13 4:00:02 AM Task 'dcuSQL1' changed its state from 'idle' to 'running'.
2 Information 10/15/13 4:00:02 AM Task 'dcuSQL1' was started.
3 Information 10/15/13 4:00:05 AM Backup deletion
4 Information 10/15/13 4:00:06 AM 1 backup(s) have been deleted from archive DCUSQL-ONE.TIB.
5 Information 10/15/13 4:00:06 AM Backup deletion has successfully completed.
6 Information 10/15/13 4:00:06 AM Backing up VM(s).
7 Information 10/15/13 4:00:06 AM Successfully connected to 'VMware vCenter Server 4.1.0 build-799345'.
8 Information 10/15/13 4:00:06 AM Successfully connected to 'VMware ESX 4.1.0 build-988178' through VMware vCenter Server.
9 Information 10/15/13 4:00:06 AM Successfully connected to 'VMware ESX 4.1.0 build-988178' through VMware vCenter Server.
10 Information 10/15/13 4:00:07 AM Successfully connected to 'VMware ESX 4.1.0 build-988178' through VMware vCenter Server.
11 Information 10/15/13 4:00:07 AM Successfully connected to 'VMware vCenter Server 4.1.0 build-799345'.
12 Information 10/15/13 4:00:09 AM Successfully connected to 'VMware vCenter Server 4.1.0 build-799345'.
13 Information 10/15/13 4:00:09 AM Successfully connected to 'VMware ESX 4.1.0 build-988178' through VMware vCenter Server.
14 Error 10/15/13 4:00:45 AM The operation has failed.
Additional info:
--------------------
Error code: 26
Module: 100
LineInfo: 48afbd3608a410ca
Fields:
Message: The operation has failed.
--------------------
Error code: 3003
Module: 538
LineInfo: 38b2393b56c5aa77
Fields: StringAlertId : sql vss writer disabled
Message: Converted agent alert
--------------------
15 Information 10/15/13 4:00:46 AM Task 'dcuSQL1' changed its state from 'running' to 'idle'.
16 Error 10/15/13 4:00:46 AM Task 'dcuSQL1' failed: 'Failed to create a backup.
Additional info:
--------------------
Error code: 3
Module: 435
LineInfo: 555b5abba0950337
Fields:
Message: Failed to create a backup.
--------------------
Error code: 32786
Module: 114
LineInfo: 28314c961de7d337
Fields:
Message: Failed to prepare for backing up.
--------------------
Error code: 353
Module: 149
LineInfo: a71592046cb2c5f6
Fields:
Message: Failed to back up the group.
--------------------
Error code: 2
Module: 218
LineInfo: 338a407ad20e0987
Fields:
Message: Error occurred while running the backup and recovery engine.
--------------------
Error code: 3003
Module: 538
LineInfo: 38b2393b56c5aa77
Fields: StringAlertId : sql vss writer disabled
Message: Converted agent alert
--------------------'.
So I went into the Windows Event logs for the particular virtual machine around the 4 AM time frame. The only thing I can see is at 4:06:13 AM is :
Source: VSS
Event ID: 8224
General: The VSS service is shutting down due to idle timeout.
Prior to that or after that is just a bunch of SQLAgent Job Engines that dont appear to have anything related to backup in them.
My question to the folks out there is that did this VSS timeout cause the backup to fail? Or did Acronis attempt to initiate VSS and then just for no reason stop / crash / segfault / etc... and then Windows stop the service due to a time out of no activity?

- Anmelden, um Kommentare verfassen zu können

Ok thanks for the information. It is a Server 2008 R2 virtual machine running SQL Server 2008 R2. I got the same error log at 4:01 AM today, so that's 3 days in a row of the same exact errors. Also at 4:06 AM in the Application event log I got "The VSS service is shutting down due to idle timeout." again.
Keep in mind prior to 3 days ago, Acronis VM Protect backed this virtual machine up successfully. I didn't change anything or roll out any patches through WSUS so I don't think any operating system parameters changed.
When I run a net statistics server on this VM it appears that it's been running since 7/27/2013 11:01 PM. I can't reboot it during the day as it's a SQL server that would take down at least 3 applications and a website. So hopefully I can try to find a fix without rebooting... but being that it's a Windows machine and not Linux, avoiding reboots is sometimes hard to do :-/.
I will take a look at your link and also look into more VSS related topics on the Internet.
EDIT: I'm attaching an excerpt from the vmware.log for this virtual machine. I've included events written to the log from 10/13 through 10/16. You'll see the last successful backup on 10/13 produces a lot of events in the logs. But for 10/14, 10/15 and 10/16 it doesn't seem like the VM was told to even snapshot. This makes me think its more VMWare or Acronis instead of the Windows 2008 R2 OS inside this guest. What do you make of these logs? Also I've checked and disk.enableUUID = "true" in the VMX file and this machine was not storage vmotioned any time recently.
- Anmelden, um Kommentare verfassen zu können

This 2008 R2 Windows / SQL server needed to be rebooted. Did this Friday night after hours and then backups from vmProtect have been successful since.
- Anmelden, um Kommentare verfassen zu können