Pegasus memory leak causes PSOD

Posted by Frank van Egmond
Oct 27 2009

While following this knowledge base article on increasing the timeout value of VMware Update Manager during scan and/or remediation tasks, I restarted the management agents as stated in the article, and the ESX server decided to turn purple. Since purple screens are so rare that they are even collector’s items to some people, I decided to investigate the problem.

After recovering from the PSOD by rebooting the system, I found that the hostd.log files were cluttered with messages like ‘'Memory checker' XXX warning] Current value 164768 exceeds soft limit 122880’. Luckily a screenshot of the console had been taken as well, and an interesting message stating that the heartbeat was lost filled the screen. While reading that message I started to wonder. Could I have been that wrong? Was it not possible to restart the management agents while ESX was running AND hosting VMs? It never gave me problems before!
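
To get a feel for how often the memory checker complained, a small script can pull those warnings out of the log. This is only a minimal sketch: the regular expression is built from nothing more than the message fragment quoted above, the rest of the hostd.log line layout is an assumption, and the function name find_soft_limit_warnings and the sample lines are made up for illustration.

```python
import re

# Pattern based only on the message fragment quoted in the post; the rest of
# the hostd.log line layout is an assumption.
SOFT_LIMIT_RE = re.compile(r"Current value (\d+) exceeds soft limit (\d+)")


def find_soft_limit_warnings(lines):
    """Return (current, limit) integer pairs for every line that matches."""
    hits = []
    for line in lines:
        match = SOFT_LIMIT_RE.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


# Hypothetical sample input: one line shaped like the quoted message and one
# unrelated line that should be ignored.
sample = [
    "'Memory checker' XXX warning] Current value 164768 exceeds soft limit 122880",
    "some unrelated log line",
]

for current, limit in find_soft_limit_warnings(sample):
    print(f"{current} exceeds soft limit {limit} by {current - limit}")
```

To run it against a real log, pass an open file object instead of the sample list, for example `find_soft_limit_warnings(open("hostd.log"))` from the directory that holds a copy of the log.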

Let’s try the knowledge base, an invaluable source of information. When I typed in ‘Heartbeat lost’ and ‘PSOD’, the following article popped up. It seems there is a memory leak in the Pegasus service.

Unhappy with the answer and the workaround provided, I decided to open a support call with VMware. After I sent the logs and the vm-support files, the same answer AND workaround were provided.

Curious about a more lasting fix, I heard that this issue is scheduled to be resolved in VMware ESX 3.5 Update 5. Let’s hope it comes soon.
