
Upgrading to ESXi 5.5: Update Manager vs ISO

Technology

I recently upgraded a cluster of ESXi 5.1 hosts to 5.5 and learned a lesson about using a NIC that requires a third-party driver.  All of my hosts use QLogic 10Gb CNA cards, which require a driver to be loaded (a custom VIB).  Unfortunately, if you try to upgrade to 5.5 using Update Manager, it disables the third-party drivers before actually doing the update.  Since Update Manager does the update over the network interface, as soon as that driver is disabled the host loses contact with vCenter and Update Manager, and the upgrade hangs.  Luckily, nothing has actually been changed at that point, so the host can simply be rebooted and it will come back up as though nothing had happened.

The solution is to always do the upgrade from the ISO (as a physical DVD, a USB flash drive, or network mounted via a management card such as an iLO on HP servers) so that no networking is required during the upgrade.  Once the upgrade was completed I installed the newer driver for ESXi 5.5, and one reboot later my host was back in business.

So the lesson learned is to be very careful when using third-party drivers that are not part of the base ESXi image.  When third-party drivers are involved at all, it seems safer to do upgrades by interacting with the host directly rather than through vCenter (although I have never had an issue with patches).
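One way to spot this risk before an upgrade is to check which installed VIBs come from a vendor other than VMware.  The sketch below is only an illustration: it assumes the text output of `esxcli software vib list` has been saved from the host and that its columns are Name, Version, Vendor, Acceptance Level and Install Date.  The sample listing, the VIB names and the `third_party_vibs` helper are all made up for the example.

```python
# Illustrative sample only: the VIB names and versions are invented, and the
# column layout is an assumption about `esxcli software vib list` output.
SAMPLE_VIB_LIST = """\
Name             Version                    Vendor  Acceptance Level  Install Date
---------------  -------------------------  ------  ----------------  ------------
esx-base         5.1.0-0.0.000000           VMware  VMwareCertified   2013-01-10
example-cna-nic  1.0.0-1OEM.510.0.0.000000  QLogic  VMwareCertified   2013-02-01
"""


def third_party_vibs(listing, base_vendor="VMware"):
    """Return (name, version, vendor) for each VIB not published by base_vendor."""
    found = []
    # Skip the header row and the dashed separator row.
    for line in listing.splitlines()[2:]:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed line
        name, version, vendor = fields[0], fields[1], fields[2]
        if vendor != base_vendor:
            found.append((name, version, vendor))
    return found


for name, version, vendor in third_party_vibs(SAMPLE_VIB_LIST):
    print(f"third-party VIB: {name} {version} ({vendor})")
```

Any VIB this flags is a candidate for the problem described above, so it is worth having the matching 5.5 driver on hand before starting.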

The next time I have to upgrade I think I may try to create a custom ISO with my third-party drivers already integrated, just to avoid the extra steps.  If I do, I’ll try to post about the process and how it went.


When power saving is not your friend

Technology

I’ve been investigating a performance problem in a VM on one of our ESXi 5 clusters that led to an interesting discovery about the power saving settings on the ESXi host.  Basically, under certain scenarios (and perhaps only with specific CPUs) the physical CPUs will be down clocked even though a VM is trying to use 100% of its CPU.

The physical host servers are HP DL385 G7s with two 12-core AMD Opteron 6174 processors @ 2.2GHz and 128 GB of RAM.  They boot from an integrated SD flash card and all other storage is provided by our Compellent SAN.

In the BIOS there are three key settings under the Power Management Options:

HP Power Profile – This defaults to “Balanced Power and Performance” but I’ve changed it to “Maximum Performance”

HP Power Regulator – This defaults to “HP Dynamic Power Savings Mode” but changes automatically to “HP Static High Performance Mode” after changing the power profile setting

Advanced Power Management -> Minimum Processor Idle Power State – This defaults to “No C-states” and that is what we want it set to

The VM I’m testing with has 4 vCPUs and 8 GB of RAM assigned to it.  This VM is the host for a Lotus Domino server with some custom applications.  When the application is used it can drive the CPU to 100% utilization within the VM.

From testing the same processes over and over, we observed that each process would take 50-150% longer to run with the HP Power Profile set to “Balanced Power and Performance” than with it set to “Maximum Performance”.

What I believe is happening is that while the VM is running at 100% CPU, it is only using 4 of the 12 cores of a single physical socket (and 4 of 24 total in the host).  The other VMs on this host all have a light CPU load, so the physical host perceives itself to be lightly loaded and down clocks the CPU.  Our VM running at 100% CPU is therefore not getting 2.2GHz of clock speed but some lesser amount, depending on how much down clocking the host has done.  Since that down clocking is dynamic, it would also account for the performance variance we are seeing.
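To put rough numbers on that theory, here is a back-of-the-envelope sketch.  It assumes the workload is purely CPU bound, so that run time scales inversely with clock speed, and it ignores the light load from the other VMs.  Neither assumption was measured, and the function names are just for illustration; the inputs are the figures from this post.

```python
NOMINAL_GHZ = 2.2   # rated clock of the Opteron 6174
HOST_CORES = 24     # 2 sockets x 12 cores
BUSY_CORES = 4      # the 4 vCPUs of the busy VM


def host_utilization(busy_cores, total_cores):
    """Fraction of the host's cores that are fully busy."""
    return busy_cores / total_cores


def implied_clock_ghz(nominal_ghz, slowdown_pct):
    """Clock speed implied by a slowdown, ASSUMING run time is
    inversely proportional to clock speed (pure CPU-bound work)."""
    return nominal_ghz / (1 + slowdown_pct / 100)


print(f"host-wide utilization: {host_utilization(BUSY_CORES, HOST_CORES):.1%}")
for pct in (50, 150):
    ghz = implied_clock_ghz(NOMINAL_GHZ, pct)
    print(f"{pct}% longer run time -> roughly {ghz:.2f} GHz effective")
```

Under those assumptions a VM pegged at 100% only accounts for about 17% of the host's cores, and the 50-150% slowdown we saw would correspond to an effective clock somewhere between roughly 1.47 GHz and 0.88 GHz.  The real figures depend on which power states the CPU actually used.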

In googling around I’ve found other people using the AMD Opteron 61xx series processors with VMware who are having a similar issue.  It’s possible this is just an issue with that line, as I don’t believe a CPU should slow its clock speed dynamically when a single core is being used completely (rather than relying on the average load across all cores to decide whether to save power by down clocking).

We have another cluster that uses AMD Opteron 6282 SE processors, and I plan to do some additional testing there to see if the problem exists on those as well.  I’ll update this post once I’ve had a chance to do that.

For now, all of our hosts using the 6174 processors have been set to “Maximum Performance” (at the cost of more power and heat, unfortunately).
