I have been planning my company’s ESX upgrade for a while. After many delays and other conflicts, I was able to schedule it for this past weekend. I want to braindump everything I learned if possible. It’s a bit of a mish-mosh, but so is my brain.
- Plan, Document, Plan, and Document: There are so many moving parts that you’re going to want to document EVERYTHING. The upgrade is not difficult, but it is tricky.
- Be prepared for your Virtual Center upgrade to go bad. This is the only in-place upgrade that you cannot avoid, and it's the least reliable. Have a backup plan, whether it's restoring the database or wiping it and starting clean. Make that decision in advance.
- If you lose VC you lose stats, permissions, VM groups, and a few other things. At a minimum, document as much of your VC configuration as you can.
- VMware says you need 1200 MB of free disk space on your LUNs. This is not enough. I had at least 2 GB free and still ran into problems.
- The VM Hardware upgrade moves VM configuration files from the ESX server to the SAN. One of these files is the VM swap file. The swap file is twice the size of the VM’s memory. Reducing the assigned memory increases free space on the LUN. This helps with insufficient disk errors at boot up.
- You can’t suspend a VM if you don’t have enough disk space.
- Rebooting the ESX servers seems to clear up “Object” errors.
- VMotion: You have to license it, set up the virtual switch as vmkernel, AND enable VMotion on the port.
- WinSCP is a great program.
- You MUST upgrade the virtual hardware on all VMs before putting them in a cluster. This makes sense, but isn’t obvious.
- Test as much of your upgrade as possible in advance. This helped me tremendously.
- Make sure that your VMFS2 LUNs are formatted with a block size of 8 MB or less. ESX cannot upgrade LUNs that are formatted with anything larger than an 8 MB block size. The two LUNs I used as backup were both formatted with 16 MB block sizes. I knew the limitation, but I didn’t think it affected me because I always used the default block size. The only thing that’s strange about them is that they are both 1.7 TB.
- “unable to upgrade filesystem” + “function not implemented” errors come from the wrong block size on the VMFS2 partition.
- Renaming datastores is not destructive in ESX 3, but I wouldn’t recommend doing this until all VMs are functional.
- The upgrade is a good chance to upgrade server firmware.
- Make sure all VMDK files are connected before upgrading Virtual Hardware. Otherwise you will get errors about disk version mismatches. I used the recommended resolution. I’m not confident that I did the right thing.
- “Invalid affinity entry” errors will happen if you assign a processor or memory affinity to a VM and then move it to a server that cannot fulfill the entry. This could happen if you set processor 4 as the affinity and then move the VM from a quad-processor server to a dual. The best way to fix this is to remove the affinity. The second best way is to recreate the VM using the same disk files. (Remove from inventory, recreate.)
- A “Network Copy Failed for File. [] /root/vmware/<servername>/nvram” error is most likely a DNS problem. Make sure to register all possible DNS names in the hosts file of each server involved. In my case, the registered name and the FQDN were different. More info can be found here.
- If there are yellow or red alarms on most VMs after the Virtual Center 2 upgrade: The upgrade sometimes truncates records, including the alarm thresholds. It will truncate 70% and 90% to 7% and 9%. VC looks like a Christmas tree the first time you log in. Your options are bad and worse in this case. I chose to wipe the old DB and create a new one. The stats were not critical to us. Doing this also affects rights, groups, and other things.
- “The virtual machine is not supported on the target datastore.” Rebooting solves lots of problems during the install, especially this one.
- VMware Tools for NetWare. I need to address this in a separate post, but the answer is that the only instructions for this are old GSX 3.2 instructions. They work.
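
A few of the checks above are easy to script before the upgrade weekend. What follows is only a rough Python sketch, not a VMware tool: the `VM` class, the function names, and the sample numbers are all made up for illustration. It assumes the rule of thumb from this post that the swap file is twice the VM’s memory, treats VMware’s 1200 MB figure as a floor rather than a safe amount, and numbers CPUs from 1 to match the wording above (ESX itself may number them differently).

```python
from dataclasses import dataclass

SWAP_FACTOR = 2          # assumption: this post's "swap file = 2 x VM memory" rule
VMWARE_MIN_FREE_MB = 1200  # VMware's stated minimum; the post found it too low
MAX_VMFS2_BLOCK_MB = 8   # largest VMFS2 block size the post says can be upgraded


@dataclass
class VM:
    """Hypothetical record of what you documented about one VM."""
    name: str
    memory_mb: int
    cpu_affinity: tuple = ()  # 1-based CPU numbers; empty means no affinity


def swap_space_needed_mb(vms, swap_factor=SWAP_FACTOR):
    """Total space the swap files would take on the LUN under the rule above."""
    return sum(vm.memory_mb * swap_factor for vm in vms)


def lun_headroom_mb(free_mb, vms, reserve_mb=VMWARE_MIN_FREE_MB):
    """Free space left after the reserve and the swap files; negative means trouble."""
    return free_mb - reserve_mb - swap_space_needed_mb(vms)


def block_size_ok(block_mb):
    """True if a VMFS2 LUN with this block size should be upgradable."""
    return block_mb <= MAX_VMFS2_BLOCK_MB


def invalid_affinity(vm, host_cpu_count):
    """CPU numbers in the VM's affinity that the target host does not have."""
    return [cpu for cpu in vm.cpu_affinity if cpu > host_cpu_count]


if __name__ == "__main__":
    # Sample values only; substitute what you documented for your own LUN.
    vms = [VM("fileserver", 1024), VM("web01", 512, cpu_affinity=(4,))]
    print("swap space needed (MB):", swap_space_needed_mb(vms))
    print("headroom on a LUN with 2048 MB free:", lun_headroom_mb(2048, vms))
    print("16 MB block size upgradable:", block_size_ok(16))
    print("bad affinity on a dual-CPU host:", invalid_affinity(vms[1], 2))
```

Under this rule of thumb, every megabyte of memory you take away from a VM gives back two megabytes on the LUN, which is why trimming assigned memory helped with the insufficient disk space errors.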
Sorry about the disorganized info, but this is just a braindump. Please let me know if you have any questions and I will get you more detailed info.
Actually, the upgrade is a very simple task. I think the problems are more related to how you cope with potential downtime due to vmdk backups, or whether you want to do it with almost no downtime (VMotion upgrades).
Other than that, after a couple of upgrade projects the rest you do with your brains pretty much switched off. Very boring 🙂
HP, I agree. The greatest downtime and stress was related to backups. Nothing else caused as much stress or needed as much planning as backups.