I upgraded our ESX servers over the Christmas break. I had to install a new ESX server, so I took the opportunity to upgrade the rest of our environment. It was a pain in the ass. There were a few bugs that caused me problems. Details below:
I decided to wipe the ESX servers and install 3.5 fresh from the CD. I did the upgrade from 2.5.2 to 3.0.1 this way and it worked well. I upgraded the Virtual Center server from 2.0 to 2.5.
VMotion caused me a lot of problems. I was not able to ping the VMotion port after the upgrade. This happened to varying degrees on all of the servers. The last server was the worst. It was driving me crazy. I had enabled VMotion and named it properly. It just would not work. Eventually I called support. They ran vmkping to the IP address of the VMotion port on the server while I pinged the IP address from my workstation. This seemed to magically enable the VMotion port. Running just vmkping or just ping didn’t work. The combination of the two worked for some bizarre reason.
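For anyone who wants to repeat the "both pings at once" trick without a support engineer on the phone, here is a minimal Python sketch. It assumes you can reach an ESX service console over SSH (the `ssh_target` value is hypothetical) and that your workstation's ping takes the Unix-style `-c` count flag (Windows ping uses `-n` instead). It only starts the two commands side by side; it does not explain why the combination works.

```python
import subprocess


def build_commands(ssh_target, vmotion_ip, count=10):
    """Return the two ping commands as argument lists.

    ssh_target is a hypothetical SSH login for an ESX service console.
    """
    # vmkping runs on the ESX host and sends the ping through the VMkernel.
    vmkping_cmd = ["ssh", ssh_target, "vmkping", "-c", str(count), vmotion_ip]
    # Ordinary ping from the workstation (assumes Unix-style -c; Windows uses -n).
    ping_cmd = ["ping", "-c", str(count), vmotion_ip]
    return vmkping_cmd, ping_cmd


def ping_both_at_once(ssh_target, vmotion_ip, count=10):
    """Start vmkping and ping together and return their exit codes."""
    commands = build_commands(ssh_target, vmotion_ip, count)
    # Popen returns immediately, so both pings overlap in time.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]
```

Calling `ping_both_at_once("admin@esx2.example.com", "10.0.0.51")` with your own login and VMotion IP runs both pings concurrently; an exit code of 0 from each means that ping got replies.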
“No Active Primaries”: I got this message when I tried to add a server to the cluster. This one perplexed me for a while. It comes from the way clustering works. Clustering doesn’t work perfectly in mixed 3.0/3.5 environments. The first server added to a cluster is considered the “primary.” When I initially created the cluster, ESX1 (server name) was the first server in the cluster. When I did the upgrade, I took ESX1 out of the cluster. It didn’t pass the role of “primary” on to one of the other servers. So when I tried to add ESX1 back into the cluster, it gave me the “No Active Primaries” error. I fixed this by removing all of the servers from the cluster and adding them back in. This thread pointed me towards a solution: http://communities.vmware.com/message/701671;jsessionid=AA7526EEA3E0EE5EAFAFDB7A761815ED
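To make the sequence of events easier to follow, here is a toy Python model of the behavior as I described it above. It is not how VMware actually implements clustering; the class and function names are made up for illustration, and the only rules it encodes are the ones from my account: the first host added becomes the primary, removing it does not hand the role over, and emptying the cluster and re-adding everything recovers.

```python
class NoActivePrimaries(Exception):
    """Raised when a host joins a cluster that has members but no primary."""


class ToyCluster:
    """Toy model only: first host added is the primary, role is never passed on."""

    def __init__(self):
        self.hosts = []
        self.primary = None

    def add(self, host):
        # A non-empty cluster without a primary cannot accept new hosts.
        if self.hosts and self.primary is None:
            raise NoActivePrimaries("No Active Primaries")
        self.hosts.append(host)
        if self.primary is None:
            self.primary = host

    def remove(self, host):
        self.hosts.remove(host)
        if host == self.primary:
            # The role is dropped, not handed to a remaining host.
            self.primary = None


def rebuild(cluster, extra_hosts):
    """The fix from the post: remove every host, then add them all back."""
    members = list(cluster.hosts)
    for host in members:
        cluster.remove(host)
    for host in members + list(extra_hosts):
        cluster.add(host)
```

In this model, adding ESX1, ESX2 and ESX3, removing ESX1, and then calling `add("ESX1")` raises `NoActivePrimaries`, while `rebuild(cluster, ["ESX1"])` succeeds because the first host added back to the empty cluster picks up the primary role.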
“Unable to read partition information from this disk”: I got an error like this when I was installing ESX on a machine attached to a SAN with raw drive mappings. I disconnected the server from the SAN and started the installation over just to be safe. A good piece of advice: always disconnect the server from the SAN when you are reinstalling ESX. There is a decent possibility that you’ll accidentally overwrite your LUNs.
I had some other general problems, but nothing too serious. Let me know if you have any questions or issues that I can help with.
I am getting the “Unable to read partition information from this disk” error on my ESX 3.5 server when I try to attach it to an iSCSI SAN running DSS Lite (www.open-e.com). I can mount the iSCSI drive on any Windows machine, but the VMware server keeps giving me this error.
How did you resolve it?
I always back up my ESX servers (only the ext3 partitions) and the VC server with the Clonezilla Live CD before installing patches and upgrades. I’ve also verified that a restore works as it should 🙂
If anyone wants to try this, don’t forget to …
ALWAYS DISCONNECT THE SAN FROM THE SERVER YOU ARE GOING TO BACK UP BEFORE BOOTING FROM THE CLONEZILLA LIVE CD!!!
The Clonezilla version that I verified working on my servers is 1.0.7-18.
Thanks for the info. I had a new cluster where the HA agent would fail on one node or the other, depending on which node came up first.
I checked, and each node had both SAN storage and local storage datastores configured. I deleted the unused local datastores and disabled/re-enabled HA. Everything is working as expected.