Category Archives: SAN

I upgraded from ESX 3.0.2 to ESX 3.5 and it was a pain.

I upgraded our ESX servers over the Christmas break.  I had to install a new ESX server, so I took the opportunity to upgrade the rest of our environment.  It was a pain in the ass.  There were a few bugs that caused me problems.  Details below:

I decided to wipe the ESX servers and install 3.5 fresh from the CD.  I did the upgrade from 2.5.2 to 3.0.1 this way and it worked well.  I upgraded the Virtual Center server from 2.0 to 2.5.

VMotion caused me a lot of problems.  I was not able to ping the VMotion port after the upgrade.  This happened to varying degrees on all of the servers.  The last server was the worst, and it was driving me crazy.  I had enabled VMotion and named the port properly; it just would not work.  Eventually I called support.  They had me run vmkping against the IP address of the VMotion port on the server while I pinged the same address from my workstation.  This seemed to magically enable the VMotion port.  Running just vmkping or just ping didn’t work.  The combination of the two worked for some bizarre reason.
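For reference, the combination looked roughly like this, assuming a hypothetical VMotion address of 10.0.0.21:

    # On the ESX service console (pings through the vmkernel stack)
    vmkping 10.0.0.21

    # At the same time, from a workstation on the network
    ping 10.0.0.21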

“No Active Primaries” message when I tried to add a server to the cluster: This one perplexed me for a while.  It comes from the way clustering works, and clustering doesn’t work perfectly in mixed 3.0/3.5 environments.  The first server added to a cluster is considered the “primary.”  When I initially created the cluster, ESX1 (server name) was the first server in the cluster.  When I did the upgrade, I took ESX1 out of the cluster, and the role of “primary” didn’t pass to one of the other servers.  So when I tried to add ESX1 back into the cluster, it gave me the “No Active Primaries” error.  I fixed this by removing all of the servers from the cluster and adding them back in.  This thread pointed me towards a solution:  http://communities.vmware.com/message/701671;jsessionid=AA7526EEA3E0EE5EAFAFDB7A761815ED

“Unable to read partition information from this disk”: I got an error like this when I was installing ESX on a machine attached to a SAN with raw device mappings.  I disconnected the server from the SAN and started the installation over just to be safe.  A good piece of advice: always disconnect the server from the SAN when you are reinstalling ESX.  There is a decent possibility that you’ll accidentally overwrite your LUNs.

I had some other general problems, but nothing too serious.  Let me know if you have any questions or issues that I can help with.

Virtual Desktop Infrastructure, Client Consolidation, and Blade PC’s… Oh My!

I’ve begun researching VDI because I believe that the PC is no longer necessary in medium to large environments that can operate with less than workstation-class performance.  The potential advantages of replacing PCs with thin clients that connect to full-fledged XP installations are compelling.  I’ve been researching all of this for a couple of weeks now, and I have to say that VDI/CCON/CCI is in a pre-1.0 state.  I’ll explain it all below.

There are three terms going around to describe Client Consolidation technology.  They are:

  • VDI: Virtual Desktop Infrastructure
  • CCON: Client Consolidation
  • CCI: Consolidated Client Infrastructure

They all essentially mean the same thing.  My definition of CCON is centralizing desktop/PC systems by hosting them in the data center.  All computing functions other than KVM are hosted and managed in a computer room away from the user, who accesses the centralized computer through a client device or application.  There are multiple terms battling to become the umbrella name for this technology.  VDI was the first term that I saw used.  VDI is the trendy name in my view, and it has been co-opted by VMware and turned into a product.  CCON is the name used by Massimo Re Ferre’, an IBM employee who is a heavy contributor to VDI technology research; Client Consolidation also happens to be the name of IBM’s implementation of VDI (what a coincidence).  CCI is the product name used by HP after they abandoned the use of VDI.  Another name that’s out there is “Centralized Computing,” but that term describes the days of mainframes and dumb terminals.

My preference for the academic name of this technology is Client Consolidation (CCON).  I believe that CCON is the most descriptive, most open name of the three.  CCON is general enough to encompass all of the diverse technologies in this area.

There’s a lot of overlapping information and noise out there.  I want to explain the bottom line as I see it.

The technology “models” (Re Ferre’, 2007) for CCON are:

  • Shared Services (Citrix)
  • Virtual Machines (VMware, Xen, others)
  • Blade PC’s/Blade Workstations (HP, ClearCube)

You will ultimately have to select one (or more) of those methodologies for a production rollout.

Client consolidation is all about the use of RDP to connect to Windows systems.  RDP is what it’s all about (some solutions prefer or also support ICA).  If you know how to use Remote Desktop, you’re most of the way to understanding what CCON is all about.  Everything after this is about services and features built around RDP-accessed Windows systems (VMs, Blade PCs).
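In fact, the simplest possible “deployment” is just a Remote Desktop session.  The broker products discussed below mostly decide which host name ends up in a command like this one (the host name here is made up; /f just means full screen):

    mstsc /v:vdi-vm-042 /f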

The components of CCON are:

  • Client Access Devices (thin clients, repurposed PCs)
  • Connection Broker (software)
  • Host Systems (VM’s, Blade PC’s)

[Diagram: VDI-CCON components]

Client Access Devices are straightforward.  You need a device that understands how to connect to remote systems using RDP.  The client device can be a full-blown XP/Vista PC or a thin client running the proper client software.  You’re going to hear a lot about Windows XPe in this space.  XPe is a stripped-down, embeddable version of Windows XP that is loaded onto many thin clients.

Host systems are also straightforward.  You can run your XP/Vista/other hosts as VMs or on Blade PCs.

Connection Brokers are where all the fun is.  Connection Brokers handle the setup and advanced features of CCON.  Brokers decide (based on policy) which VM/Blade should be assigned, the features that are available to the user, and in some cases the robustness of the service.  I think of brokers as travel agents.  A client shows up to the broker with a request.  The broker knows how to handle the request based on requirements and makes all of the arrangements, including the connection.  The broker is usually finished at that point, though in some solutions the broker stays in the middle as an intermediary.

That’s basically what CCON is all about.

CCON is barely at a 1.0 level.  There’s very little information out there (other than Citrix) and all of the solutions are patch-up jobs.  There’s no long-standing, widely accepted solution.  Most of the solutions that I have found have been assembled piecemeal.  The absolute best information that I have found comes from Massimo at http://it20.info/misc/brokers.htm.  He’s created a table with extensive descriptions of all the features he’s been able to confirm for brokers and clients.  It’s not a complete list of brokers and features (HP SAM and IBM TMP are missing), so do your own research and testing.  Regardless, it is a must-read if you are going down the CCON road.

Two other items of interest are VMware’s VDI forum and HP’s CCI forum.  Notice that there is very little activity at those forums.  That’s because most people still aren’t working on this.  Also, VMware’s product is in Beta.  That’s right…VMware’s broker is vaporware, yet they’re calling it VDM 2.0.  Now that’s good marketing.

That’s it for now.  Please let me know if you have any questions or if you have something to add.  There is so much information out there that I’m positive there is more to come.

MTI Technology files for bankruptcy, going out of business.

***UPDATE 3*** A press release just popped up stating: “MTI also announced today that, due primarily to continued operational and financial difficulties experienced by its U.S. operations, it has filed for bankruptcy protection pursuant to Chapter 11 of the U.S. Bankruptcy Code.”  Bye bye MTI. 😦

My company has used MTI Technology (OTC: MTIC.PK) for some EMC storage and VMware projects.  There have been signs that they were in trouble since May.  I got a letter from their CEO in May stating that some salespeople had left, but everything was fine and they were still the #2 EMC reseller.  That was weird, so I started looking around and found out that MTI was being de-listed from NASDAQ.  I was then contacted by some of the departed salespeople, who wanted me to do business with them at their new company.  Then more and more people that I dealt with from MTI left.  Finally, I wasn’t getting calls returned from MTI employees I know.  I eventually got in touch with someone I know who used to work at MTI.  That person told me that “no one really works for MTI anymore.”

My understanding is that MTI has let go of almost everyone and they are on their way out.  I’m not totally surprised.  MTI was an IHV that attempted to transform itself into a VAR.  They sold only EMC storage, along with EMC and VMware software.  They did a decent job with the EMC storage, eventually becoming the #2 EMC reseller behind only Dell.  But Sales and Project Services always behaved as if they were different organizations rather than one team, which caused them unnecessary problems.

It’s unfortunate to see MTI go. 

*UPDATE* It appears that my info and instincts are correct.  MTI defaulted on a loan from Comerica and a promissory note from Pencom Systems Inc.

*UPDATE 2* One of my sources confirmed what I learned last week.  They said that all sales and project people were let go.  They also said some support staff remains to deal with commitments.  They said that MTI will try to do the right thing for their customers.  That remains to be seen.

This post is based on information I have received from people associated with MTI that are in the know and my opinion of their services.  If you have information that proves MTI is staying in business, I’ll gladly retract this post.

RAID 6 is the new RAID 5

I came across this article from Network Computing’s Howard Marks.  He writes about two new studies from Carnegie Mellon and Google on hard drive reliability.  The short story is that hard drives die in bunches, and the massive sizes of SATA and other drives put organizations at risk of data loss from multiple disk failures during long rebuilds.  Howard Marks recommends using RAID 6, which adds a second parity stripe and therefore survives two simultaneous drive failures, for all drives 500GB and larger.  It’s a worthwhile read for storage managers.  Enjoy.

The Truth About Storage Reliability

RAID 6 Primer

UPDATE:  I called EMC today to ask if they support RAID 6 on any of their arrays.  I couldn’t find it in the documentation, and I felt it was worth asking.  They said that they do not support it at all, and there is no known plan to support it.

UPDATE 2:  Things have changed since I originally posted this.  The Clariion CX-xxx series still doesn’t support RAID 6, but the CX3-xx series does.  Thanks to Dr. Product for pointing this out.

I completed our VMware ESX 3 upgrade this past weekend.

I have been planning my company’s ESX upgrade for a while.  After many delays and other conflicts, I was able to schedule it for this past weekend.  I want to braindump everything I learned, if possible.  It’s a bit of a mish-mosh, but so is my brain.

  • Plan, Document, Plan, and Document: There are so many moving parts that you’re going to want to document EVERYTHING.  The upgrade is not difficult, but it is tricky.
  • Be prepared for your Virtual Center upgrade to go bad.  This is the only in place upgrade that you cannot avoid and it’s the least reliable.  Have a backup plan whether it’s restoring the database or wiping it and starting clean.  Make a decision in advance.
  • If you lose VC you lose stats, permissions, VM groups, and a few other things.  Document all of VC at minimum (if possible).
  • VMware says you need 1200 MB of free disk space on your LUNs.  This is not enough.  I had at least 2 gigs free and still ran into problems.
  • The VM Hardware upgrade moves VM configuration files from the ESX server to the SAN.  One of these files is the VM swap file.  The swap file is twice the size of the VM’s memory.  Reducing the assigned memory increases free space on the LUN.  This helps with insufficient disk errors at boot up.
  • You can’t suspend a VM if you don’t have enough disk space.
  • Rebooting the ESX servers seems to clear up “Object” errors.
  • VMotion: You have to license it, set up the virtual switch as a vmkernel switch, AND enable VMotion on the port (see the console sketch after this list).
  • WinSCP is a great program.
  • You MUST upgrade the Virtual Hardware on all VMs before putting them in a cluster.  This makes sense, but isn’t obvious.
  • Test as much of your upgrade as possible in advance.  This helped me tremendously.
  • Make sure that your VMFS2 LUNs are formatted with an 8MB block size or less.  ESX cannot upgrade LUNs that are formatted with anything larger than an 8MB block size (there’s a quick way to check; see the sketch after this list).  The two LUNs I used as backup were both formatted with 16MB block sizes.  I knew the limitation, but I didn’t think it affected me because I always used the default block size.  The only thing that’s unusual about them is that they are both 1.7TB.
  • “unable to upgrade filesystem” + “function not implemented” errors come from the wrong block size on the VMFS2 partition.
  • Renaming datastores is not destructive in ESX 3, but I wouldn’t recommend doing it until all VMs are functional.
  • The upgrade is a good chance to upgrade server firmware.
  • Make sure all VMDK files are connected before upgrading Virtual Hardware.  Otherwise you will get errors about disk version mismatches.  I used the recommended resolution.  I’m not confident that I did the right thing.
  • Invalid affinity entry errors will happen if you assign a processor or memory affinity to a VM and then move it to a server that cannot fulfill the entry.  This could happen if you set processor 4 as the affinity and then move the VM from a quad-processor server to a dual-processor server.  The best way to fix this is to remove the affinity.  The second best way is to recreate the VM using the same disk files (remove from inventory, then recreate).
  • A “Network Copy Failed for File. [] /root/vmware/<servername>/nvram” error is most likely a DNS problem.  Make sure to register all possible DNS names in the hosts file of each server involved (see the example after this list).  In my case, the registered name and the FQDN were different.  More info can be found here.
  • If there are yellow or red alarms on most VMs after the Virtual Center 2 upgrade: The upgrade sometimes truncates records, including the alarm thresholds.  It will truncate 70% and 90% to 7% and 9%, so VC looks like a Christmas tree the first time you log in.  Your options are bad and worse in this case.  I chose to wipe the old DB and create a new one, since the stats were not critical to us.  Doing this also affects rights, groups, and other things.
  • “The virtual machine is not supported on the target datastore.”: Rebooting solves lots of problems during the upgrade, especially this one.
  • VMware Tools for NetWare: I need to address this in a separate post, but the answer is that the only instructions for this are the old GSX 3.2 instructions.  They work.
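Since a few of the items above deserve more than one line, here are some rough service console sketches.  First, the VMotion networking setup, assuming a spare uplink called vmnic1 and a made-up address of 10.0.0.21 (the VMotion checkbox on the port still has to be enabled in the VI Client):

    # Create a vSwitch for VMotion and attach an uplink
    esxcfg-vswitch -a vSwitch1
    esxcfg-vswitch -L vmnic1 vSwitch1

    # Add a VMotion port group and a vmkernel NIC on it
    esxcfg-vswitch -A VMotion vSwitch1
    esxcfg-vmknic -a -i 10.0.0.21 -n 255.255.255.0 VMotion

    # Sanity check: the vmkernel interface should answer
    vmkping 10.0.0.21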
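Second, checking the block size of a VMFS2 volume before you attempt the filesystem upgrade.  The volume name here is made up; look for the block size in the output:

    vmkfstools -P /vmfs/volumes/backup-lun-1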
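Third, the hosts file fix for the nvram copy error is just a matter of making every name a server might be known by resolve on every host involved.  The names and addresses below are examples only:

    # /etc/hosts on each ESX server
    192.168.10.11   esx1.example.com   esx1
    192.168.10.12   esx2.example.com   esx2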

Sorry about the disorganized info, but this is just a braindump.  Please let me know if you have any questions and I will get you more detailed info.

The next thing I’m going to play with (unless I get to something else first)

The next thing I plan to play around with is the FreeNAS application:

FreeNAS is a free NAS (Network-Attached Storage) server, supporting: CIFS (samba), FTP, NFS, AFP, RSYNC, iSCSI protocols, S.M.A.R.T., local user authentication, Software RAID (0,1,5) with a Full WEB configuration interface. FreeNAS takes less than 32MB once installed on Compact Flash, hard drive or USB key.
The minimal FreeBSD distribution, Web interface, PHP scripts and documentation are based on M0n0wall.

Our new ESX 3 environment can connect to NAS and I want to see how useful the feature is. This might be a chance to free up some expensive SAN disks.
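If it works out, attaching a FreeNAS NFS export to ESX 3 should be a couple of service console commands.  The address, export path, and datastore label below are placeholders:

    # Mount the NFS export as a datastore named freenas01
    esxcfg-nas -a -o 192.168.1.50 -s /mnt/vol1 freenas01

    # List NAS datastores to confirm the mount
    esxcfg-nas -l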

FreeNAS is based on FreeBSD 6.2. It’s a tiny app that appears to be very powerful. I’ve been watching it grow over the last few months. The developers are doing a good job of improving the application. I’ll let you know how it goes.