Last week we installed a new SAN and 3 new servers to run a new application. One of our guys was responsible for power. He decided to pull a plug because he thought everything was redundant anyway. BIG MISTAKE. The plug he pulled was feeding a second power strip. Our primary ESX server and a database server shut down. Phones went off the hook, as expected. The ESX server came back up OK, but the database server had issues: a file was left locked and the database wouldn't start. This is another in a series of boneheaded man-made screw-ups.
A couple of months ago I posted that I was interested in a book called Visible Ops. I'm interested in it because I think it's a practical guide for changing my department. I offered the book to my boss and he's not interested. I'm probably going to offer it to another of my bosses, and I hope I have more success with him. If not, I will have to lead by example. The only downside is that I won't have a mandate at first. So be it. Right now we are too lax about controls and change management. We keep repeating dumb mistakes, and there isn't enough accountability for taking risks.
BTW, READ THIS BOOK if you are in IT.
Note: I know the power situation is bad if we're relying on power strips. We are replacing all the racks and UPS units within 5 months.
If only your story were uncommon. I had the privilege of watching an entire enterprise SAN go down. They had to activate their business continuity plan (at a cost of millions). 80% of their customer-facing apps took a dive, all because a "non-service-affecting" firmware upgrade was performed on the SAN. They say the only thing that can compromise any redundancy scheme is the human being.
Send this link to your boss.