February 26, 2013
When technology advances, it creates enormous value and lets society run much more efficiently. Automation is one such advance. From daily life to manufacturing, automation improves our productivity and allows us to do more things in less time and with greater accuracy. But it also has side effects. Automation magnifies both the positive and the negative when, by design, it is hard for humans to intervene.
Consider the saw. Cutting a log with a handsaw takes a lot of time and effort, while a chainsaw makes the job much easier. Why? Because the chainsaw’s engine efficiently transfers energy to the cutting chain, which runs at a much higher speed. The automation in the chainsaw improves the productivity of anyone who needs to cut a log. But if you misuse it or have an accident, the damage a chainsaw can cause is far greater than anything a handsaw can do.
The same holds in IT operations. Automation makes IT operations more efficient, but mistakes made by humans and machines can easily cascade and do much more damage. The Amazon outage storm of 2011 is a perfect example. The automated script for EBS mirroring is an innocent process in itself, but it acted as the catalyst for the outage storm. This is the nature of the cloud, which is built on massive automation.
How should you react? First, accept it. It has happened at many public cloud providers, and chances are it will happen in your private cloud environment too. The important thing is to spot it quickly and stop the cascade before it causes major damage. The operations management tool you choose for your automated environment should give you this edge. The traditional approach of monitoring every resource that supports your cloud won’t cut it. It produces so much data that you won’t be able to grasp what the data actually means. Look for a tool that gives you insight into your cloud environment, not one that shows off with masses of data that bury you and your productivity.