When you drive a car, you want to know how fast you are going, how much fuel is left, and whether the engine is running normally. That’s what the dashboard is for. The same is true of cloud operations. To know whether the cloud is running well, a cloud administrator needs accurate and meaningful operational indicators. Four indicators are essential for a complete picture of your cloud operations, and you need them at every level of your cloud environment: devices, pools, and services.
Just as the speedometer tells you how fast your car is going, you need an indicator that tells you how your cloud resources (compute, storage, and network) are performing. There are two types of measurement here. One measures utilization or workload: a snapshot of how much of the total capacity is currently in use. CPU and memory utilization of a compute resource pool belong to this category. The other measures “performance”, that is, how fast or slow a resource is, such as disk I/O or network I/O. In the cloud, you need an indicator that measures workload and performance not only at the VM or storage array level but also at the resource pool, pod, or even tenant level.
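To make the pool-level idea concrete, here is a minimal Python sketch of rolling host CPU usage up into a single pool utilization figure. The `Host` class, its field names, and the sample numbers are hypothetical, chosen only for illustration. The sketch assumes pool utilization is total used capacity divided by total capacity, which weights each host by its size instead of averaging per-host percentages.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical record of one host's CPU capacity and current usage."""
    name: str
    cpu_capacity_ghz: float
    cpu_used_ghz: float


def pool_utilization(hosts):
    """Return pool-wide CPU utilization as a fraction between 0 and 1.

    Sums usage and capacity across all hosts, so large hosts count for
    more than small ones. An empty pool is reported as 0.0.
    """
    total_capacity = sum(h.cpu_capacity_ghz for h in hosts)
    if total_capacity == 0:
        return 0.0
    total_used = sum(h.cpu_used_ghz for h in hosts)
    return total_used / total_capacity


# Illustrative data: one large, busy host and one small, idle host.
pool = [
    Host("host-a", cpu_capacity_ghz=40.0, cpu_used_ghz=30.0),
    Host("host-b", cpu_capacity_ghz=10.0, cpu_used_ghz=1.0),
]
print(f"Pool CPU utilization: {pool_utilization(pool):.0%}")
```

With this sample data the pool is 62% utilized, while a plain average of the two per-host percentages (75% and 10%) would suggest roughly 43%. That gap is one reason a device-level number cannot simply stand in for a pool-level one.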
The service level indicator tells you whether your cloud resources are up and available, measured against your operational goals. For a service provider, the service level agreement (SLA) is a key element of operations. In a private cloud, the IT department often has SLAs with business units as well. Being able to measure service levels accurately against their targets is crucial in both private and public clouds.
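As a rough illustration of measuring against a service level target, the Python sketch below computes availability for a reporting period and compares it with a target. The function names, the 99.9% target, and the 30-day period are assumptions made for this example. Real SLAs usually also define what counts as downtime and which maintenance windows are excluded.

```python
def availability(total_minutes, downtime_minutes):
    """Return the fraction of the period during which the service was up."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime(total_minutes, target):
    """Return how many minutes of downtime the target permits in the period."""
    return total_minutes * (1.0 - target)


def meets_target(total_minutes, downtime_minutes, target):
    """Return True if measured availability is at or above the target."""
    return availability(total_minutes, downtime_minutes) >= target


# Illustrative figures: a 30-day month and an assumed 99.9% target.
MINUTES_IN_30_DAYS = 30 * 24 * 60
TARGET = 0.999

print(f"Downtime budget: {allowed_downtime(MINUTES_IN_30_DAYS, TARGET):.1f} minutes")
print("30 minutes down, target met:", meets_target(MINUTES_IN_30_DAYS, 30, TARGET))
print("50 minutes down, target met:", meets_target(MINUTES_IN_30_DAYS, 50, TARGET))
```

A 99.9% target over 30 days leaves a budget of about 43 minutes, so 30 minutes of downtime stays within it and 50 minutes does not.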
While the speedometer lets you act in real time by pressing the accelerator or the brake, the fuel gauge tells you how far you can go and whether you need to fill the tank at the next stop. In the cloud, you need to know not only how much workload you have now but also how much capacity you will have to add to meet future demand. This is what a good capacity indicator tells you.
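One simple way to picture a capacity indicator is a trend projection: given recent daily usage, estimate how many days remain before the pool is full. The Python sketch below fits a straight line by ordinary least squares. The function name and the sample figures are hypothetical, and a linear trend is only an assumption; real demand may be seasonal or bursty and call for a richer model.

```python
def days_until_full(daily_usage, capacity):
    """Estimate the days left before usage reaches capacity.

    daily_usage holds one usage sample per day, oldest first, in the same
    unit as capacity. A least-squares line gives the daily growth rate,
    which is then projected forward from the most recent sample.
    Returns None if there are fewer than two samples or usage is not growing.
    """
    n = len(daily_usage)
    if n < 2:
        return None

    mean_x = (n - 1) / 2  # mean of the day indices 0..n-1
    mean_y = sum(daily_usage) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_usage))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance  # growth per day

    if slope <= 0:
        return None
    return max(0.0, (capacity - daily_usage[-1]) / slope)


# Illustrative data: storage used (TB) over five days in a 100 TB pool.
usage_tb = [50, 52, 54, 56, 58]
print("Estimated days until full:", days_until_full(usage_tb, capacity=100))
```

Here usage grows by 2 TB per day and 42 TB remain, so the estimate is 21 days. That number plays the role of the fuel gauge: it tells you when to plan the next fill-up.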
Even if your car is running fine now and you have plenty of fuel, you won’t be happy if a tire suddenly goes flat or the engine dies in the middle of the road. That’s why your dashboard has sensors and warning indicators, such as the low tire pressure and check engine lights. You want an early warning so you can have the problem checked before you go on the road. You need the same thing in cloud operations. A good health status (which is worth a separate article in itself) takes into account external events, behavior events (based on intelligent baselines and thresholds), the capacity situation, and anticipated workload and demand to give you an accurate and predictive indicator of your cloud’s health.
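The Python sketch below shows one possible way to combine these inputs into a single status. Everything in it is an assumption made for illustration: the function names, the rule that a reading is abnormal when it exceeds the historical mean by three standard deviations, the 7-day and 30-day capacity cutoffs, and the red/yellow/green scale. It is not how any particular product computes health.

```python
from statistics import mean, stdev


def exceeds_baseline(history, current, k=3.0):
    """Return True if current is more than k standard deviations above
    the mean of history. With fewer than two samples, return False."""
    if len(history) < 2:
        return False
    return current > mean(history) + k * stdev(history)


def health_status(open_alerts, behaving_abnormally, days_until_full):
    """Combine external events, behavior, and capacity into one status.

    open_alerts:         count of unresolved external events (e.g. hardware faults)
    behaving_abnormally: result of a baseline check such as exceeds_baseline
    days_until_full:     capacity forecast in days, or None if not growing
    """
    capacity_known = days_until_full is not None
    if open_alerts > 0 or (capacity_known and days_until_full < 7):
        return "red"
    if behaving_abnormally or (capacity_known and days_until_full < 30):
        return "yellow"
    return "green"


# Illustrative data: recent disk latency samples (ms) and a new reading.
latency_history_ms = [10, 12, 11, 13, 12]
abnormal = exceeds_baseline(latency_history_ms, current=20)
print(health_status(open_alerts=0, behaving_abnormally=abnormal, days_until_full=45))
```

In this example there are no open alerts and capacity is comfortable, but the latency reading is far above its baseline, so the status is yellow: an early warning before anything has actually failed.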
One could argue that many tools today already provide some of these indicators. But you can’t treat the indicators separately, because they are all related to one another. Any single one reflects only part of your operational status, not the whole picture. The following questions can help you identify solutions you can rely on.
- Can I get a single solution that shows all four indicators holistically?
- Do these indicators measure the most important parts of the cloud context, namely pools and services?
- How accurately do these indicators reflect the dynamics of the cloud?
- Can I get a predictive status that frees me from acting reactively?