The world of artificial intelligence and machine learning
On December 30, 2016, a player with the screen name “Master” entered a website where people play the Chinese game of Go online. Among the players on the site were several of the world’s top professional Go players, including several world champions. After “Master” won several straight games against mid-ranked players, it drew attention from the elite players. The winning streak continued. Finally, champions and top players had to jump in to challenge “Master.” Five days later, “Master” finished its 60th game with an astonishing result: sixty wins and zero losses.
“Master” is not a human player. It is AlphaGo, Google DeepMind’s artificial intelligence software. A month later, another AI program, Carnegie Mellon University’s Libratus, beat four top professional poker players in an exhausting 120,000-hand match.
People had considered Go and poker the last games in which humankind held an edge over machines. According to Wikipedia, “despite its relatively simple rules, Go is very complex, even more so than chess, and possesses more possibilities than the total number of atoms in the visible universe.” Poker, on the other hand, is a game of uncertainty that requires players to understand and exploit their opponents’ psychology. Yet both fell to the machines. “AI” and “machine learning” have become leading topics in the technology world, and major technology companies, including Google, Microsoft, Amazon, and Facebook, are investing heavily in the field.
Machine learning and IT management
We can compete with machines, but it is more exciting when we leverage them to make our lives easier. The fundamental advantage a machine has is its speed of calculation. With that speed, a machine can complete repetitive tasks or analyze a large quantity of data faster than a human can.
How can we leverage machines to achieve better IT management?
Today’s IT data centers have many moving components. Virtualization and cloud computing give applications the ability to maximize the use of infrastructure resources without being tied to a static hardware box. An application may use a very different combination of network, storage, and compute resources from one moment to the next; the relationship between infrastructure and applications is dynamic. In addition, virtualization and cloud computing infrastructure generate a large volume of data at an astonishing speed.
Compared to a decade ago, IT staff are under more pressure today. They face this huge amount of data and the evolving relationship between applications and infrastructure. They need help to separate signal from noise in real time and keep IT services running to meet ever-growing business appetites.
That’s where machine learning can offer great value in IT management. Machine learning can sift through a large quantity of data in a matter of seconds, detect patterns, and extract insights that help IT make correct decisions and maintain high service quality. Guesswork is no longer needed, and the days of tedious data collection and analysis are gone. IT staff can focus on high-value tasks that help their business grow.
One size cannot fit all
Often, a tool repackages linear regression (for trending) or standard deviation (for range) and claims it does machine learning. Nice try. That would be fine if it could actually produce results that help us, but more often than not, it can’t. There are two reasons. First, the machine data generated in an IT environment are not random, yet they don’t follow a normal distribution (the bell curve). Mathematically, you can’t do a meaningful linear regression when the data are not normally distributed. The second reason is that machine data live in a context. For example, VMs can share disk space from a shared virtual storage pool. The space each VM can occupy depends on how the other VMs in the same pool consume storage. You have to look at data from both the VM and the pool to figure out the true trend or normal usage.
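To make the context point concrete, here is a minimal sketch with invented numbers: one VM’s usage looks harmless in isolation, but the shared pool tells a different story. The pool size, VM names, and growth figures are all hypothetical.

```python
# Hypothetical numbers illustrating why a per-VM view misleads without
# the context of the shared storage pool it draws from.

POOL_CAPACITY_GB = 1000  # assumed size of the shared virtual storage pool

# Weekly disk-usage samples (GB) for two made-up VMs in the same pool.
usage = {
    "vm-a": [100, 110, 120, 130],  # grows ~10 GB/week: looks harmless alone
    "vm-b": [200, 260, 320, 380],  # grows ~60 GB/week in the same pool
}

def weekly_growth(samples):
    """Average week-over-week growth: a crude per-VM trend."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    return sum(deltas) / len(deltas)

# Viewed alone, vm-a uses only 130 GB of a 1000 GB pool...
vm_a_now = usage["vm-a"][-1]

# ...but the pool's free space and growth depend on every VM in it.
pool_used_now = sum(s[-1] for s in usage.values())
pool_free = POOL_CAPACITY_GB - pool_used_now
pool_growth = sum(weekly_growth(s) for s in usage.values())

weeks_left = pool_free / pool_growth
print(f"pool free: {pool_free} GB, growing {pool_growth:.0f} GB/week")
print(f"weeks until the shared pool is full: {weeks_left:.1f}")
```

Trending vm-a by itself suggests years of headroom; combining the VM and pool data shows the shared pool filling up in a matter of weeks.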
Is there a perfect algorithm that can do this? Unfortunately, the answer is no. Data from different types of applications behave differently. For example, the performance data of a VDI desktop differ greatly from those of a database server VM. Even within the same VM, CPU and memory metrics are typically more volatile than disk-space data.
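One simple way to see that volatility difference is to compare the coefficient of variation (standard deviation divided by mean) of two metric streams; the sample values below are invented for illustration.

```python
# Comparing metric volatility with the coefficient of variation (CoV).
from statistics import mean, pstdev

cpu = [20, 75, 30, 90, 15, 60]          # CPU % samples: spiky
disk = [400, 402, 405, 407, 410, 412]   # disk GB samples: smooth growth

def cov(samples):
    """Coefficient of variation: relative spread, unitless."""
    return pstdev(samples) / mean(samples)

print(f"CPU CoV:  {cov(cpu):.2f}")    # large: highly volatile
print(f"disk CoV: {cov(disk):.3f}")   # tiny: steady trend
```

A model tuned for the smooth disk series would be a poor fit for the spiky CPU series, which is exactly why no single algorithm wins everywhere.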
So, what is the right approach?
Nutanix has developed a technology, X-Fit, to bring the benefits of machine learning into IT management. The idea is simple: we let algorithms compete with one another so that our machine, Prism, develops a deep understanding of the workload it is analyzing. We hand-picked a bag of algorithms and optimized them to run in parallel without consuming too many resources. These algorithms go through a tournament, and we designed an evaluation algorithm that lets Prism score the results and select the winner of each competition. Just as in evolution, the winner that emerges from these tournaments naturally has the best understanding of the workload behavior it targets.
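A toy tournament in this spirit might look like the sketch below: several candidate forecasters predict a held-out tail of the series, and the one with the lowest error wins. The three models and the scoring rule here are my own illustrative assumptions, not the actual X-Fit algorithms.

```python
# Toy model tournament: candidates forecast held-out data; lowest error wins.

def forecast_mean(history, horizon):
    """Predict the historical average, repeated."""
    avg = sum(history) / len(history)
    return [avg] * horizon

def forecast_last(history, horizon):
    """Predict the last observed value, repeated."""
    return [history[-1]] * horizon

def forecast_linear(history, horizon):
    """Ordinary least-squares line fit, extrapolated past the history."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + h) for h in range(horizon)]

def mae(pred, actual):
    """Mean absolute error between forecast and held-out truth."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(pred)

# Invented series with a steady trend; train on the head, hold out the tail.
series = [10, 12, 14, 16, 18, 20, 22, 24]
train, holdout = series[:6], series[6:]

models = {"mean": forecast_mean, "last": forecast_last, "linear": forecast_linear}
scores = {name: mae(fn(train, len(holdout)), holdout) for name, fn in models.items()}
winner = min(scores, key=scores.get)
print(f"winner: {winner}, scores: {scores}")
```

On this trending series the linear model wins; on a flat, noisy series the mean model would. Running such a contest per metric is one way a system can match each workload to the model that understands it best.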
Then we bring the context of the data into the picture. X-Fit has the knowledge to interpret that context. In the example mentioned earlier, X-Fit links the behavior of the storage pool with the behavior of each storage container. This allows X-Fit to discover insights that are meaningful and actionable to IT staff. You can find more technical details in a blog written by the X-Fit team.
Nutanix shipped the first application of X-Fit in Prism in 2016. In this first application, X-Fit examines the long-term capacity consumption of the infrastructure and uses that understanding to predict the capacity runway: how long the infrastructure can continue to support its workload. I will show you the details in my next blog.
Disclaimer: This blog is personal and reflects the opinions of the author, not necessarily those of Nutanix. It has not been reviewed or approved by Nutanix.