Cars drive themselves. Planes have auto-pilots. Why do we still need people to run networks?

In the last few years, there have been big breakthroughs in the use of artificial intelligence (AI) for solving problems in image and speech recognition, natural language processing, and robotics. These breakthroughs have been enabled by massive increases in computing power (processing, memory, I/O) as well as algorithmic advances. Many familiar consumer services - Amazon's Alexa and Apple's Siri personal assistants, Facebook's automated image recognition, Google's online language translation and Tesla's self-driving car mode - are all powered by AI engines.

Automation has always been a major priority in the networking industry. However, networking equipment and software vendors have fallen behind in leveraging state-of-the-art AI techniques for network design and operations. Consequently, networks suffer from all the familiar problems despite IT teams using a dizzying array of dashboards to run their network infrastructure. There is no advance warning of problems, and when one does occur, there's a fire drill: IT teams scramble to gather and analyze the relevant data, often only to conclude that the problem had nothing to do with the network infrastructure. For the same reason, any major configuration change or firmware upgrade is a painful procedure – if something goes wrong, there is often little visibility until users complain of an outage or impaired service. The transition of large enterprise applications, such as Microsoft Office, to the cloud complicates things further, since traffic loads can shift rapidly and unpredictably. Finally, there is the looming threat of tens of millions of IoT devices joining the network. Are we on the brink of an abyss?


Luckily, it turns out that there are emerging "network analytics" solutions that can help. Today's networks generate vast amounts of data from embedded instrumentation in switches, routers, servers and endpoints. In a typical large public university in the US, roughly 1 terabyte of data is generated daily. It is possible to reuse "big data" computing and database technologies, developed for Web-based services, to process even this vast amount of data in real time. Similarly, machine intelligence techniques such as clustering, automated model building and neural networks can be used to characterize the behavior of the network and the experience of its users. This helps provide visibility into network malfunctions and their impact on the user experience. Using these models, it is also possible to compare network performance and user experience across networks. We have found this to be an extremely powerful capability – it provides the ability to predict problems before they happen, create recommendations for resolutions, and suggest better network configurations.
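
To make the idea of comparing behavior across a network a little more concrete, here is a minimal sketch in Python. It is not the method used in any particular product. It assumes a hypothetical metric (median DHCP response time per access point, in milliseconds) with made-up values and access point names, and it swaps in a simple robust statistic (a modified z-score based on the median and the median absolute deviation) in place of the richer clustering and model-building techniques mentioned above.

```python
from statistics import median


def robust_z_scores(values):
    """Return a modified z-score for each value, using median and MAD.

    The constant 0.6745 rescales the median absolute deviation (MAD) so the
    score is roughly comparable to a standard z-score for normal data.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; this simple sketch
        # then treats nothing as an outlier.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(samples, threshold=3.5):
    """Return the names whose metric deviates strongly from their peers.

    `samples` maps a name (here, a hypothetical access point ID) to a value.
    The threshold of 3.5 is a commonly used rule of thumb, not a tuned value.
    """
    names = list(samples)
    scores = robust_z_scores([samples[name] for name in names])
    return [name for name, z in zip(names, scores) if abs(z) > threshold]


# Illustrative, invented data: DHCP response time in ms per access point.
dhcp_ms = {
    "ap-01": 42.0,
    "ap-02": 45.0,
    "ap-03": 40.0,
    "ap-04": 47.0,
    "ap-05": 43.0,
    "ap-06": 310.0,
}

print(flag_outliers(dhcp_ms))  # → ['ap-06']
```

A median-based score is used here rather than mean and standard deviation because a single badly behaving device would otherwise inflate the baseline it is being compared against. A real deployment would need many more metrics, time windows and domain-specific context than this toy comparison.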

Machine learning and data science techniques are very powerful, but the key to applying them judiciously to networking is human intelligence. There is an enormous amount of tribal knowledge within the networking community. In our experience, solving real-world problems requires coupling this domain knowledge with the building blocks that machine learning and data science provide. At Aruba, we have been developing network analytics solutions based on these ideas, and we have some very promising initial results. In the next installment of this blog, we will share more of what we have learned from our initial large-scale field trials.