CS Student’s Research Leads to Technology Adopted by Microsoft
02-24-2014
Writer(s): Jesica Hollinger
Rahul Potharaju, a PhD student in the Department of Computer Science, made a discovery during a summer internship with Microsoft Research that led to insights now being used to improve reliability across Microsoft data centers.
The holy grail of computing is to make computers understand natural text and extract knowledge from it in a clear and meaningful way. A prime example is Watson, an artificial intelligence system developed by IBM that answers questions posed in natural language and was used to match wits against human contestants on the quiz show Jeopardy.
Building a system that can answer any question from human-written free-form text, such as network trouble tickets and security incident reports, is a difficult task. NetSieve instead adopts a domain-specific approach: it first learns useful knowledge from existing unstructured data, then uses that knowledge to automatically extract key business insights from new data.
Potharaju commented that his discovery for NetSieve came unexpectedly, during his internship.
“While I initially explored approaches from natural language processing to solve this problem, I found them to be very inaccurate, in part because the techniques were designed to process well-written text, like news articles, but not text which is free-form and ambiguous,” Potharaju said.
Trouble tickets, Potharaju explained, contain domain-specific words and synonyms mixed with regular dictionary words, spelling and grammar errors, and writing styles that vary from one operator to the next. His idea was to build a dictionary specific to the trouble-ticket domain, which could then be used to understand the meaning of a ticket.
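As a rough illustration of the dictionary idea, and not NetSieve's actual implementation, the short Python sketch below maps words and phrases in a free-form ticket to canonical concepts. The dictionary entries, the `tag_ticket` function, and the concept labels are all hypothetical and were invented for this example. A real system would learn such a dictionary from historical tickets rather than write it by hand.

```python
import re

# Hypothetical, hand-written domain dictionary (illustrative only).
# It maps phrase variants, abbreviations, and one common misspelling
# to a canonical concept label.
DOMAIN_DICTIONARY = {
    "power supply unit": "entity:power_supply",
    "psu": "entity:power_supply",
    "load balancer": "entity:load_balancer",
    "lb": "entity:load_balancer",
    "replaced": "action:replace",
    "repalced": "action:replace",
    "rebooted": "action:reboot",
    "link down": "problem:link_down",
}


def tag_ticket(text, dictionary=DOMAIN_DICTIONARY):
    """Return the canonical concepts found in a ticket, in order of appearance."""
    # Lowercase the text and collapse punctuation into single spaces.
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    tokens = normalized.split()
    max_len = max(len(phrase.split()) for phrase in dictionary)

    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in dictionary:
                concepts.append(dictionary[phrase])
                i += n
                break
        else:
            # No dictionary phrase starts here; skip this token.
            i += 1
    return concepts


print(tag_ticket("LB link down; repalced PSU and rebooted."))
```

Matching the longest phrase first lets multi-word terms such as "link down" win over any single-word entries they contain, and ordinary words that are not in the dictionary are simply skipped.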
Potharaju acknowledged that his research has benefited greatly from the guidance of his mentor at Microsoft Research, Navendu Jain, and his PhD advisor, Professor Cristina Nita-Rotaru. Under their guidance, he built his thesis around solving this problem and creating a technology that combines statistical natural language processing (NLP), knowledge representation, and ontology modeling to extract meaning from free-form tickets.
Nita-Rotaru described Potharaju as an exceptional graduate student.
“What distinguishes Rahul from other students is his ability to see the big picture and identify meaningful research problems. He navigates between networking, security, and machine learning concepts with an easiness that speaks to his ability to learn new concepts and think outside of the box,” said Nita-Rotaru.
The creative process gave Potharaju valuable insights into problem solving, which have served him well in academia and may serve him well as a young entrepreneur. He credits his data-driven approach to research with helping him design a "filter" that turns raw data into meaningful insights.
NetSieve has the potential to reduce downtime in business and industry, while saving millions of dollars and protecting business reputations.
Findings from this research were published at the 2013 USENIX Symposium on Networked Systems Design and Implementation (NSDI), a prestigious academic venue.