Purdue Researchers Accelerate Virtual Machine Communication in the Cloud
11-04-2010
A group of researchers in the Department of Computer Science at Purdue University, including graduate students Ardalan Kangarlou and Sahan Gamage and Professors Ramana Kompella and Dongyan Xu, has made an interesting (and perhaps disturbing) observation: server consolidation, arguably the key underpinning of cloud computing platforms, actually has a negative impact on inter-VM network performance in cloud environments. This happens because data center network latencies are well below a millisecond, so the round-trip time (RTT) is dominated by the scheduling latency a VM incurs while waiting for a CPU it shares with other VMs. As the RTT grows, TCP connections take longer to ramp up to the available bandwidth, which lowers throughput. To overcome this problem, the researchers propose a new approach called vSnoop, which mitigates this effect by accelerating TCP connections seamlessly within the hypervisor without sacrificing the flexibility, scalability, and economy of virtual machine hosting in the cloud. Their paper describing the design of vSnoop and its prototype implementation on the Xen virtual machine platform will appear in the proceedings of the 2010 ACM/IEEE Supercomputing Conference. In recognition of the importance and potential impact of this research, the paper has been nominated as one of five finalists for the Best Student Paper Award. The winner will be announced at the conference, which takes place November 13-19 in New Orleans, Louisiana.
Virtualization is one of the key enabling technologies behind emerging cloud computing platforms and services such as Amazon’s EC2. In a typical virtualization-based cloud infrastructure, each physical server hosts multiple virtual machines (VMs) that run applications and services for cloud customers. This practice, called VM consolidation, improves the resource utilization and scalability of the cloud infrastructure. However, consolidation requires multiple VMs to share the same CPU, and as more VMs are scheduled on the same core, each VM's CPU access latency (i.e., the interval during which the VM waits for the CPU) increases. This increase in turn raises the round-trip time of TCP connections to the VM and reduces their throughput.
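A rough, back-of-the-envelope model (not taken from the paper) shows why added scheduling delay hurts TCP so much when the network itself is fast. It assumes that the sender-observed RTT is simply the network RTT plus the VM's CPU wait, that slow start doubles the congestion window once per RTT (ignoring delayed ACKs and losses), and that a window-limited flow cannot exceed window / RTT. The function names and the numbers used (a 0.2 ms network RTT, 30 ms of scheduling wait, a 64 KB window) are hypothetical and chosen only for illustration.

```python
import math


def effective_rtt_ms(network_rtt_ms: float, sched_wait_ms: float) -> float:
    # Assumption: the RTT seen by the sender is the network RTT plus the time
    # the receiving VM waits for its (shared) CPU before it can send an ACK.
    return network_rtt_ms + sched_wait_ms


def slow_start_ramp_ms(target_segments: int, rtt_ms: float, initial_cwnd: int = 2) -> float:
    # Idealized slow start: cwnd doubles every RTT, so reaching the target
    # window takes ceil(log2(target / initial)) rounds of one RTT each.
    if target_segments <= initial_cwnd:
        return 0.0
    rounds = math.ceil(math.log2(target_segments / initial_cwnd))
    return rounds * rtt_ms


def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    # Upper bound on throughput for a flow limited by a fixed window: window / RTT.
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


if __name__ == "__main__":
    net_rtt = 0.2                            # hypothetical data center RTT (ms)
    vm_rtt = effective_rtt_ms(net_rtt, 30.0)  # hypothetical 30 ms CPU wait
    for label, rtt in (("no CPU wait", net_rtt), ("with CPU wait", vm_rtt)):
        print(f"{label}: RTT={rtt:.1f} ms, "
              f"ramp to 64 segments={slow_start_ramp_ms(64, rtt):.1f} ms, "
              f"64 KB window bound={window_limited_mbps(65536, rtt):.1f} Mbit/s")
```

Under these assumptions, the same flow needs roughly 150 times longer to ramp up and is capped at a far lower rate once the CPU wait is added, even though the network path itself is unchanged.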
To alleviate the impact of VM consolidation, the Purdue researchers propose vSnoop, an approach that improves the throughput of TCP connections to consolidated VMs. The key idea behind vSnoop is to let the driver domain of a physical host acknowledge TCP packets on behalf of the less privileged production VMs, whenever it is safe to do so. By offloading acknowledgement to the driver domain, vSnoop masks the portion of a TCP packet’s round-trip time that corresponds to VM scheduling. The shorter round-trip time prompts the sender to transmit to the VM at a higher rate, effectively saturating the link between the sender and the receiving VM.
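The early-acknowledgement idea can be sketched as a small decision function running in the driver domain. This is a simplified illustration, not vSnoop's actual code: the names `FlowState`, `on_packet_from_network`, and `on_vm_consumed` are hypothetical, and the two "safe to acknowledge" conditions used here (the packet is the next in-order data, and there is room to deliver it to the VM's receive buffer) are assumptions about what "safe" means. Anything else is left for the VM to acknowledge itself once it is scheduled. Sequence-number wraparound and the VM's own duplicate ACKs are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowState:
    # Hypothetical per-flow state kept by the driver domain.
    next_expected_seq: int   # next in-order byte the VM expects
    buffer_capacity: int     # bytes the path to the VM can hold
    buffered_bytes: int = 0  # bytes delivered toward the VM but not yet consumed


def on_packet_from_network(flow: FlowState, seq: int, length: int) -> Optional[int]:
    """Return a cumulative ACK number to send immediately on the VM's behalf,
    or None to leave acknowledgement to the VM itself."""
    # Out-of-order or retransmitted data: not safe to acknowledge early.
    if seq != flow.next_expected_seq:
        return None
    # No room to deliver the data: an early ACK could acknowledge data
    # that is later dropped, so stay conservative.
    if flow.buffered_bytes + length > flow.buffer_capacity:
        return None
    flow.buffered_bytes += length
    flow.next_expected_seq += length
    return flow.next_expected_seq  # cumulative ACK, as in TCP


def on_vm_consumed(flow: FlowState, nbytes: int) -> None:
    # Called when the VM is scheduled and drains data from its buffer.
    flow.buffered_bytes = max(0, flow.buffered_bytes - nbytes)
```

For example, with a flow expecting byte 1000 and 3000 bytes of buffer space, two in-order 1460-byte segments are acknowledged immediately (ACK 2460, then 3920), a third is deferred because the buffer would overflow, and it becomes acknowledgeable again once the VM drains its buffer. Erring toward deferral keeps the sketch safe: a deferred packet only loses the speed-up, while a wrongly issued early ACK could tell the sender that data was received when it was not.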