This article highlights some of the topics covered in the upcoming new version of the “3D Graphics for Virtual Desktop Smackdown” whitepaper authored by Team Remote Graphics Experts (@TeamRGE). Stay tuned for the announcement of the release date.
Currently, NVIDIA, AMD and Intel are the most relevant graphics processing unit (GPU) manufacturers in the world. When I visited VMworld 2015 in San Francisco last week, all three vendors announced and demonstrated their newest GPU products and technologies designed to accelerate graphics and multimedia in remote user sessions. The most remarkable aspect of the individual announcements is that each vendor has found a unique way to implement GPU-accelerated remoting. This article gives you a brief overview of what I learned when talking to product managers and engineers from the three GPU vendors.
Today, there are three commonly known GPU virtualization technologies available. GPU pass-through is the term used for assigning individual GPUs on a hypervisor platform to selected virtual machines. This establishes a one-to-one relationship between a GPU and a VM, with a driver for the guest operating system provided by the GPU vendor. GPU sharing, also referred to as para-virtualization, uses a synthetic driver provided by the integration services of the hypervisor host and installed in each guest operating system. This driver intercepts graphics API calls and redirects them to the physical GPU in the host system, making it a shared resource for all guests. The third option, GPU virtualization in the narrower sense, was first introduced by NVIDIA under the name GRID vGPU.
The NVIDIA GRID technology is based on GRID GPU cards, a complete software stack of GPU virtualization, remoting and session-management libraries in the hypervisor, and a GRID-aware driver in the guests. In essence, NVIDIA GRID implements a GPU sharing concept that allows full control of memory resources assigned to a pre-defined number of VMs, including certified graphics drivers.
The first generation of NVIDIA GRID introduced the K1 and K2 cards with “Kepler” GPUs and marked a milestone in delivering high-end graphics in VDI. The GRID K1 card has four NVIDIA Kepler GPUs with a total of 768 CUDA cores and 16GB of DDR3 RAM. The GRID K2 card has two Kepler GPUs with a total of 3,072 CUDA cores and 8GB of GDDR5 RAM. The GRID vGPU technology is implemented in such a way that it allows multiple resource profiles. A profile defines the number of virtual machines that are assigned to a physical GPU and the amount of video memory available for each VM. The first version of the GRID vGPU technology supports up to eight virtual machines per physical GPU, which equals a maximum of 32 users on the quad-GPU GRID K1 board and 16 users on the dual-GPU GRID K2. Depending on the GPU card and the profile selected, the GRID vGPU v1 price per user ranges from less than $100 for knowledge workers to approximately $2,000 for designers.
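To make the profile arithmetic concrete, here is a minimal sketch (my own illustration, not an NVIDIA tool) that derives the maximum number of users per card from the figures above:

```python
# Hypothetical capacity calculation for GRID vGPU v1, based on the figures above.
# A profile fixes how many VMs share one physical GPU; the total number of users
# per card is simply VMs-per-GPU times the number of GPUs on the board.

CARDS = {
    "GRID K1": {"gpus": 4, "vram_gb": 16},  # four Kepler GPUs, 16GB DDR3 total
    "GRID K2": {"gpus": 2, "vram_gb": 8},   # two Kepler GPUs, 8GB GDDR5 total
}

def max_users(card: str, vms_per_gpu: int) -> int:
    """Users supported by one card for a profile with `vms_per_gpu` VMs per GPU."""
    if not 1 <= vms_per_gpu <= 8:  # vGPU v1 supports at most 8 VMs per physical GPU
        raise ValueError("GRID vGPU v1 supports 1-8 VMs per physical GPU")
    return CARDS[card]["gpus"] * vms_per_gpu

print(max_users("GRID K1", 8))  # 32 users on the quad-GPU K1
print(max_users("GRID K2", 8))  # 16 users on the dual-GPU K2
```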
The new GRID v2 introduced by NVIDIA CEO Jen-Hsun Huang at VMworld 2015 comes in three different editions, but still follows the same general GRID philosophy. Like its predecessor, GRID v2 is based on high-end GPU cards, a software layer in the hypervisor and a (certified) graphics driver for the guests. The new GRID solution runs on top of NVIDIA “Maxwell” GPUs that come in two server form factors. The Tesla M6 bare board has one NVIDIA Maxwell GPU with 1,536 CUDA cores and 8GB of GDDR5 RAM, while the dual-GPU Tesla M60 card has 4,096 CUDA cores and 16GB of GDDR5 RAM. GRID v2 enables up to 16 users to share each physical GPU, so as in the previous GRID version, the graphics resources of the available GPUs can be assigned to virtual machines in a balanced way. What’s new is that NVIDIA wants their OEM partners to charge customers a license fee for the new GRID graphics software stack, including the GPU broker and the certified driver. The final price per user license will be announced by NVIDIA partners in the near future. The total price for the GRID product is then determined by the number of users/VMs, the selected GRID vGPU profiles and the price of the physical GPU cards.
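Since the new licensing model composes the total price from per-user software licenses plus the GPU hardware, a rough cost sketch can illustrate it. The function and all prices below are purely hypothetical placeholders (the actual license fee had not been announced at the time of writing):

```python
# Hypothetical cost model for a GRID v2 deployment, following the composition
# described above: per-user software licenses plus the physical GPU cards.
# All prices here are placeholder assumptions, not announced NVIDIA pricing.

def grid_v2_cost(users: int, users_per_gpu: int, gpus_per_card: int,
                 license_per_user: float, card_price: float) -> float:
    """Total price = user licenses + enough cards to host all users."""
    if users_per_gpu > 16:  # GRID v2 allows at most 16 users per physical GPU
        raise ValueError("GRID v2 supports at most 16 users per physical GPU")
    users_per_card = users_per_gpu * gpus_per_card
    cards_needed = -(-users // users_per_card)  # ceiling division
    return users * license_per_user + cards_needed * card_price

# e.g. 64 knowledge workers on dual-GPU Tesla M60 cards (placeholder prices):
print(grid_v2_cost(64, 16, 2, license_per_user=100, card_price=5000))  # 16400.0
```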
Also at VMworld 2015, AMD demonstrated a new hardware-based GPU virtualization solution named AMD Multiuser GPU. This solution is still in development and is built around the industry-standard SR-IOV (Single Root I/O Virtualization) technology, which allows a single PCIe device to appear as multiple separate PCIe devices. In essence, it provides a way for physical devices to expose hardware virtualization. The SR-IOV technology is not new; it is also used to assign physical network cards to virtual machines. The AMD implementation of SR-IOV for Multiuser GPU is baked into the physical graphics card and requires support in the server BIOS. The result is a virtualized workstation-class experience with full ISV certifications. Graphics cards supporting AMD Multiuser GPU are scheduled to be released later this year.
AMD Multiuser GPU has only minimal impact on the hypervisor, as it does not require any software stack on the host side beyond a fairly simple driver. This driver sees up to 16 virtual GPUs per physical GPU, with device virtualization and GPU resource management implemented purely in hardware. It is important that the BIOS is capable of dealing with all these “new” virtual GPU devices, which cannot be distinguished from physical GPUs. Among the most important requirements is 40-bit memory addressing, as the address range must be large enough to reserve memory for these additional GPU devices on the PCIe bus.
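A quick back-of-the-envelope calculation shows why a 40-bit address range matters once 16 additional GPU devices need reserved memory regions on the PCIe bus. The per-virtual-GPU reservation below is an assumed figure chosen for illustration, not an AMD specification:

```python
# Back-of-the-envelope sketch of the 40-bit addressing requirement.
# The per-virtual-GPU reservation is an illustrative assumption, not an AMD spec.

ADDRESS_BITS = 40
address_space = 2 ** ADDRESS_BITS        # 1 TiB of physical address space

VFS_PER_GPU = 16                         # up to 16 virtual GPUs per physical GPU
MEM_PER_VF = 2 * 2**30                   # assume 2 GiB reserved per virtual GPU

reservation = VFS_PER_GPU * MEM_PER_VF   # PCIe memory needed for one GPU's devices
print(address_space // 2**40)            # 1 (TiB available with 40-bit addressing)
print(reservation // 2**30)              # 32 (GiB) — far beyond a 32-bit 4 GiB range
```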
The Multiuser GPU concept has a lot in common with GPU pass-through, as each hardware-virtualized GPU has a one-to-one relationship to a virtual machine. In the current implementation demonstrated at VMworld, AMD Multiuser GPU supports VMware vSphere/ESXi 5.5/6.x and the remoting protocols that come with Horizon View, Citrix XenDesktop and Teradici Workstation Host Software. The native AMD driver installed in each guest operating system supports OpenGL, DirectX and OpenCL on the hardware-virtualized GPU. Future support of other hypervisors should also be possible.
An interesting implementation detail is that one user cannot affect another user’s GPU performance, as GPU compute resources and video RAM are exclusively assigned to individual VMs. In other words, GPU compute cycles are not shared; each cycle is “owned” by one particular VM. Later this year, AMD is planning to release different single-GPU and dual-GPU cards as the foundation of Multiuser GPU. Unfortunately, only newer server hardware will be compatible with the way AMD Multiuser GPU implements SR-IOV, which eliminates many existing standard server platforms from this technology. More details about compatible servers and required BIOS specifications will be published by AMD in the near future.
Intel offers integrated processor graphics technology in their Xeon E3 line of CPUs. In September 2013, Intel announced their “Haswell” CPUs with four models of integrated GPUs. The high-end model is called GT3e or Iris Pro. RAM is shared dynamically between GPU and CPU, allowing for high data transfer rates. Such a CPU/GPU combo is commonly referred to as an Accelerated Processing Unit (APU, also Advanced Processing Unit) which combines the CPU with additional processing capability on the same die, specifically designed to accelerate certain types of computations.
Only three months after the Haswell announcement, Intel also announced a technology called GVT (Graphics Virtualization Technology), a GPU virtualization solution with mediated graphics pass-through. Intel GVT combines the Xen hypervisor with Iris Pro GPU cores (GT3e), which is quite similar to what NVIDIA GRID vGPU does. A virtual GPU is assigned to a guest VM, with resource assignment brokered and managed by a “mediator” component running in the Dom0 session of the hypervisor. Controlled by the mediator, Intel’s native graphics driver runs inside each guest VM, communicating directly with the assigned GPU cores and keeping the hypervisor out of performance-critical paths.
In summer 2015, Intel started shipping the new-generation “Broadwell” APU, an Intel Xeon E3-1200 v4 processor with Iris Pro Graphics P6300. This integrated GPU comes with 48 execution units (cores) and 128MB of eDRAM. It can deliver up to 1.8 times the 3D graphics performance of the previous-generation Intel Xeon E3-1200 v3 with Intel HD Graphics. The memory interface of the Xeon E3-1200 v4 Iris Pro graphics unit can address up to 32GB of DDR3(L) RAM used as vRAM. At VMworld 2015, Intel demonstrated this APU in combination with VMware vSphere and Horizon View, delivering GPU-accelerated user sessions.
The design of the Intel E3-1200 v4 with integrated Iris Pro graphics unit is geared towards low- to medium-end remote graphics workstation use cases. It is a low-power and low-cost solution for customers who want to keep their critical data in centralized datacenters and deliver rich 3D applications over local and wide-area networks. According to Intel, their GVT solution allows Xeon E3-1200 v4 customers to dedicate the resources of each processor to a single designer or share them among multiple less demanding users. The power consumption of a Xeon E3-1200 v4 CPU with Iris Pro graphics unit is between 35 and 95 watts. The “Skylake” line of CPUs is planned for later this year, with Intel HD Graphics 530 as the first released product derived from the Intel processor graphics Gen9 architecture. According to announcements recently made by Intel, such a CPU may come with up to three slices of 24 Iris Pro GPU cores each, providing a total of 72 GPU cores.
GPU-accelerated session remoting and improved user experience are hot topics, and the huge investments made by the big three GPU vendors show that they are fully aware of it. NVIDIA is coming from the absolute high end with their GRID v2 solution, while Intel’s Iris Pro graphics unit is coming from the low end of graphics remoting. Both have the ambition to expand their market share as fast as they can and deliver GPU-accelerated remoting to the masses. NVIDIA’s advantages are their extremely powerful GPUs and their well-established GRID concept. Intel’s unique selling points are low power consumption and a low price. AMD is the challenger, and they are still trying to find their sweet spot in this market. Their ability to deliver the Multiuser GPU hardware virtualization solution as soon as possible will be the critical success factor. It’s an exciting race between the three, and I’m looking forward to what the next weeks and months will bring for GPU-accelerated remoting. My prediction is that in three years the majority of remote sessions will be GPU-accelerated.