The Ultra Accelerator Link Consortium has recently been incorporated, opening membership to new companies, and has announced that the UALink 1.0 specification will be publicly available in the first quarter of 2025. The consortium is led by its “Promoter” members: AMD, Astera Labs, AWS, Cisco, Google, HPE, Intel, Meta and Microsoft.
The UALink Consortium aims to provide specifications and standards that allow industry players to develop high-speed interconnects for AI accelerators at scale. In other words, it addresses the GPU clusters that train the largest language models and solve the most complex challenges. Just as Nvidia developed its proprietary NVLink to address GPU-to-GPU connectivity, UALink looks set to extend this capability across the industry.
The key to the UALink Consortium is the partnership between the largest technology companies—many of which compete with each other—to better enable the future of AI and other workloads that depend on accelerators. Let’s examine this initiative and what it could mean for the market.
How we got here – The CPU Challenge
High-performance computing was perhaps the first workload class to highlight that CPUs are not always the best processor for the job. The massive parallelism and high data throughput of GPUs allow tasks such as deep learning, genomic sequencing, and big data analytics to perform far better than they do on a CPU. This architecture, combined with broad programmability, has made GPUs the accelerator of choice for AI. In particular, training LLMs that roughly double in size every six months happens much more efficiently, and much faster, on GPUs.
However, in a server architecture, the CPU (emphasis on the “C”, central) is the brain of the server, with all functions running through it. If a GPU is to be used for a task, it is connected to the CPU via PCIe, and regardless of how fast that GPU can perform the task, system performance is limited by how fast the CPU can route traffic to and from it. This limitation becomes glaringly obvious as LLMs and datasets grow ever larger and generative AI requires large numbers of GPUs to train models cooperatively. This is especially true for hyperscalers and other large organizations that train frontier AI models. Consider a training cluster with thousands of GPUs spread across many racks, all dedicated to training GPT-4, Mistral, or Gemini 1.5: the latency introduced over the course of a training run is considerable.
However, this is not just a matter of training. As enterprise IT organizations begin to operationalize generative AI, performing inference at scale is also challenging. In the case of AI and other demanding workloads such as HPC, the CPU can significantly limit system and cluster performance. This can have many implications in terms of performance, cost and accuracy.
Introducing UALink
The UALink Consortium was formed to develop a set of standards that enable accelerators to communicate with each other directly, bypassing the CPU, in a fast, low-latency and scalable way. The specification defines an I/O architecture with speeds of up to 200 Gbps per lane, scaling to as many as 1,024 AI accelerators in a pod. That is much better performance than Ethernet delivers, and it connects many more GPUs than Nvidia’s NVLink.
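For a rough sense of what that per-lane speed means in practice, here is a back-of-the-envelope conversion. The per-lane rate comes from the consortium's announcement; the four-lane link width is purely an illustrative assumption, not something the specification announcement states:

```python
def gbps_to_gbytes_per_s(gbps: float) -> float:
    """Convert a line rate in gigabits per second to gigabytes per second,
    ignoring encoding and protocol overhead."""
    return gbps / 8

PER_LANE_GBPS = 200  # UALink 1.0 per-lane rate cited above

# Hypothetical link width: four lanes bonded into one accelerator port
LANES_PER_PORT = 4
port_bandwidth = LANES_PER_PORT * gbps_to_gbytes_per_s(PER_LANE_GBPS)
print(f"{port_bandwidth:.0f} GB/s per direction per port")  # 100 GB/s
```

Real links carry encoding and protocol overhead, so delivered throughput would be somewhat lower than this raw figure.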
To better contextualize UALink and its value, think of accelerator networking in three parts: the front-end network, the scale-up network, and the scale-out network. The front-end network connects hosts to the wider data center network, to compute and storage, and to the outside world; it runs over Ethernet NICs attached to the CPU. The back-end network is focused on GPU-to-GPU connectivity and consists of two components: the scale-up fabric and the scale-out fabric. Scale-up connects hundreds of GPUs with the lowest latency and highest bandwidth, and this is where UALink comes in. Scale-out grows AI clusters beyond 1,024 GPUs to 10,000 or even 100,000; it is enabled by scale-out NICs and Ethernet, and this is where Ultra Ethernet will come into play.
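The three-tier picture above can be summarized in a small sketch; the labels and scale figures simply restate the text and are descriptive, not terms defined by any of the specifications:

```python
from dataclasses import dataclass

@dataclass
class NetworkTier:
    name: str
    connects: str
    transport: str
    scale: str

# Descriptive summary of the tiers discussed above
TIERS = [
    NetworkTier("front-end", "hosts to the data center and outside world",
                "Ethernet NICs via the CPU", "entire data center"),
    NetworkTier("scale-up", "GPU to GPU within a pod",
                "UALink", "up to 1,024 accelerators"),
    NetworkTier("scale-out", "pod to pod",
                "scale-out NICs / Ultra Ethernet", "10,000 to 100,000+ GPUs"),
]

for tier in TIERS:
    print(f"{tier.name}: {tier.connects} over {tier.transport} ({tier.scale})")
```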
When you think about a product like the Dell PowerEdge XE9680, which can support up to eight AMD Instinct or Nvidia HGX GPUs, a UALink-enabled cluster could support more than 100 of these servers in a pod where the GPUs would have direct, low-latency access to each other.
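The 100-plus-server figure follows from simple arithmetic, sketched here using the 1,024-accelerator scale-up limit and the XE9680's eight-GPU capacity mentioned above:

```python
MAX_ACCELERATORS_PER_POD = 1024  # UALink 1.0 scale-up limit
GPUS_PER_SERVER = 8              # e.g., a fully loaded Dell PowerEdge XE9680

servers_per_pod = MAX_ACCELERATORS_PER_POD // GPUS_PER_SERVER
print(servers_per_pod)  # 128 eight-GPU servers in a fully populated pod
```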
As an organization’s needs grow, Ultra Ethernet Consortium-based connections can be used for expansion. In 2023, industry leaders including Broadcom, AMD, Intel and Arista formed the UEC to drive performance, scale and interoperability for bandwidth-hungry AI and HPC workloads. In fact, AMD just launched the first UEC-compliant NIC, the Pensando Pollara 400, a few weeks ago. (Our Moor Insights & Strategy colleague Will Townsend has written about it in detail.)
Returning to UALink, it is important to understand that this is not just a pseudo-standard being used to challenge the dominance of Nvidia and NVLink. This is a real working group developing a real standard with actual solutions being designed.
In parallel, we see some of the groundwork being laid by UALink Promoter companies such as Astera Labs, which recently introduced its Scorpio P-Series and X-Series fabric switches. The P-Series switch enables GPU-to-CPU connectivity over PCIe Gen 6 (and can be customized), while the X-Series is aimed at GPU-to-GPU connectivity. Given that the company has already built the underlying fabric, it is easy to see how it could support UALink shortly after the specification is published.
It is important to understand that UALink is agnostic to accelerators and fabrics, switches, retimers, and other technologies that enable accelerator-to-accelerator connectivity. It doesn’t favor AMD over Nvidia, nor does it favor Astera Labs over, say, Broadcom (if that company decides to contribute). It’s about building an open set of standards that favors innovation across the ecosystem.
While the average enterprise IT administrator, or even the CIO, won’t care much about UALink itself, they will care about what it delivers to their organization: faster training and inferencing on less power-hungry platforms that can, to a degree, be managed and tuned on their own. To put a finer point on it: faster results at lower cost.
What about Nvidia and NVLink?
It’s easy to see what UALink is doing as an attempt to respond to Nvidia’s stronghold, and on some level it certainly is. In the bigger picture, however, this is less about copying what Nvidia does and more about ensuring that critical capabilities like GPU-to-GPU connectivity don’t fall under the purview of one company with a vested interest in optimizing for its own GPUs.
It will be interesting to see how server vendors like Dell, HPE, Lenovo and others choose to support UALink and NVLink. (Lenovo is a “Contributing” member of the UALink Consortium, but Dell has not yet joined.) NVLink uses a proprietary signaling interconnect to support Nvidia GPUs. By contrast, UALink will support accelerators from a variety of vendors, with switching and fabric from any vendor that adheres to the UALink standard.
There is a real and significant cost to these server vendors – from design to manufacturing and through the qualification and sales/support process. On the surface, it’s easy to see where UALink would appeal to, say, Dell or HPE. However, there is a market demand for Nvidia that cannot and will not be ignored. Regardless of one’s perspective on the ability of the “market” to erode Nvidia’s dominance, we can all agree that its dominance is not going to fade anytime soon.
Collaborating for better data center computing
The UALink Consortium (and its forthcoming specification) is an important milestone for the industry as the challenges associated with training AI models and operationalizing them become increasingly complex, time-consuming and costly.
If and when we see companies like Astera Labs and others develop the underlying fabric and silicon switching to drive accelerator-to-accelerator connectivity, and companies like Dell and HPE build platforms that light all of this up, the trickle-down impact will be significant. The benefits realized by hyperscalers like AWS and Meta will also reach enterprise IT organizations looking to operationalize AI across business functions.
Ideally, we would have a market with a standard interconnect specification for all accelerators – all GPUs. And maybe at some point that day will come. But for now, it’s nice to see rivals like AMD and Intel or Google and AWS come together around a standard that’s good for everyone.