By now, it is clear that no matter who wins the AI race, the biggest profiteer is NVIDIA. It’s common knowledge that the company is the market leader in hardware, with its GPUs widely used by AI-focused companies around the world. That’s not all. NVIDIA, the biggest chip company in the world, is leading the battle from the software side as well, with its CUDA (Compute Unified Device Architecture) platform.
CUDA, in essence, is like the magic wand that connects software to NVIDIA GPUs. It’s the handshake that lets your AI algorithms tap the computing power of these graphical beasts. But to NVIDIA’s advantage, CUDA isn’t just any ordinary enchantment: it is a closed-source, low-level parallel computing platform and API that binds software tightly to NVIDIA’s GPUs, creating an ecosystem all of its own. It’s so potent that even formidable competitors such as AMD and Intel struggle to match its finesse.
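To make that handshake concrete, here is a minimal, hypothetical sketch (assuming a machine with PyTorch installed and an NVIDIA GPU; it is not code from this article) of how an ML developer ends up depending on CUDA without ever writing GPU code: the framework silently dispatches the work to NVIDIA’s CUDA stack.

```python
import torch

# If an NVIDIA GPU and the CUDA runtime are present, use them; otherwise
# fall back to the CPU. This one line is where the "handshake" happens.
device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# On "cuda", this matrix multiply runs on NVIDIA's closed-source CUDA
# libraries (cuBLAS) under the hood; the developer never sees them.
c = a @ b
print(c.device)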
While other contenders such as Intel and AMD attempt to juggle one side or the other, NVIDIA has mastered the art of both hardware and software. Its GPUs are sleek, powerful, and coveted, and it’s no coincidence that it has also laid down the software foundations that make the most of these machines.
Software companies can’t just waltz in and claim the crown from NVIDIA; they lack the hardware prowess. On the flip side, hardware manufacturers can’t wade into software territory without struggling. This combination has made CUDA the winning ingredient for NVIDIA in AI.
Undisputed but vulnerable
NVIDIA launched CUDA in 2006 to bring general-purpose parallel computing to its GPUs. Earlier, developers had to repurpose graphics APIs such as Microsoft’s Direct3D or OpenGL for computation on GPUs, which offered no proper programming model for general-purpose parallel work. After the launch of CUDA, businesses began tailoring their strategies around the software. OpenCL from the Khronos Group, released in 2009, was the only potential competitor, but by then companies had already standardised on CUDA, leaving little room or need for an alternative.
NVIDIA’s current strategy sounds great, but it has some major drawbacks too. Though CUDA is a moat for NVIDIA, the company’s pursuit of an upmarket strategy focused on high-priced data centre offerings could give other companies room to catch up on the software front.
Moreover, even with a GPU shortage of almost mythical proportions gripping the market, few buyers are willing to forsake NVIDIA’s wares for alternatives from AMD or Intel. It’s almost as if tech aficionados would rather gnaw on cardboard than consider a GPU from another company.
NVIDIA’s ability to maintain its current dominance is also rooted in the RAM constraints it places on its consumer-grade GPUs, which push heavier workloads towards its data centre cards. This situation is likely to change as necessity drives the development of software that efficiently exploits consumer-grade GPUs, potentially aided by open-source solutions or offerings from competitors like AMD and Intel.
Both Intel and AMD stand a chance at challenging NVIDIA’s supremacy, provided they shift away from mimicking NVIDIA’s high-end approach and instead focus on delivering potent yet cost-effective GPUs, along with open-source software. Crucially, they should differentiate themselves by avoiding the artificial constraints on GPU capabilities that NVIDIA employs to steer users towards its pricier data centre GPUs.
Even with these constraints, a lot of developers choose NVIDIA’s consumer-grade GPUs over Intel or AMD for ML development, and recent improvements in these smaller cards have even led people to deploy models on them.
There is another competitor coming up
Interestingly, OpenAI’s Triton is emerging as a disruptive force against NVIDIA’s closed-source stronghold with CUDA. Triton, which takes input from Meta’s PyTorch 2.0 via PyTorch Inductor, carves a path by sidestepping NVIDIA’s closed-source CUDA libraries in favour of open-source alternatives like CUTLASS.
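As a rough sketch of that pipeline (assuming PyTorch 2.x and a CUDA-capable GPU; the function below is a made-up example, not anything from Triton or this article), `torch.compile` lowers ordinary PyTorch code through PyTorch Inductor, which on NVIDIA GPUs emits Triton kernels instead of calling into closed-source CUDA libraries for ops like these:

```python
import torch

def fused_op(x, y):
    # A simple pointwise chain that Inductor can fuse into a single kernel.
    return torch.relu(x * y + 1.0)

# backend="inductor" is the default in PyTorch 2.x; on NVIDIA GPUs the
# code generated for pointwise ops like this is a Triton kernel.
compiled = torch.compile(fused_op)

x = torch.randn(1_000_000, device="cuda")
y = torch.randn(1_000_000, device="cuda")
out = compiled(x, y)  # first call triggers compilation, later calls reuse it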
While CUDA remains an accelerated computing mainstay, Triton broadens the horizon. It bridges languages, enabling high-level code written in Python to approach the performance of lower-level CUDA C++. Triton’s legible kernels empower ML researchers by automating much of the memory management and scheduling, and they have proved invaluable for complex operations like Flash Attention.
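To show what ‘legible kernels’ means in practice, here is a minimal vector-add kernel in the style of Triton’s own tutorials (a sketch, not code from this article): the researcher writes blocked, Python-like code while Triton handles details such as memory coalescing and scheduling that CUDA C++ would force them to manage by hand.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
assert torch.allclose(add(x, y), x + y)
```

The same pattern scales up to far hairier kernels; fused attention in the Flash Attention style is the canonical example of something tractable to write this way but painful in raw CUDA.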
Triton currently runs only on NVIDIA GPUs, but its open-source reach may soon extend beyond them, marking the advent of a shift. Numerous hardware vendors are set to join the Triton ecosystem, reducing the effort needed to compile for new hardware.
NVIDIA, with all its might, overlooked a critical aspect: usability. This oversight allowed OpenAI and Meta to craft a software stack that is portable across hardware, and it raises the question of why NVIDIA didn’t simplify CUDA for ML researchers itself. Its absence from initiatives like Flash Attention raises eyebrows.
NVIDIA has indeed had the upper hand when it comes to product supremacy. But let’s not underestimate the giants of tech. Cloud providers have rolled up their sleeves, designing their own chips that could give NVIDIA’s GPUs a run for their transistors.
Still, all of this is just wishful thinking as of now.