Nvidia used its GTC 2026 keynote in San Jose to make the case that the next big phase of artificial intelligence spending will be driven less by training models and more by running them at scale. Chief executive Jensen Huang said the company now sees at least a $1 trillion revenue opportunity for its AI chips through 2027, a sharp increase from the $500 billion market opportunity through 2026 that it cited on its February earnings call.
At the center of the pitch was Nvidia’s push into inference, the stage at which trained models generate answers and carry out tasks in real time. The company introduced its Vera CPU rack and expanded the Vera Rubin platform with a Groq-based inference accelerator system, positioning the combined architecture as infrastructure for agentic AI and large-scale deployment rather than model training alone.
Nvidia said the new platform brings together multiple chips and rack-scale systems designed to handle different phases of AI workloads, from pretraining and post-training to low-latency inference. The company said the Groq 3 LPX rack, paired with Vera Rubin systems, is aimed at improving throughput and power efficiency for demanding inference jobs, with availability slated for the second half of 2026.
The announcement comes as Nvidia tries to reassure investors that AI demand can continue rising even as competition intensifies from CPUs and custom processors built by large cloud and technology groups. By broadening its product lineup to span full AI infrastructure, Nvidia is betting that the market is moving beyond experimentation and into a larger buildout of systems to serve hundreds of millions of users.

