TL;DR:
- xAI hires Nvidia experts Zeeshan Patel and Ethan He to build advanced “world models” capable of understanding physical environments.
- The company’s new “Omni Team” will develop AI that processes and generates multimodal content, from images to video and sound.
- xAI faces compute challenges after ending a $10B deal with Oracle, opting to build its own Nvidia-powered infrastructure.
- Despite leadership shakeups, Musk’s startup remains committed to creating AI that interacts with the real world beyond text.
Elon Musk’s artificial intelligence startup, xAI, has reportedly intensified its efforts to develop next-generation AI systems known as “world models.”
These advanced systems aim to help AI understand, design, and interact with physical environments, a significant leap beyond the capabilities of existing large language models like OpenAI’s ChatGPT and xAI’s own chatbot, Grok.
According to sources familiar with the matter, xAI has recruited two top Nvidia researchers, Zeeshan Patel and Ethan He, both with deep experience in training AI systems on video and robotics data. Their expertise aligns with Musk’s ambition to create AI that doesn’t just process text but perceives and responds to the real world in real time.
This new direction places xAI squarely in competition with tech giants like OpenAI and Google DeepMind, which are also exploring how AI can learn from 3D environments and multimodal data, integrating text, images, video, and physical simulations into a cohesive understanding.
Building “World Models” with Nvidia Expertise
The term “world models” refers to AI systems that simulate the physical world to predict how actions will unfold, an essential foundation for robotics, autonomous vehicles, and intelligent agents capable of navigating reality.
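Concretely, the core of a world model is a learned dynamics function: given the current state of an environment and an action, it predicts what the environment will look like next. The sketch below illustrates that idea in PyTorch with made-up state and action dimensions and random placeholder transitions; it is a toy example of the general technique, not a description of xAI’s undisclosed system.

```python
# Toy sketch of a world model: a learned dynamics network that predicts
# the next state of an environment from the current state and an action.
# Dimensions and data are placeholders chosen for illustration only.
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.dynamics = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, state_dim),  # predicted next state
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Concatenate state and action, then predict the next state.
        return self.dynamics(torch.cat([state, action], dim=-1))

model = WorldModel(state_dim=16, action_dim=4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for step in range(1000):
    # Random stand-in transitions; in practice these would come from
    # video, robotics logs, or a physics simulator.
    state = torch.randn(64, 16)
    action = torch.randn(64, 4)
    next_state = torch.randn(64, 16)

    loss = loss_fn(model(state, action), next_state)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Once trained, such a model can be rolled forward step by step to "imagine" how a scene will evolve under a sequence of actions, which is what makes world models useful for planning in robotics and autonomous driving.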
Nvidia, where Patel and He previously worked, has been at the forefront of this field with its Omniverse and Isaac Sim platforms, simulation tools used to train and test robots in digital environments. By drawing from Nvidia’s simulation technologies, xAI hopes to replicate the complexity of real-world learning in virtual spaces, potentially accelerating development in robotics and real-time AI systems.
In line with this vision, xAI has formed what it calls an “Omni Team,” a specialized unit working on AI models that can process and generate content across images, video, and audio. This move signals Musk’s desire to build AI systems that can see, hear, and act, bridging the gap between digital intelligence and real-world capability.
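To make the multimodal idea concrete, the sketch below shows one common pattern such models follow: separate encoders project images, audio, and text into a shared embedding space, and a single backbone reasons over the fused sequence. The encoder shapes and token layout here are invented for illustration; xAI has not published details of its Omni models.

```python
# Minimal sketch of multimodal fusion: per-modality encoders map images,
# audio, and text into one embedding space, then a shared transformer
# backbone processes the combined token sequence. Illustrative only.
import torch
import torch.nn as nn

class MultimodalEncoder(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # Stand-ins for real vision/audio towers (e.g. a ViT or audio CNN).
        self.image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, embed_dim))
        self.audio_encoder = nn.Linear(16_000, embed_dim)
        self.text_encoder = nn.Embedding(32_000, embed_dim)
        # Shared backbone over the fused token sequence.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, image, audio, text_tokens):
        # One token each for the image and audio clip, plus the text tokens.
        fused = torch.stack(
            [self.image_encoder(image), self.audio_encoder(audio)], dim=1
        )
        fused = torch.cat([fused, self.text_encoder(text_tokens)], dim=1)
        return self.backbone(fused)

model = MultimodalEncoder()
out = model(
    torch.randn(2, 3, 64, 64),         # batch of small RGB images
    torch.randn(2, 16_000),            # one second of 16 kHz audio
    torch.randint(0, 32_000, (2, 8)),  # eight text tokens per sample
)
print(out.shape)  # torch.Size([2, 10, 512]): 2 fused tokens + 8 text tokens
```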
Challenges in Compute Power and Scale
Despite its bold ambitions, xAI faces significant hurdles. One of the biggest challenges is compute capacity, the raw hardware power needed to train large-scale models.
The company reportedly ended discussions with Oracle over a proposed $10 billion cloud infrastructure deal due to disagreements over timing and energy supply. Instead, xAI is now working on building its own AI compute clusters using Nvidia H100 GPUs, the same chips that power most state-of-the-art AI systems today.
However, compared to industry leaders, xAI’s computing resources remain limited. OpenAI has stated that over one million GPUs will be online by the end of this year, and other firms like ByteDance are aggressively expanding their GPU clusters for similar large-scale AI training. For xAI, catching up will require not only talent but also massive infrastructure investment.