TL;DR
- Alibaba slashes GPU usage by 82% with Aegaeon, fueling AI at massive scale.
- Aegaeon cuts AI model-switching latency by 97%, boosting performance.
- One Nvidia H20 GPU now runs 7 LLMs at once in Alibaba’s AI upgrade.
- Alibaba Cloud improves GPU efficiency with token-level auto-scaling.
- Aegaeon powers China’s AI goals while cutting reliance on Nvidia chips.
Alibaba Group Holding Limited (BABA) closed at $167.05, up 1.19%, following a major breakthrough in AI infrastructure.
The company introduced a computing pooling solution that cut Nvidia GPU usage by 82% in model-serving operations. This advance positions Alibaba Cloud ahead in the race to optimize AI deployment at scale.
Aegaeon boosts efficiency, cuts GPU dependency
Alibaba Cloud, the cloud computing arm of the Hangzhou-based firm, implemented a new system called Aegaeon to boost AI efficiency. The solution allows a single Nvidia H20 GPU to serve up to seven large language models concurrently. This change reduced GPU usage from 1,192 to just 213 units during internal testing.
Aegaeon works by performing auto-scaling at the token level during model inference, dynamically reallocating GPU resources across concurrent AI workloads so that the same GPU can switch between models mid-generation. The system also cut model-switching latency by 97%.
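The core idea, scheduling at token boundaries rather than only between whole requests, can be sketched in a few lines. This is a hypothetical simplification for illustration only (the `Request` class, model names, and round-robin policy are assumptions, not Aegaeon's actual design): a single GPU's time is interleaved across models one token at a time, so a rarely used model never monopolizes the hardware.

```python
from collections import deque

class Request:
    """A pending inference request for one model (illustrative only)."""
    def __init__(self, model, tokens_needed):
        self.model = model
        self.remaining = tokens_needed

def token_level_schedule(requests, quantum=1):
    """Round-robin over active requests, preempting at token boundaries
    so one GPU can serve several models concurrently (a toy sketch of
    token-level auto-scaling, not Aegaeon's real scheduler)."""
    queue = deque(requests)
    timeline = []  # which model held the GPU at each token step
    while queue:
        req = queue.popleft()
        for _ in range(min(quantum, req.remaining)):
            req.remaining -= 1
            timeline.append(req.model)
        if req.remaining > 0:
            queue.append(req)  # preempt: hand the GPU to the next model
    return timeline

# Two models share one GPU; the scheduler interleaves them token by token.
tl = token_level_schedule([Request("qwen-7b", 3), Request("qwen-72b", 2)])
print(tl)  # ['qwen-7b', 'qwen-72b', 'qwen-7b', 'qwen-72b', 'qwen-7b']
```

With request-level scheduling, the second model would wait for the first to finish entirely; token-level preemption is what lets one H20 keep several models responsive at once.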
The solution was beta-tested for over three months in Alibaba Cloud’s Bailian marketplace. It handled dozens of models with up to 72 billion parameters without service degradation. Aegaeon has now been formally deployed in Alibaba’s model marketplace, which serves its proprietary Qwen models.
Model market insights and performance optimization
Alibaba Cloud found that only a small number of models are frequently used in real-world AI tasks. Despite this, many GPUs were allocated to rarely called models, resulting in low resource utilization. Data showed that 17.7% of GPUs served just 1.35% of total inference requests.
With Aegaeon, the company resolved this imbalance through GPU pooling and token-level scaling. The system kept GPUs consistently utilized instead of letting them sit idle on rarely used models, delivering higher throughput and better hardware efficiency for enterprise deployments.
Peking University and Alibaba Cloud researchers co-authored a technical paper detailing the innovation, presented at SOSP 2025 in South Korea. The study underlined that serving concurrent workloads with traditional GPU methods incurred unnecessary costs. This breakthrough directly supports China’s goal of AI infrastructure modernization under resource constraints.
Nvidia’s role and China’s chip strategy shift
Nvidia developed the H20 GPU specifically for AI inference in China, complying with U.S. export restrictions. However, Chinese regulators recently launched a probe into possible backdoor security vulnerabilities in the chip. This scrutiny has affected the chip’s market position and adoption within China.
Chinese firms like Huawei and Cambricon are accelerating development of domestic GPUs to reduce foreign dependency. Nvidia’s CEO stated that the company’s market share for advanced AI chips in China has fallen to zero. This trend pushes local players to innovate and localize AI hardware supply chains.
Alibaba’s new approach strengthens its market stance while aligning with national strategies for tech self-sufficiency. By reducing reliance on U.S. chips, Alibaba gains a stronger foothold in China’s evolving AI ecosystem. The stock rise reflects confidence in its technology-led cost savings and scalability.