2025-10-20
South China Morning Post
Alibaba Cloud details a GPU pooling system that it claims reduced the number of Nvidia H20s required by 82% when serving dozens of LLMs of up to 72B parameters
up to 9x increase in output lets 213 GPUs perform like 1,192
ACM Digital Library: Aegaeon: Effective GPU Pooling for Concurrent LLM Serving on the Market
Rounak Jain / Benzinga: Alibaba Cloud's New ...