OpenAI partners with Microsoft, AMD, Broadcom, Nvidia, and Intel researchers to detail the Multipath Reliable Connection (MRC) protocol to help scale compute
OpenAI is getting creative to deal with the industry's imminent compute crunch. … The protocol, which has been in the works for two years …
The Deep View Nat Rubio-Licht
Related Coverage
- Supercomputer networking to accelerate large scale AI training OpenAI
- From Innovation to Deployment Ready: AMD Advances AI Networking at Scale with MRC AMD
- NVIDIA Spectrum-X — the Open, AI-Native Ethernet Fabric — Sets the Standard for Gigascale AI, Now With MRC NVIDIA · Gilad Shainer
- NVIDIA Spectrum-X MRC is the Custom RDMA Transport Protocol for Gigascale AI ServeTheHome · Patrick Kennedy
- OpenAI Pushes New AI Networking Protocol as GPU Clusters Scale Data Center Knowledge · Shane Snider
- Nvidia's MRC: When ‘just Ethernet’ isn't enough for gigascale AI SiliconANGLE · Zeus Kerravala
- OpenAI built a networking protocol with AMD, Broadcom, Intel, Microsoft, and NVIDIA to fix AI supercomputer bottlenecks The Decoder · Matthias Bastian
- Building resilient networks for AI supercomputers Microsoft Tech Community
Discussion
-
@openai
@openai
on x
MRC is already deployed across all of OpenAI's largest supercomputers that we use to train frontier models, including our site with @Oracle Cloud Infrastructure (OCI) in Abilene, Texas, and in @Microsoft's Fairwater supercomputers. MRC is now available through the [video]
-
@openai
@openai
on x
We've partnered with @AMD, @Broadcom, @Intel, @Microsoft, and @NVIDIA, to release Multipath Reliable Connection (MRC), a new open networking protocol that helps large AI training clusters run faster and more reliably, with less wasted GPU time. https://openai.com/...
-
@sk7037
Sachin Katti
on x
Today we shared MRC ( https://openai.com/...), a networking protocol developed with @Microsoft, @nvidia, @AMD, @Broadcom, and @intel to improve how large AI training systems move data and recover from failures. This innovation has come full circle for me personally, it was
-
@amd
@amd
on x
At AI scale, raw bandwidth breaks down. What matters is resilience, recovery, and consistency under load. AMD in collaboration with Microsoft and @OpenAI defines a proven solution approach to AI networking with MRC. Learn more now: https://www.amd.com/... [image]
-
@nvidiadc
@nvidiadc
on x
Gigascale AI needs networking built for scale, resilience and openness. NVIDIA Spectrum-X Ethernet now supports MRC, a new RDMA-based protocol that improves throughput, availability and failure recovery for large-scale AI training. Used by @OpenAI, @Microsoft, and @Oracle. Now [i…
-
Yang Zhou
Yang Zhou
on linkedin
Just saw OpenAI release their Multipath Reliable Connection (MRC) protocol: https://lnkd.in/... My feeling is that their techniques are basically …
-
Sachin Katti
Sachin Katti
on linkedin
Today we shared MRC (https://lnkd.in/... a networking protocol developed with Microsoft, NVIDIA, AMD, Broadcom, and Intel to improve how large AI training systems move data and recover from failures. …
-
Nat Rubio-Licht
Nat Rubio-Licht
on linkedin
Exclusive from me this morning: OpenAI and some of the industry's biggest names have come together to fix two of the biggest issues in networking: Congestion and failure. …