India, May 8 -- The AI infrastructure has become more difficult to develop. Adding more graphics cards is not enough anymore. The wired network that connects GPUs must keep up with the increased load of transferring data.
This is the idea behind Multipath Reliable Connection (MRC), a new AI networking layer developed and donated to the Open Compute Project by AMD, OpenAI, Microsoft, NVIDIA, Broadcom, and Intel. This is significant because modern AI training infrastructures typically consist of thousands of accelerators exchanging data at all times. A single slow transfer could affect an entire batch of training steps.
For many years, people have viewed AI infrastructure solely in terms of compute capability: more GPUs and faster memory,...
Click here to read full article from source
इस लेख के रीप्रिंट को खरीदने या इस प्रकाशन का पूरा फ़ीड प्राप्त करने के लिए, कृपया
हमे संपर्क करें.