GigaIO's SuperNODE to Power TensorWave Deployment with AMD MI300X – High-Performance Computing News … – insideHPC

San Jose, California, December 6, 2023 - GigaIO, provider of open workload-defined infrastructure for AI and accelerated computing, has announced what the company said is the largest order yet for its SuperNODE, utilizing tens of thousands of AMD Instinct MI300X accelerators.

GigaIO's infrastructure will form the backbone of a bare-metal specialized AI cloud code-named TensorNODE, to be built by cloud provider TensorWave for supplying access to AMD data center GPUs, especially for use in LLMs.

The company said the SuperNODE, launched last June, was the world's first 32-GPU single-node supercomputer. The TensorNODE deployment will build upon this architecture at a greater scale, leveraging GigaIO's PCIe Gen-5 memory fabric to provide simpler workload setup and deployment than is possible with legacy networks, and eliminating the associated performance tax, according to GigaIO.

"TensorWave is excited to bring this innovative solution to market with GigaIO and AMD," said Darrick Horton, CEO of TensorWave. "We selected the GigaIO platform because of its superior capabilities, in addition to GigaIO's alignment with our values and commitment to open standards. We're leveraging this novel infrastructure to support large-scale AI workloads, and we are proud to be collaborating with AMD as one of the first cloud providers to deploy the MI300X accelerator solutions."

GPU utilization is key in this era of GPU scarcity but requires significant VRAM and memory bandwidth. TensorWave will use FabreX to create the very first petabyte-scale GPU memory pool without the performance impact of non-memory-centric networks. The first installment of TensorNODE is expected to be operational starting in early 2024, with an architecture that will support up to 5,760 GPUs across a single FabreX memory fabric domain. Workloads will be able to access more than a petabyte of VRAM in a single job from any node, enabling even the largest jobs to be completed in record time. Throughout 2024, multiple TensorNODEs will be deployed.

The composable nature of GigaIO's dynamic infrastructure provides TensorWave with tremendous flexibility and agility over standard static infrastructure; as LLMs and AI user needs evolve over time, the infrastructure can be tuned on the fly to meet both current and future needs.

TensorWave's cloud will be greener than alternatives by eliminating redundant servers and associated networking equipment, providing savings in cost, complexity, space, water, and power.

"We are thrilled to power TensorWave's infrastructure at scale by combining the power of the revolutionary AMD Instinct MI300X accelerators with GigaIO's AI infrastructure architecture, including our unique memory fabric, FabreX. This deployment validates the pioneering approach we have taken to reimagining data center infrastructure," said Alan Benjamin, CEO of GigaIO. "The TensorWave team brings both a visionary approach to cloud computing and a deep expertise in standing up and deploying very sophisticated accelerated data centers."

TensorNODE is an all-AMD solution featuring both 4th Gen AMD CPUs and MI300X accelerators. The expected performance of the TensorNODE is made possible by the MI300X, which delivers 192GB of HBM3 memory per accelerator. The leadership memory capacity of these accelerators, combined with GigaIO's memory fabric, which allows for near-perfect scaling with no compromise to performance, solves the challenge of underutilized or idle GPU cores.
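
The "more than a petabyte of VRAM" figure can be checked against the two numbers given in this article: up to 5,760 GPUs per FabreX domain and 192GB of HBM3 per MI300X. The short Python sketch below is only a back-of-the-envelope calculation; the function and constant names are illustrative, and it assumes decimal units (1 PB = 1,000,000 GB) and that every accelerator's full memory counts toward the pool.

```python
# Back-of-the-envelope check of the pooled GPU memory figure.
# Inputs come from the article; names and unit convention are assumptions.

GPUS_PER_DOMAIN = 5_760   # maximum GPUs in one FabreX memory fabric domain
HBM_PER_GPU_GB = 192      # HBM3 capacity of one MI300X accelerator, in GB
GB_PER_PB = 1_000_000     # decimal petabyte (assumed convention)


def pooled_memory_gb(gpu_count: int, gb_per_gpu: int) -> int:
    """Total memory, in GB, if every GPU contributes its full capacity."""
    return gpu_count * gb_per_gpu


total_gb = pooled_memory_gb(GPUS_PER_DOMAIN, HBM_PER_GPU_GB)
total_pb = total_gb / GB_PER_PB

print(f"{total_gb:,} GB is about {total_pb:.2f} PB")
```

Under these assumptions the total comes to 1,105,920 GB, or roughly 1.1 PB, which is consistent with the petabyte-scale claim for a fully populated domain.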

"We are excited about our collaboration with GigaIO and TensorWave to bring unique solutions to the evolving workload demands of AI and HPC," said Andrew Dieckmann, Corporate Vice President and General Manager, Data Center and Accelerated Processing of AMD. "GigaIO's SuperNODE architecture, powered by AMD Instinct accelerators and AMD EPYC CPUs, is expected to deliver impressive performance and flexibility."
