AMD & Oracle boost AI capabilities with new accelerator launch

Fri, 27th Sep 2024

AMD has announced that Oracle Cloud Infrastructure (OCI) is implementing AMD Instinct MI300X accelerators with ROCm open software for its latest OCI Compute Supercluster instance, BM.GPU.MI300X.8. The new instance is designed to handle advanced AI models with hundreds of billions of parameters. The OCI Supercluster can support up to 16,384 GPUs in a single cluster, connected by the same ultrafast network fabric technology used by other OCI accelerators.
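For teams looking to provision the new shape programmatically, the sketch below shows roughly how a BM.GPU.MI300X.8 bare metal instance could be requested through the OCI Python SDK. The shape name comes from the announcement; the availability domain and the compartment, image, and subnet OCIDs are placeholders, and the exact image and networking choices will depend on your tenancy.

# Minimal sketch: launching a BM.GPU.MI300X.8 bare metal instance with the
# OCI Python SDK. All OCIDs and the availability domain are placeholders.
import oci

config = oci.config.from_file()          # reads ~/.oci/config by default
compute = oci.core.ComputeClient(config)

details = oci.core.models.LaunchInstanceDetails(
    availability_domain="Uocm:PHX-AD-1",                # placeholder AD
    compartment_id="ocid1.compartment.oc1..example",    # placeholder OCID
    shape="BM.GPU.MI300X.8",                            # 8x AMD Instinct MI300X per node
    display_name="mi300x-ai-node",
    source_details=oci.core.models.InstanceSourceViaImageDetails(
        source_type="image",
        image_id="ocid1.image.oc1..example",            # placeholder GPU image OCID
    ),
    create_vnic_details=oci.core.models.CreateVnicDetails(
        subnet_id="ocid1.subnet.oc1..example",          # placeholder subnet OCID
    ),
)

instance = compute.launch_instance(details).data
print(instance.id, instance.lifecycle_state)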

These new OCI bare metal instances cater to demanding AI workloads, including large language model (LLM) inference and training, which require high throughput, memory capacity, and bandwidth. Companies such as Fireworks AI have already begun using the instances to enhance their AI inference and training capabilities.

Andrew Dieckmann, corporate vice president and general manager of the Data Center GPU Business at AMD, commented on the growing adoption of AMD Instinct MI300X and ROCm open software. "AMD Instinct MI300X and ROCm open software continue to gain momentum as trusted solutions for powering the most critical OCI AI workloads," said Dieckmann. "As these solutions expand further into growing AI-intensive markets, the combination will benefit OCI customers with high performance, efficiency, and greater system design flexibility."

OCI's Senior Vice President of Software Development, Donald Lu, also highlighted the significance of the new offering. "The inference capabilities of AMD Instinct MI300X accelerators add to OCI's extensive selection of high-performance bare metal instances to remove the overhead of virtualised compute commonly used for AI infrastructure," Lu stated. "We are excited to offer more choice for customers seeking to accelerate AI workloads at a competitive price point."

The AMD Instinct MI300X has undergone thorough testing and validation by OCI, demonstrating its effectiveness in AI inferencing and training, especially for latency-sensitive applications and larger batch sizes. That performance has drawn the attention of AI model developers and underlines the accelerator's ability to handle some of the industry's most demanding workloads.

Fireworks AI, a company that offers a platform for building and deploying generative AI, is utilising the capabilities of OCI instances powered by AMD Instinct MI300X. Lin Qiao, CEO of Fireworks AI, discussed the benefits seen from this technology. "Fireworks AI helps enterprises build and deploy compound AI systems across a wide range of industries and use cases," Qiao said. "The amount of memory capacity available on the AMD Instinct MI300X and ROCm open software allows us to scale services to our customers as models continue to grow."

The implementation of AMD Instinct MI300X accelerators in OCI's Supercluster marks a significant development in the infrastructure available for AI training and inference. It offers specialised solutions designed to meet the high demands of today's AI applications, providing OCI customers with improved performance and cost-effectiveness.
