4 ways AWS is engineering infrastructure to power generative AI

Prasad Kalyanaraman, VP of Infrastructure Services at AWS, writes that AWS continues to optimize its infrastructure to support generative AI at scale, from networking innovations to changes in data center design.

Generative artificial intelligence (AI) has transformed our world seemingly overnight, as individuals and enterprises use the new technology to enhance decision-making, transform customer experiences, and boost creativity and innovation. But the underlying infrastructure that powers generative AI wasn’t built in a day—in fact, it is the result of years of innovation.

AI and machine learning (ML) have been a focus for Amazon for more than 25 years, and they drive everyday capabilities like shopping recommendations and packaging decisions. Within Amazon Web Services (AWS), we’ve been focused on bringing that knowledge to our customers by putting ML into the hands of every developer, data scientist, and expert practitioner.

AI is now a multibillion-dollar revenue run rate business for AWS. Over 100,000 customers across industries—including adidas, New York Stock Exchange, Pfizer, Ryanair, and Toyota—are using AWS AI and ML services to reinvent experiences for their customers. Additionally, many of the leading generative AI models are trained and run on AWS.

All of this work is underpinned by AWS’s global infrastructure, including our data centers, global network, and custom AI chips. There is no compression algorithm for experience: we’ve been building large-scale data centers for more than 15 years and servers with graphics processing units (GPUs) for more than 12 years, so we have a massive existing footprint of AI infrastructure.

As the world changes rapidly, AWS continues to adapt and improve upon our strong infrastructure foundation to deliver new innovations that support generative AI at scale. Here are four ways we’re doing that.

  1. Delivering low-latency, large-scale networking

Generative AI models require massive amounts of data to train and run efficiently. The larger and more complex the model, the longer the training time, and longer training times not only drive up operating costs but also slow down innovation. Traditional networks can’t deliver the low latency and massive scale that generative AI model training requires.
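To put that in perspective, here is a rough back-of-the-envelope sketch (our illustration, not an AWS benchmark) using the widely cited estimate of roughly 6 × parameters × tokens floating-point operations for dense model training. All figures in the example are hypothetical, and the achieved utilization depends heavily on the network connecting the accelerators, which is exactly where low-latency, large-scale networking matters.

```python
# Back-of-the-envelope training-time estimate (illustrative only; not AWS data).
# Uses the common ~6 * parameters * tokens FLOP heuristic for dense model training.

def training_days(params: float, tokens: float, accelerators: int,
                  flops_per_accelerator: float, utilization: float = 0.4) -> float:
    """Estimate wall-clock training days for a dense model.

    params: model parameter count
    tokens: number of training tokens
    accelerators: number of chips in the cluster
    flops_per_accelerator: peak FLOP/s per chip (hypothetical figure)
    utilization: realized fraction of peak, which depends heavily on the network
    """
    total_flops = 6 * params * tokens
    effective_flops = accelerators * flops_per_accelerator * utilization
    return total_flops / effective_flops / 86_400  # seconds in a day

# Hypothetical example: a 70B-parameter model trained on 2T tokens across
# 4,096 chips rated at 300 TFLOP/s each -> roughly 20 days of wall-clock time.
print(f"{training_days(70e9, 2e12, 4_096, 300e12):.1f} days")
```

Doubling the cluster size halves the ideal training time, but only if utilization holds up, and utilization is largely a function of how quickly the network can move gradients and activations between chips.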

We’re constantly working to reduce network latency and improve performance for customers. Our approach is unique in that we have built our own network devices and network operating systems for every layer of the stack, from the network interface card to the top-of-rack switch, the data center network, the internet-facing routers, and our backbone routers. This approach not only gives us greater control over improving security, reliability, and performance for customers, but also enables us to innovate faster than others.

  2. Continuously improving the energy efficiency of our data centers

Training and running AI models can be energy-intensive, so efficiency efforts are critical. AWS is committed to running our business in an efficient way to reduce our impact on the environment. Not only is this the right thing to do for communities and for our planet, but it also helps AWS reduce costs, and we can then pass those cost savings on to our customers. For many years, we’ve focused on improving energy efficiency across our infrastructure.

New research by Accenture shows these efforts are paying off. Accenture estimates that AWS’s infrastructure is up to 4.1 times more efficient than on-premises infrastructure, and that when workloads are optimized on AWS, the associated carbon footprint can be reduced by up to 99%. But as power demand increases, we can’t stop there.

AI chips perform mathematical calculations at high speed, making them critical for ML models. They also generate much more heat than other types of chips, so new AI servers that require more than 1,000 watts of power per chip will need to be liquid-cooled. However, some AWS services run on network and storage infrastructure that doesn’t require liquid cooling, and cooling that infrastructure with liquid would be an inefficient use of energy.

AWS’s latest data center design seamlessly integrates optimized air-cooling solutions alongside liquid cooling capabilities for the most powerful AI chipsets, like the NVIDIA Grace Blackwell Superchips. This flexible, multimodal cooling design allows us to extract maximum performance and efficiency whether running traditional workloads or AI/ML models. Our team has engineered our data centers—from rack layouts to electrical distribution to cooling techniques—so that we continuously increase energy efficiency, no matter the compute demands.

  3. Security from the ground up

One of the most common infrastructure questions we hear from customers as they explore generative AI is how to protect their highly sensitive data. Security is our top priority, and it’s built into everything we do. Our infrastructure is monitored 24/7, and when data leaves our physical boundaries and travels between our infrastructure locations, it is encrypted at the underlying network layer. Not all clouds are built the same, which is why a growing number of companies are moving their AI workloads to AWS.

AWS is architected to be the most secure and reliable global cloud infrastructure. Our approach to securing AI infrastructure relies on three key principles:

1. Complete isolation of the AI data from the infrastructure operator: the infrastructure operator must have no ability to access customer content and AI data, such as AI model weights and data processed with models.
2. Ability for customers to isolate AI data from themselves: the data remains inaccessible to customers’ own users and software.
3. Protected infrastructure communications: communication between devices in the ML accelerator infrastructure must be protected.
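Customers can layer their own controls on top of these principles. As a minimal sketch (all ARNs, names, and the training image below are placeholders, and the exact settings you need will vary), the following boto3 call starts an Amazon SageMaker training job with KMS-encrypted volumes and output, VPC-contained traffic, network isolation, and encrypted inter-node communication.

```python
import boto3

sm = boto3.client("sagemaker")

# Illustrative only: every ARN, ID, name, and image URI below is a placeholder.
sm.create_training_job(
    TrainingJobName="example-genai-training-job",
    RoleArn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    AlgorithmSpecification={
        "TrainingImage": "111122223333.dkr.ecr.us-east-1.amazonaws.com/example-training:latest",
        "TrainingInputMode": "File",
    },
    OutputDataConfig={
        "S3OutputPath": "s3://example-bucket/model-artifacts/",
        "KmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",  # encrypt model artifacts
    },
    ResourceConfig={
        "InstanceType": "ml.trn1.32xlarge",  # Trainium-based training instances
        "InstanceCount": 4,
        "VolumeSizeInGB": 512,
        "VolumeKmsKeyId": "arn:aws:kms:us-east-1:111122223333:key/example-key-id",  # encrypt attached storage
    },
    VpcConfig={  # keep training traffic inside your own VPC
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "Subnets": ["subnet-0123456789abcdef0"],
    },
    EnableNetworkIsolation=True,                 # training containers get no outbound network access
    EnableInterContainerTrafficEncryption=True,  # encrypt traffic between training nodes
    StoppingCondition={"MaxRuntimeInSeconds": 86400},
)
```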

  4. AWS AI chips

The chips that power generative AI are crucial, impacting how quickly, inexpensively, and sustainably you can train and run models. For many years, AWS has innovated to reduce the costs of our services. This is no different for AI—by helping customers keep costs under control, we can ensure AI is accessible to customers of all sizes and industries. So for the last several years, we’ve been designing our own AI chips, including AWS Trainium and AWS Inferentia.

These purpose-built chips offer superior price performance and make it more energy efficient to train and run generative AI models. AWS Trainium is designed to speed up and lower the cost of training ML models by up to 50 percent over comparable training-optimized Amazon EC2 instances, and AWS Inferentia enables models to generate inferences more quickly and at lower cost, with up to 40 percent better price performance than comparable inference-optimized Amazon EC2 instances. Demand for our AI chips is high, given their favorable price performance relative to available alternatives.
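For customers, these chips surface as Amazon EC2 instance families such as Trn1 (Trainium) and Inf2 (Inferentia2). As a hedged sketch, the snippet below launches a Trainium-based instance with boto3; the AMI ID, key pair, and security group are placeholders you would replace with your own, for example an AWS Deep Learning AMI that includes the Neuron SDK.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Illustrative only: the AMI ID, key pair, and security group are placeholders.
# trn1.32xlarge carries 16 Trainium chips and targets training workloads;
# inf2 instance types are the inference-optimized counterparts.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="trn1.32xlarge",
    MinCount=1,
    MaxCount=1,
    KeyName="example-key-pair",
    SecurityGroupIds=["sg-0123456789abcdef0"],
)
print(response["Instances"][0]["InstanceId"])
```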

Trainium2 is our third-generation AI chip and will be available later this year. Trainium2 is designed to deliver up to 4 times faster training than first-generation Trainium chips and can be deployed in EC2 UltraClusters of up to 100,000 chips, making it possible to train foundation models and large language models in a fraction of the time while improving energy efficiency by up to 2 times.

Additionally, AWS works with partners including NVIDIA, Intel, Qualcomm, and AMD to offer the broadest set of accelerators in the cloud for ML and generative AI applications. And we’ll continue to innovate to deliver future generations of AWS-designed chips with even better price performance for customers.

Amid the AI boom, it’s important that organizations choose the right compute infrastructure to lower costs and ensure high performance. We are proud to offer our customers the most secure, performant, cost-effective, and energy-efficient infrastructure for building and scaling ML applications.