Generative AI is rising in popularity thanks to a confluence of advances in IT infrastructure, including storage.
Generative AI relies on deep learning, compute and GPUs, all of which have matured in the last ten years. It also needs high-IOPS storage to provide fast access to large datasets, a capability tech vendors have been refining for decades as IT has evolved. Storage tools such as object storage, which can scale for large datasets, and distributed parallel file systems, which provide high-performance, low-latency data processing, have been the backbone of cloud computing and the big data movement.
Now storage is becoming an underlying foundation for AI. Some AI models are small enough to execute in memory, putting more of a spotlight on compute, according to Mike Matchett, an analyst at Small World Big Data, an IT analyst firm. But large language models (LLMs) like ChatGPT require, in some cases, billions of nodes, which are too cost-prohibitive to keep in memory.
“You’re not holding [billions of] nodes in memory. The storage becomes a lot more important,” Matchett said.
Despite its speed, memory such as RAM is more expensive than storage, according to Steve McDowell, an analyst and founding partner at NAND Research.
“You’re always going to be limited by the cost of RAM, and it’s always going to be a balance [with storage],” McDowell said.
He said LLMs would need a parallel file system, such as Weka or Panasas, sitting on top of a high-performance, scalable storage system, such as Dell's PowerMax, Vast Data's Universal Storage or Pure Storage's FlashBlade.
Storage’s role in generative AI
Generative AI can only produce a good outcome after being trained on reams of data, according to Khalid Eidoo, co-founder and CTO of Crater Labs, an AI and machine learning company based in Toronto that works with businesses to solve specific problems using AI. One method Crater employs is a type of generative AI called generative adversarial networks (GANs), which it used to identify potential structural defects in welds when constructing a nuclear power plant.
In this case, the GAN, which uses four different neural networks, produces images that then get reconciled. Out of the hundreds of thousands of images generated, only five or six meet the high quality level needed, Eidoo said.
To support this functionality, Crater needed high-throughput storage that could read and write synchronously and chose Pure Storage’s FlashBlade product. “When dealing with generative networks, you’re simultaneously reading millions of images to write millions of images,” Eidoo said.
GPUs play an important role in generative AI by accelerating the training of models. But when working with millions of images, the GPU buffer quickly fills up and images need to be written quickly to storage, Eidoo said. High-throughput storage can reduce the potential for a data bottleneck.
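The overlap Eidoo describes, reading and writing simultaneously so the GPU buffer never fills, can be sketched as a producer/consumer pipeline. The following is a minimal illustration, not Crater Labs' actual code: a stand-in function plays the role of GAN inference, local files stand in for FlashBlade, and a bounded queue models the GPU buffer that high-throughput storage must keep drained.

```python
# Sketch of overlapping image generation with storage writes so the
# "GPU" (a stand-in function here) never stalls on slow writes.
import os
import queue
import tempfile
import threading

def generate_images(n, out_queue):
    """Stand-in for GAN inference filling the GPU buffer."""
    for i in range(n):
        out_queue.put((f"img_{i:06d}.bin", bytes(1024)))  # fake 1 KB image
    out_queue.put(None)  # sentinel: generation finished

def write_images(in_queue, out_dir):
    """Drain the buffer to storage. With high-throughput storage this
    keeps pace with generation and the queue stays short."""
    written = 0
    while True:
        item = in_queue.get()
        if item is None:
            break
        name, data = item
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
        written += 1
    return written

def run(n=100):
    q = queue.Queue(maxsize=32)  # bounded: backpressure if storage lags
    result = []
    with tempfile.TemporaryDirectory() as d:
        writer = threading.Thread(target=lambda: result.append(write_images(q, d)))
        writer.start()
        generate_images(n, q)
        writer.join()
    return result[0]
```

If storage can't keep up, the bounded queue fills and the producer blocks, which is exactly the idle-GPU bottleneck the article describes.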
Flash not necessary, but optimal
High-IOPS storage can provide a user experience more like high-performance computing, according to Matchett.
“You can do parallel file systems on a large number of spinning disks in aggregate,” Matchett said.
A parallel file system feeds data from the LLMs to the GPUs. One example is DDN's A3I, which combines DDN's EXAScaler parallel file system with NVIDIA's DGX systems, Matchett said.
A hybrid version of EXAScaler could be used for generative AI, but it caches and tiers storage, potentially affecting performance, McDowell said. The GPUs can't sit idle, so the aggregated HDD performance will be cached to SSDs, which operate faster than the spinning disks.
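The cache-and-tier behavior McDowell describes can be illustrated with a toy read-through cache: a small fast tier sitting in front of a large slow one, with least-recently-used eviction. This is a hedged sketch of the general technique only; the class and names are invented and do not reflect DDN's implementation.

```python
# Toy model of cache-and-tier storage: reads check a small, fast "SSD"
# tier first and fall back to a slow "HDD" tier, promoting data on
# access with LRU eviction. Illustrative only, not a vendor design.
from collections import OrderedDict

class TieredStore:
    def __init__(self, ssd_capacity):
        self.ssd = OrderedDict()       # fast tier (bounded)
        self.hdd = {}                  # slow backing tier (unbounded)
        self.ssd_capacity = ssd_capacity
        self.hits = self.misses = 0

    def write(self, key, value):
        self.hdd[key] = value          # data lands on the backing tier

    def read(self, key):
        if key in self.ssd:
            self.ssd.move_to_end(key)  # refresh LRU position
            self.hits += 1
            return self.ssd[key]
        self.misses += 1               # slow-path read from HDD
        value = self.hdd[key]
        self.ssd[key] = value          # promote to the fast tier
        if len(self.ssd) > self.ssd_capacity:
            self.ssd.popitem(last=False)  # evict least recently used
        return value
```

The performance caveat McDowell raises shows up here as the miss path: any read that isn't already cached pays the slow-tier penalty, which is why an all-flash array avoids the problem entirely.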
“[Those] that are serious about large language models, they’re buying high-end flash storage,” McDowell said.
Flash provides high IOPS in denser footprints and can also provide LLMs with aggregated performance, Eidoo said. It’s possible to use millions of HDDs, but footprint matters. Flash storage is denser, higher performing and uses less power than HDDs. Technology that reduces power consumption now will benefit generative AI in the future.
“GPUs use power like there’s no tomorrow,” Eidoo said.
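Eidoo's footprint argument comes down to simple arithmetic: aggregate enough devices to hit a throughput target, then compare the resulting drive counts and power draw. The per-device figures below are round illustrative assumptions for the sake of the calculation, not vendor specifications.

```python
# Back-of-envelope comparison of HDD vs. NVMe flash aggregation.
# All per-device figures are assumed round numbers, not measured specs.
import math

HDD_GBPS, HDD_WATTS = 0.25, 8.0   # assumed per-HDD throughput / power
SSD_GBPS, SSD_WATTS = 7.0, 12.0   # assumed per-NVMe-drive figures

def aggregate(target_gbps, per_gbps, per_watts):
    """Devices and total power needed to reach a throughput target
    by striping reads across drives."""
    n = math.ceil(target_gbps / per_gbps)
    return n, n * per_watts
```

Under these assumptions, hitting 100 GB/s takes hundreds of HDDs but only a double-digit count of NVMe drives at a fraction of the power, which is the density and power advantage Eidoo points to.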
Cloud vs. on premises
LLMs also need space to train models. Whether that is on premises, in the public cloud or a hybrid of the two depends on the size of the model and the performance and control needed, Matchett said.
If generative AI is used for research, storing LLMs in the cloud is ideal because users can get the scale required without the capital expenditure of building infrastructure. However, Matchett predicts vendors will offer generative AI applications that become core to their business platforms. For those dependent on performance and security, on-premises storage will be key.
“As an enterprise operation, you’ve got production workloads that are running at some level of continuity, and that can get expensive,” Matchett said.
Crater Labs worked with AWS and Google Cloud before moving to a hybrid infrastructure for speed, security and cost reasons. It also considered NetApp and HPE before choosing Pure Storage.
Now, Crater Labs uses a combination of on-premises and cloud storage: FlashBlade, plus FlashBlade's built-in connection to an S3 object store bucket, according to Eidoo. Crater generates terabytes of data per week, which is inefficient to store solely on premises. Using the S3 object store lets Crater access images in the cloud for modeling.
“We knew very quickly as we started developing these generative models that the performance we were getting in the cloud wasn’t adequate,” Eidoo said.
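The hybrid pattern Eidoo describes, hot data on on-prem flash with older data tiered out to an S3 bucket, can be sketched as an age-based migration policy. In this self-contained illustration, plain dictionaries stand in for FlashBlade and the bucket, and the keys, threshold, and class name are all invented for the example.

```python
# Sketch of age-based tiering between an on-prem fast tier and an
# S3-style object store. Dicts stand in for the real systems.
import time

class HybridStore:
    def __init__(self, max_age_seconds):
        self.local = {}          # on-prem flash tier: {key: (timestamp, data)}
        self.object_store = {}   # stand-in for the cloud S3 bucket
        self.max_age = max_age_seconds

    def put(self, key, data, now=None):
        """New data always lands on the fast local tier."""
        now = time.time() if now is None else now
        self.local[key] = (now, data)

    def migrate(self, now=None):
        """Move anything older than max_age to the object store."""
        now = time.time() if now is None else now
        moved = 0
        for key, (ts, data) in list(self.local.items()):
            if now - ts > self.max_age:
                self.object_store[key] = data
                del self.local[key]
                moved += 1
        return moved

    def get(self, key):
        if key in self.local:
            return self.local[key][1]
        return self.object_store[key]  # slower read from the cloud tier
```

The trade-off mirrors the article: recent training data stays on the performance tier, while the weekly terabytes that would otherwise pile up on premises age out to cheaper object storage yet remain reachable.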
Adam Armstrong is a TechTarget Editorial news writer covering file and block storage hardware and private clouds. He previously worked at StorageReview.com.