Cerebras Systems extends support for PyTorch and enables training of giant models

SUNNYVALE, Calif.–(BUSINESS WIRE)–Cerebras Systems, the pioneer in high-performance artificial intelligence (AI) computing, today released version 1.2 of the Cerebras software platform, CSoft, with expanded support for PyTorch and TensorFlow. In addition, customers can now quickly and easily train models with billions of parameters using Cerebras Weight Streaming technology.

PyTorch is a leading machine learning framework, used by developers to accelerate the path from research prototyping to production deployment. As model sizes grow and transformer models gain popularity, it is critical that machine learning practitioners have access to compute solutions that are fast, easy to set up, and easy to use, like the Cerebras CS-2. With the CS-2 running CSoft, the developer community has a powerful tool for enabling new breakthroughs in AI.

“From the start, our goal was to seamlessly support whatever machine learning framework our customers wanted to write in,” said Emad Barsoum, senior director, AI Framework, Cerebras Systems. “Our customers write in TensorFlow and PyTorch, and our software stack, CSoft, makes it quick and easy to express your models in whichever framework you choose and bring them to the 40 gigabytes of on-chip memory of the Cerebras CS-2.”

The Cerebras CS-2 is the fastest AI system in the world. It is powered by the largest processor ever built – the Cerebras Wafer-Scale Engine 2 (WSE-2). The WSE-2 offers more AI-optimized compute cores, faster memory, and more fabric bandwidth than any other deep learning processor in existence. Designed specifically for AI work, the CS-2 runs CSoft, which allows machine learning practitioners to write their models in the open-source TensorFlow or PyTorch frameworks and run them on the Cerebras CS-2 without modification. In fact, a model written for a graphics processing unit or a central processing unit can run under CSoft on the Cerebras CS-2 without any changes. With the CS-2 and CSoft, practitioners can scale from small models like BERT to the largest existing models like GPT-3.

Large models have demonstrated state-of-the-art accuracy on many language processing and comprehension tasks. Training these large models on GPUs is difficult and time consuming. Training from scratch on new datasets often takes weeks and tens of megawatts of power on large clusters of legacy equipment, and as cluster size increases, the power, cost, and complexity grow exponentially. Programming GPU clusters requires rare skills, specialized machine learning frameworks, and tools that demand weeks of engineering time with each iteration.

The CS-2 was built to directly address these challenges. Setting up even the largest model takes minutes, and the CS-2 is faster than clusters of hundreds of graphics processing units. With less time spent on installation, configuration and training, the CS-2 allows users to explore more ideas in less time.

With customers in North America, Asia, Europe, and the Middle East, Cerebras delivers cutting-edge AI solutions to a growing roster of customers in the enterprise, government, and high-performance computing segments, including GlaxoSmithKline, AstraZeneca, TotalEnergies, nference, Argonne National Laboratory, Lawrence Livermore National Laboratory, Pittsburgh Supercomputing Center, Edinburgh Parallel Computing Centre (EPCC), and Tokyo Electron Devices.

For more information on the Cerebras software platform, please visit https://cerebras.net/software/.

About Cerebras Systems

Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types. We have come together to build a new class of computing systems, designed with the sole purpose of accelerating AI and changing the future of AI work forever. Our flagship product, the CS-2 system, is powered by the world’s largest processor – the 850,000-core Cerebras WSE-2, enabling customers to accelerate their deep learning work by orders of magnitude over graphics processing units.