How Sony is using distributed learning to create enterprise AI
Sony announced that, by using its deep learning development framework "Core Library: Neural Network Libraries" together with the AI Bridging Cloud Infrastructure (ABCI), a world-class computing infrastructure for AI processing, it had achieved the world's fastest deep learning training speed.
Deep learning is a machine learning method that uses neural networks modelled on the human brain. Deep learning has driven rapid progress in image and sound recognition in recent years, with systems even outperforming humans in certain domains.
However, the amount of training data and the number of model parameters used to improve recognition accuracy have been increasing, which in turn has increased computation times.
In some cases, a single training run has taken weeks or even months. Because AI development is a continuous process of trial and error, shortening training time is of the utmost importance.
For this reason, distributed learning across multiple GPUs is emerging as a popular way to shorten training times.
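The core idea of data-parallel distributed learning can be shown with a toy example. The sketch below is a minimal, hypothetical illustration (a one-parameter linear model with a mean-squared-error loss, not Sony's implementation): each simulated GPU computes the gradient on its own shard of the batch, and the shard gradients are then averaged, which is the job an all-reduce operation does on real hardware. The names `grad_mse` and `data_parallel_grad` are made up for this example.

```python
# Minimal sketch of synchronous data-parallel training on a hypothetical toy
# model (not Sony's implementation): each simulated "GPU" computes the gradient
# of a mean-squared-error loss on its own shard of the global batch, and the
# shard gradients are averaged, as an all-reduce would do across real GPUs.


def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, num_gpus):
    """Split the batch evenly across num_gpus workers and average their gradients."""
    assert len(xs) % num_gpus == 0, "batch must divide evenly across workers"
    per_gpu = len(xs) // num_gpus
    local_grads = [
        grad_mse(w, xs[i * per_gpu:(i + 1) * per_gpu], ys[i * per_gpu:(i + 1) * per_gpu])
        for i in range(num_gpus)
    ]
    # Stand-in for an all-reduce: every worker ends up with the same mean gradient.
    return sum(local_grads) / num_gpus


xs = [float(i) for i in range(1, 9)]  # global batch of 8 samples
ys = [3.0 * x for x in xs]            # targets generated by the "true" weight w = 3
w = 1.0
for n in (1, 2, 4, 8):
    print(n, data_parallel_grad(w, xs, ys, n), grad_mse(w, xs, ys))
```

Because averaging equal-sized shard gradients reproduces the full-batch gradient, adding GPUs while keeping the per-GPU batch fixed is equivalent to training with a proportionally larger global batch.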
However, increasing the number of GPUs for distributed learning introduces two problems. In some cases, the resulting increase in batch size (the amount of data processed at one time) stops learning from progressing; in others, learning actually slows down because of the delays caused by transferring data between GPUs.
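Why adding GPUs can slow learning down can be seen with a simple latency-bandwidth (alpha-beta) cost model. This sketch assumes a fixed global batch split across n GPUs and gradient synchronisation by a standard ring all-reduce, which needs 2(n-1) sequential communication steps. It is a textbook model, not a description of ABCI or of Sony's synchronisation scheme, and every constant in it is hypothetical.

```python
# Toy alpha-beta cost model. All constants are hypothetical and chosen only for
# illustration: time per training step when a fixed global batch is split
# across n GPUs and gradients are synchronised with a ring all-reduce.

COMPUTE_1GPU = 1.0    # seconds to process the whole global batch on one GPU
ALPHA = 1e-3          # per-message latency in seconds
GRAD_TRANSFER = 0.01  # seconds to send the full gradient once at full bandwidth


def step_time(n: int) -> float:
    """Estimated seconds per step on n GPUs (computation + ring all-reduce)."""
    compute = COMPUTE_1GPU / n
    # Ring all-reduce: 2(n-1) sequential steps, each moving 1/n of the gradient.
    # For n == 1 both terms are zero, i.e. no synchronisation is needed.
    comm = 2 * (n - 1) * ALPHA + 2 * (n - 1) / n * GRAD_TRANSFER
    return compute + comm


for n in (1, 2, 4, 8, 16, 32, 64, 128):
    print(f"{n:4d} GPUs: {step_time(n):.4f} s/step")
```

With these made-up constants, step time is lowest at 16 GPUs among the counts printed and then rises again, because the computation per GPU shrinks as 1/n while the latency term of the all-reduce grows linearly with n.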
Sony uses technology that determines the optimal batch size and the appropriate number of GPUs based on the current state of the learning process, which makes it possible to carry out learning even in large-scale GPU environments such as ABCI. It also increased transfer speeds between GPUs by using data synchronisation technology optimised for ABCI's system structure.
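One way to picture a controller that adapts the batch size and GPU count to the state of learning is sketched below. This is a hypothetical rule, not Sony's published method: the class `BatchSizeController`, the plateau test, the doubling policy and every default except the 2,176-GPU cap are assumptions made for illustration.

```python
# Hypothetical batch-size controller, NOT Sony's published algorithm: it starts
# with a modest global batch and doubles it (up to a cap) whenever the training
# loss has stopped improving, then derives how many GPUs to use from a fixed
# per-GPU batch size.
from dataclasses import dataclass


@dataclass
class BatchSizeController:
    per_gpu_batch: int = 32         # assumed samples processed per GPU per step
    max_gpus: int = 2176            # largest GPU count mentioned in the article
    batch_size: int = 1024          # assumed starting global batch
    max_batch_size: int = 65536     # assumed upper limit on the global batch
    patience: int = 3               # evaluations without improvement before growing
    min_delta: float = 1e-3         # improvements smaller than this count as a plateau
    best_loss: float = float("inf")
    stale: int = 0

    def update(self, loss: float) -> int:
        """Record the latest loss and return the (possibly enlarged) global batch size."""
        if loss < self.best_loss - self.min_delta:
            self.best_loss = loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience and self.batch_size < self.max_batch_size:
                self.batch_size = min(self.batch_size * 2, self.max_batch_size)
                self.stale = 0
        return self.batch_size

    def num_gpus(self) -> int:
        """GPUs needed so each processes per_gpu_batch samples, capped at max_gpus."""
        return min(self.max_gpus, max(1, self.batch_size // self.per_gpu_batch))


ctrl = BatchSizeController()
for loss in [2.0, 1.5, 1.2, 1.2, 1.2, 1.2, 1.0]:
    print(ctrl.update(loss), ctrl.num_gpus())
```

With these losses the batch stays at 1,024 (32 GPUs) until the loss has stalled for three evaluations, then doubles to 2,048, raising the GPU count to 64. Growing the batch only once progress stalls is one way to avoid the failure mode the article describes, where starting with a very large batch stops learning from progressing.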
Sony implemented these technologies in Neural Network Libraries and carried out training using ABCI computing resources provided through the "ABCI Grand Challenge" of AIST (the National Institute of Advanced Industrial Science and Technology).
As a result, Sony completed ImageNet/ResNet-50 training (the standard industry benchmark for measuring the speed of distributed deep learning) in approximately 3.7 minutes using up to 2,176 GPUs, the fastest time achieved to date.
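For a rough sense of scale, multiplying the two reported figures gives the total GPU time of the run. Because the article says "as many as" 2,176 GPUs, treating that count as constant for the whole 3.7 minutes gives an upper-bound estimate:

```latex
2176\ \text{GPUs} \times 3.7\ \text{min} \approx 8051\ \text{GPU-minutes} \approx 134\ \text{GPU-hours}
```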
The results of this experiment demonstrate that learning and execution carried out with Neural Network Libraries can achieve world-class speeds, and that using the same framework shortens the trial-and-error cycle of deep learning technology development.
Moving forward, Sony will continue developing related technologies and seek to contribute to the development of society through AI technology.