Sparse Neural Networks Training

Date:

More information here

Motivated by the success of GPT-3, a trillion-parameter model race appears to be taking shape, drawing in more technology giants and significant investment. Alongside the increasingly strong results, the resources required to train and deploy these massive models are prohibitive. While sparse neural networks have been widely used to substantially reduce the computational demands of inference, researchers have only recently begun to investigate techniques for training intrinsically sparse neural networks from scratch to accelerate training (sparse training). As a relatively new avenue, sparse training has attracted rapidly growing attention and is quickly evolving into a general-purpose approach with strong results across a wide variety of architectures. This tutorial aims to give a comprehensive discussion of sparsity in neural network training. We first revisit existing approaches to obtaining sparse neural networks from the perspective of the accuracy-efficiency trade-off. We then examine the performance of sparse neural network training across different machine learning paradigms, including supervised learning, unsupervised learning, and reinforcement learning, considering both single-task and continual learning settings. Finally, we point out the current challenges of sparse neural network training at scale and promising future directions.
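
To make the idea concrete, below is a minimal sketch of one popular sparse-training recipe: dynamic sparse training in the style of SET, where the smallest surviving weights are periodically pruned and random connections are regrown, so the network stays sparse from the very first step. The toy MLP, random data, sparsity level, and prune/regrow schedule are illustrative assumptions for exposition, not the specific methods covered in the tutorial.

```python
# Minimal sparse-training sketch (SET-style prune-and-regrow) in PyTorch.
# All hyperparameters and the toy model/data are assumptions for illustration.
import torch
import torch.nn as nn

torch.manual_seed(0)

sparsity = 0.9  # fraction of weights forced to zero (illustrative value)
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# One binary mask per weight matrix, drawn at random so the network is
# intrinsically sparse from scratch.
masks = {}
for name, p in model.named_parameters():
    if p.dim() == 2:  # mask the weight matrices, not the biases
        masks[name] = (torch.rand_like(p) > sparsity).float()
        p.data.mul_(masks[name])

opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def prune_and_regrow(frac=0.1):
    """Drop the smallest surviving weights, regrow random connections."""
    for name, p in model.named_parameters():
        if name not in masks:
            continue
        mask = masks[name]
        k = max(1, int(frac * mask.sum().item()))
        # prune: zero out the k active weights with the smallest magnitude
        w = (p.data * mask).abs()
        w[mask == 0] = float("inf")
        drop_idx = torch.topk(w.flatten(), k, largest=False).indices
        mask.view(-1)[drop_idx] = 0.0
        # regrow: activate k currently inactive positions chosen at random
        # (regrown weights start at zero and are learned from there)
        inactive = (mask.view(-1) == 0).nonzero(as_tuple=True)[0]
        grow_idx = inactive[torch.randperm(inactive.numel())[:k]]
        mask.view(-1)[grow_idx] = 1.0
        p.data.mul_(mask)

for step in range(200):
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    # re-apply the masks so pruned connections stay exactly zero
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.data.mul_(masks[name])
    if (step + 1) % 50 == 0:  # periodic topology update
        prune_and_regrow()
```

Static sparse training follows the same pattern but simply skips the periodic prune_and_regrow step, keeping the initial random topology fixed throughout training.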