Impact of Changes in the Mini-batch Size on CNN Training Epoch Time
Convolutional Neural Networks (CNNs) drive successful machine learning applications in a growing number of areas. However, training a CNN can take a massive amount of time and require expensive high-end GPU resources. CNN training time can vary greatly depending on the GPU type and the training parameters. In this work, we focus on one training parameter that has a particularly high impact on training time, the mini-batch size, to clarify how and why changes in the mini-batch size affect CNN training epoch time.
To understand how epoch time changes with the mini-batch size, we conducted an experiment that measures the epoch time of a sample CNN: VGG16 trained on the CIFAR100 dataset in Chainer. We observed extremely high variability of epoch time across mini-batch sizes on several GPU types. Moreover, on some GPU types we observed abrupt changes: even a slight variation of the mini-batch size can make epoch time increase or decrease almost twofold.
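One simple way to spot such abrupt changes in a series of measurements is to compare each measured mini-batch size with the next one. The sketch below assumes the measured epoch times are already available as a mapping from mini-batch size to seconds; the function name `abrupt_changes` and its default threshold of 1.5x are hypothetical choices made for illustration, not part of the original experiment.

```python
from __future__ import annotations


def abrupt_changes(
    epoch_times: dict[int, float], ratio: float = 1.5
) -> list[tuple[int, int, float]]:
    """Flag abrupt epoch-time changes between consecutive measured mini-batch sizes.

    epoch_times: mini-batch size -> measured epoch time (seconds)
    ratio:       hypothetical threshold; a pair is flagged when epoch time
                 grows or shrinks by at least this factor
    Returns (previous size, next size, next_time / previous_time) tuples.
    """
    if ratio <= 1.0:
        raise ValueError("ratio must be greater than 1")
    sizes = sorted(epoch_times)
    flagged = []
    # Compare each measured size only with the next larger measured size.
    for prev, nxt in zip(sizes, sizes[1:]):
        change = epoch_times[nxt] / epoch_times[prev]
        if change >= ratio or change <= 1.0 / ratio:
            flagged.append((prev, nxt, change))
    return flagged
```

For example, with the made-up values `{96: 40.0, 97: 21.0, 98: 20.5}` only the step from 96 to 97 is flagged (a ratio of about 0.525, i.e. almost a twofold decrease). These numbers are illustrative, not measurements from this study.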
To understand why these abrupt changes occur, we investigated the underlying cuDNN library.
cuDNN provides several different algorithms for each convolution operation, and Chainer uses cuDNN heuristics to choose which algorithm to run.
We simulated the convolutional layers with a benchmark tool and measured how their execution time changes with the mini-batch size. We found that, for different mini-batch sizes, the cuDNN heuristics may choose convolution algorithms whose execution times differ hugely.
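The cost of a heuristic pick can be quantified by comparing it with the fastest algorithm a benchmark finds for the same layer. The following is a minimal sketch, assuming per-algorithm timings for one convolutional layer at each mini-batch size, and the algorithm the heuristic chose at each size, have already been collected (in cuDNN, `cudnnFindConvolutionForwardAlgorithm` benchmarks the available forward algorithms, while `cudnnGetConvolutionForwardAlgorithm_v7` returns the heuristic ranking). The `Choice` type and `compare_choices` function are hypothetical names used only for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Choice(NamedTuple):
    """Heuristic pick vs. benchmark optimum at one mini-batch size (hypothetical type)."""
    chosen: str       # algorithm selected by the heuristic
    chosen_ms: float  # execution time of the selected algorithm
    best: str         # fastest algorithm found by the benchmark
    best_ms: float    # execution time of the fastest algorithm
    slowdown: float   # chosen_ms / best_ms; 1.0 means the heuristic was optimal


def compare_choices(
    timings: dict[int, dict[str, float]],
    heuristic: dict[int, str],
) -> dict[int, Choice]:
    """For each mini-batch size, compare the heuristic's algorithm with the fastest one.

    timings:   mini-batch size -> {algorithm name -> time in ms} for one layer
    heuristic: mini-batch size -> algorithm name chosen by the heuristic
    """
    result: dict[int, Choice] = {}
    # Iterate in increasing mini-batch order so adjacent sizes are easy to compare.
    for batch, algo_times in sorted(timings.items()):
        chosen = heuristic[batch]
        best = min(algo_times, key=algo_times.__getitem__)
        result[batch] = Choice(
            chosen=chosen,
            chosen_ms=algo_times[chosen],
            best=best,
            best_ms=algo_times[best],
            slowdown=algo_times[chosen] / algo_times[best],
        )
    return result
```

A slowdown near 1.0 means the heuristic picked (nearly) the fastest algorithm; a jump in slowdown between two adjacent mini-batch sizes indicates the kind of algorithm switch that can make a layer's execution time change abruptly.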
Understanding how CNN training time changes with the mini-batch size is essential for making CNN training faster. It can also inform the design of a performance model that predicts CNN training time.
Peter Bryzgalov, Toshiyuki Maeda, and Yutaro Shigeto, “Impact of Changes in the Mini-batch Size on CNN Training Epoch Time”, Virtual poster presentation at ISC High Performance 2020 (ISC2020).