Using Benchmarking and Regression Models for Predicting CNN Training Time on a GPU

Convolutional neural network (CNN) training typically demands substantial time, expensive computational resources, and significant energy. Accurate time predictions are crucial for determining the most suitable hardware and optimal training parameters to minimize training time, expenses, and energy consumption. While NVIDIA GPUs are popular for accelerating CNN training, NVIDIA's black-box cuDNN library hinders accurate prediction of convolutional layer execution time, and prediction approaches reported in the literature have achieved only limited accuracy. Hence, in this study, we present an approach to predicting the training time of unseen CNN architectures with a primary emphasis on achieving high accuracy. This approach decomposes the training time into two components: the time taken by the training operations of the convolutional layers and the time taken by the non-convolutional layers and other operations. We used benchmarks to estimate the time of the convolutional layers accurately, and predicted the rest of the CNN training time using a regression model over the aggregated parameters of the non-convolutional layers. In the evaluation experiments, we used 25 diverse CNN architectures to evaluate the prediction accuracy over a wide range of mini-batch sizes on 6 different NVIDIA GPU cards. The experiments demonstrated the robustness of the proposed approach and a high level of accuracy in predicting the training time of unseen CNNs, with a mean absolute percentage error (MAPE) as low as 4–6%. Through an in-depth prediction analysis, we show that the prediction errors are comparable in magnitude to the inherent variability of CNN training time. Emphasizing accuracy, our method offers a practical solution for predicting CNN training time that can aid ML practitioners in making informed decisions about hardware and mini-batch size selection, leading to substantial savings in time, cost, and energy.
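The decomposition described in the abstract can be sketched as follows. This is an illustrative outline, not the paper's implementation: the function names, the linear form of the regression, and all numeric values are assumptions made for the example. The MAPE metric, however, is the standard definition used to report the 4–6% result.

```python
def predict_training_time(conv_benchmark_times, nonconv_features,
                          reg_coefs, reg_intercept):
    """Sketch of the two-component prediction (hypothetical interface):
    - conv_benchmark_times: per-convolutional-layer times obtained from
      benchmarks on the target GPU
    - nonconv_features: aggregated parameters of the non-convolutional
      layers, fed to a regression model (a linear model is assumed here
      purely for illustration)
    """
    conv_time = sum(conv_benchmark_times)
    nonconv_time = sum(c * x for c, x in zip(reg_coefs, nonconv_features))
    return conv_time + nonconv_time + reg_intercept

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Illustrative numbers only: two benchmarked conv layers plus a
# regression estimate for the remaining operations.
t_pred = predict_training_time([1.0, 2.0], [3.0], [0.5], 0.1)  # 4.6 (arbitrary units)
err = mape([100.0, 200.0], [104.0, 192.0])                     # 4.0 (percent)
```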

Peter Bryzgalov and Toshiyuki Maeda. 2024. Using Benchmarking and Regression Models for Predicting CNN Training Time on a GPU. In Proceedings of the 4th Workshop on Performance EngineeRing, Modelling, Analysis, and VisualizatiOn STrategy (PERMAVOST ’24). Association for Computing Machinery, New York, NY, USA, 8–15. https://doi.org/10.1145/3660317.3660323
