Computational Visual Media  2020, Vol. 6 Issue (4): 455-466    doi: 10.1007/s41095-020-0185-5
Research Article     
Weight asynchronous update: Improving the diversity of filters in a deep convolutional network
Dejun Zhang1, Linchao He2, Mengting Luo2, Zhanya Xu1 (✉), Fazhi He3
1 School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
2 College of Information and Engineering, Sichuan Agricultural University, Yaan 625014, China
3 School of Computer, Wuhan University, Wuhan 430072, China

Abstract  

Deep convolutional networks have achieved remarkable results on a variety of visual tasks due to their strong ability to learn diverse features. A well-trained deep convolutional network can be compressed to 20%-40% of its original size by removing filters that contribute little, since redundant filters generate many overlapping features. Model compression reduces the number of unnecessary filters but does not exploit the redundant filters themselves, as the training phase is unaffected. Modern networks with residual connections, dense connections, and inception blocks are believed to mitigate overlap among convolutional filters, but do not necessarily overcome the issue. To do so, we propose a new training strategy, weight asynchronous update (WAU), which significantly increases the diversity of filters and enhances the representation ability of the network. The proposed method can be widely applied to different convolutional networks without changing the network topology. Our experiments show that updating a stochastic subset of filters in different iterations can significantly reduce filter overlap in convolutional networks. Extensive experiments show that our method yields noteworthy improvements in neural network performance.
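The abstract describes updating only a stochastic subset of filters in each iteration. The paper's actual implementation is not reproduced on this page; the following is a minimal numpy sketch of what one such update step could look like under that reading, where a random fraction (the async rate) of a layer's filters receives the gradient step and the rest keep their previous weights. The function name, masking scheme, and hyperparameter defaults are our assumptions, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def wau_update(weights, grads, lr=0.1, async_rate=0.5):
    """One weight-asynchronous update step (illustrative sketch).

    weights, grads: arrays of shape (num_filters, ...) for one conv layer.
    Only a random subset of filters (fraction `async_rate`) is updated
    this iteration; the remaining filters keep their previous values.
    """
    num_filters = weights.shape[0]
    # Draw a fresh mask each iteration, so different filters are
    # updated in different iterations.
    mask = rng.random(num_filters) < async_rate
    updated = weights.copy()
    updated[mask] -= lr * grads[mask]
    return updated, mask

# Toy example: a layer with 8 filters of shape 3x3, unit gradients.
w = np.ones((8, 3, 3))
g = np.ones((8, 3, 3))
w2, mask = wau_update(w, g, lr=0.1, async_rate=0.5)
# Masked filters moved by -lr; unmasked filters are unchanged.
```

Because the mask is redrawn every iteration, each filter follows its own effective update schedule, which is one plausible mechanism for the reduced filter correlation reported in Fig. 3.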



Keywords: deep convolutional network; model compression; convolutional filter
Received: 06 April 2020      Published: 30 November 2020
Fund: National Natural Science Foundation of China (Grant No. 61702350)
Corresponding Author: Zhanya Xu     E-mail: zhangdejun@cug.edu.cn; fpsandnoob@hotmail.com; sookie0331@icloud.com; zhanyaxu@163.com; fzhe@whu.edu.cn
About the authors:

Dejun Zhang received his Ph.D. degree from the Department of Computer Science, Wuhan University, China, in 2015. He is currently an associate professor in the School of Geography and Information Engineering, China University of Geosciences. Since 2015, he has served as a senior member of the China Society for Industrial and Applied Mathematics (CSIAM) and is a member of the geometric design & computing committee of CSIAM. Since 2020, he has been a Senior Member of the China Computer Federation (CCF). He was a technical program chair for the 5th Asian Conference on Pattern Recognition (ACPR 2019). His research areas include computer vision, computer graphics, image and video processing, and deep learning. He has published more than 20 refereed articles in journals and conference proceedings.

Linchao He is currently a senior student in the College of Information and Engineering, Sichuan Agricultural University (SICAU) in Yaan, China. He is a member of the CCF. His research interests include image classification, object detection, action recognition, and deep learning.

Mengting Luo is currently a senior student in the College of Information and Engineering, Sichuan Agricultural University (SICAU) in Yaan, China. She is a member of the CCF. Her research interests include image classification, object detection, and action recognition.

Zhanya Xu received his Ph.D. degree from China University of Geosciences in 2010. He is currently a lecturer in the School of Geography and Information Engineering, China University of Geosciences. He is a member of the CCF. His research areas include spatial information services, big data processing, and intelligent computing. He has published more than 20 papers in journals and conferences.

Fazhi He received his Ph.D. degree from Wuhan University of Technology. He was a postdoctoral researcher in the State Key Laboratory of CAD & CG at Zhejiang University, a visiting researcher at the Korea Advanced Institute of Science & Technology, and a visiting faculty member at the University of North Carolina at Chapel Hill. He is now a professor in the School of Computer, Wuhan University. He has served as a senior member of CSIAM and a member of the geometric design & computing committee of CSIAM. Currently, he is a member of the editorial board of the Journal of Computer-Aided Design & Computer Graphics. His research interests are computer graphics, computer-aided design, and computer supported cooperative work.
Cite this article:

Dejun Zhang, Linchao He, Mengting Luo, Zhanya Xu, Fazhi He. Weight asynchronous update: Improving the diversity of filters in a deep convolutional network. Computational Visual Media, 2020, 6(4): 455-466.

URL:

http://cvm.tsinghuajournals.com/10.1007/s41095-020-0185-5     OR     http://cvm.tsinghuajournals.com/Y2020/V6/I4/455

Fig. 1 Features learned by synchronous and asynchronous updating.
Fig. 2 Weight asynchronous update training strategy. c, w, and h are channel, width, and height, respectively.
Network          | Depth | CIFAR-10 Baseline | CIFAR-10 WAU | CIFAR-100 Baseline | CIFAR-100 WAU
AlexNet [1]      | —     | 77.26             | 78.92 (1.66) | 45.17              | 47.89 (2.72)
ResNet-50 [13]   | 50    | 92.68             | 93.38 (0.70) | 70.58              | 71.41 (0.83)
VGG-BN [29]      | 19    | 93.01             | 93.44 (0.43) | 70.76              | 73.73 (2.96)
PreResNet [30]   | 110   | 93.58             | 94.12 (0.54) | 72.53              | 73.23 (0.70)
ResNext [12]     | 29    | 95.56             | 96.11 (0.55) | 80.54              | 82.75 (2.21)
Wide ResNet [31] | 28    | 95.54             | 96.16 (0.62) | 80.91              | 81.10 (0.19)
DenseNet [11]    | 40    | 89.91             | 91.54 (1.63) | 63.28              | 67.41 (4.13)
DenseNet [11]    | 100   | 91.10             | 92.20 (1.10) | 68.08              | 70.22 (2.14)
Table 1 Test accuracy (%). WAU shows significant performance improvement over the baseline on both CIFAR-10 and CIFAR-100 (see Ref. [28], chap. 3)
Network         | AP@.5      | AP
VGG-16 w/o WAU  | 46.9       | 26.9
VGG-16 with WAU | 48.1 (1.2) | 27.4 (0.5)
Table 2 Object detection accuracy (%) for Faster R-CNN [32] on the COCO minival set [33]. All models were trained on the trainval35k set with images of size 600 pixels
Network              | Baseline | Ours (WAU)
Faster R-CNN w/o WD  | 70.10    | 70.74 (0.64)
Faster R-CNN with WD | 69.80    | 70.80 (1.00)
Table 3 Object detection accuracy (%) using Faster R-CNN [32] on the Pascal VOC 2007 test set. Models were trained on the Pascal VOC 2007 trainval set
Fig. 3 Filter correlation using sync and async. (a) Correlation of 32 filters within a single layer. (b) Correlation of 64 filters between two layers inside a residual block. Upper and lower triangles respectively represent the results of sync and async weight updating training methods.
Flow | 1st phase | 2nd phase    | 3rd phase
ASA  | 92.75     | 93.29 (0.54) | 93.40 (0.65)
SAS  | 92.49     | 92.65 (0.16) | 92.76 (0.27)
Table 4 Different training flows. Both strategies lead to accuracy (%) improvement for ResNet-32 trained on CIFAR-10, but ASA is better
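Table 4 compares two three-phase training flows; reading "A" as an asynchronous phase and "S" as a synchronous one (our interpretation of the ASA/SAS labels, not a detail confirmed on this page), such a schedule could be sketched as a simple epoch partition. The function name and the equal-length phase split are illustrative assumptions.

```python
def phase_schedule(flow, total_epochs):
    """Split training into phases following a flow string like 'ASA'.

    'A' = asynchronous updating (random filter subset per iteration),
    'S' = synchronous updating (all filters updated every iteration).
    Returns a list of (mode, start_epoch, end_epoch) tuples; phases are
    equal-length here, though the paper's split may differ.
    """
    per_phase = total_epochs // len(flow)
    phases = []
    for i, mode in enumerate(flow):
        start = i * per_phase
        # The last phase absorbs any remainder epochs.
        end = total_epochs if i == len(flow) - 1 else start + per_phase
        phases.append((mode, start, end))
    return phases
```

For example, `phase_schedule('ASA', 90)` partitions 90 epochs into an async phase, a sync phase, and a final async phase of 30 epochs each.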
Scheme         | Baseline: Original [13] | DSD [15] | BAN [34] | RePr [18] | WAU: Asynchronous | WAU: ASA
Test error (%) | 8.7                     | 7.8      | 8.2      | 7.7       | 7.49 (1.21)       | 7.06 (1.64)
Table 5 Comparison of the test error (%) of WAU on CIFAR-10 with other related training strategies
Fig. 4 Test accuracy and convergence speed of our WAU method and a baseline, for various convolutional networks, on CIFAR-10.
Fig. 5 Ablation of affine transform in BN. Performance on an image recognition task is shown for four different models: WAU and Dropout, each with and without BN affine transform.
Async rate r   | Accuracy
0.1            | 66.59 (-10.67)
0.2            | 66.74 (-10.52)
0.3            | 66.40 (-10.86)
0.4            | 79.29 (+2.03)
0.5            | 78.92 (+1.66)
0.6            | 78.12 (+0.86)
0.7            | 78.82 (+1.56)
0.8            | 78.15 (+0.89)
0.9            | 78.95 (+1.69)
1.0 (baseline) | 77.26
Table 6 Comparison of the classification accuracy (%) of WAU with different async rates r
Fig. 6 Influence of hyperparameter r. A sufficiently high async rate consistently improves performance, and the method is not sensitive to this hyperparameter.
[1]   Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet classification with deep convolutional neural networks. In: Proceedings of the 25th International Conference on Neural Information Processing Systems, Vol. 1, 1097-1105, 2012.
[2]   Zhang, D. J.; He, L. C.; Tu, Z. G.; Zhang, S. F.; Han, F.; Yang, B. X. Learning motion representation for real-time spatio-temporal action localization. Pattern Recognition Vol. 103, 107312, 2020.
[3]   Li, J.; Xu, K.; Chaudhuri, S.; Yumer, E.; Zhang, H.; Guibas, L. Grass: Generative recursive autoencoders for shape structures. ACM Transactions on Graphics Vol. 36, No. 4, 1-14, 2017.
[4]   Li, J.; Chen, B. M.; Lee, G. H. SO-Net: Self-organizing network for point cloud analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9397-9406, 2018.
[5]   Zhang, D. J.; He, F. Z.; Tu, Z. G.; Zou, L.; Chen, Y. L. Pointwise geometric and semantic learning network on 3D point clouds. Integrated Computer-Aided Engineering Vol. 27, No. 1, 57-75, 2019.
[6]   Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE Vol. 77, No. 2, 257-286, 1989.
[7]   Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A. R.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T. et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine Vol. 29, No. 6, 82-97, 2012.
[8]   Zhang, D. J.; Hong, M. B.; Zou, L.; Han, F.; He, F. Z.; Tu, Z. G.; Ren, Y. F. Attention pooling-based bidirectional gated recurrent units model for sentimental classification. International Journal of Computational Intelligence Systems Vol. 12, No. 2, 723-732, 2019.
[9]   Zhang, D. J.; Luo, M. T.; He, F. Z. Reconstructed similarity for faster GANs-based word translation to mitigate hubness. Neurocomputing Vol. 362, 83-93, 2019.
[10]   Pan, Y. T.; He, F. Z.; Yu, H. P. A novel enhanced collaborative autoencoder with knowledge distillation for top-N recommender systems. Neurocomputing Vol. 332, 137-148, 2019.
[11]   Huang, G.; Liu, Z.; Maaten, L. V. D.; Weinberger, K. Q. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4700-4708, 2017.
[12]   Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1492-1500, 2017.
[13]   He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770-778, 2016.
[14]   Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1-9, 2015.
[15]   Han, S.; Pool, J.; Narang, S. R.; Mao, H. Z.; Gong, E. H.; Tang, S. J.; Elsen, E.; Vajda, P.; Paluri, M.; Tran, J. et al. DSD: Dense-sparse-dense training for deep neural networks. arXiv preprint arXiv:1607.04381, 2016.
[16]   Hu, H. Y.; Peng, R.; Tai, Y. W.; Tang, C. K. Network trimming: A data-driven neuron pruning approach towards efficient deep architectures. arXiv preprint arXiv:1607.03250, 2016.
[17]   Molchanov, P.; Tyree, S.; Karras, T.; Aila, T.; Kautz, J. Pruning convolutional neural networks for resource efficient inference. arXiv preprint arXiv:1611.06440, 2016.
[18]   Prakash, A.; Storer, J.; Florencio, D.; Zhang, C. RePr: Improved training of convolutional filters. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 10666-10675, 2019.
[19]   Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research Vol. 15, No. 56, 1929-1958, 2014.
[20]   Li, X.; Chen, S.; Hu, X.; Yang, J. Understanding the disharmony between dropout and batch normalization by variance shift. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2682-2690, 2019.
[21]   Gastaldi, X. Shake-shake regularization of 3-branch residual networks. In: Proceedings of the 5th International Conference on Learning Representations, 2017.
[22]   Yamada, Y.; Iwamura, M.; Akiba, T.; Kise, K. Shakedrop regularization for deep residual learning. IEEE Access Vol. 7, 186126-186136, 2019.
[23]   Li, Y. X.; Yosinski, J.; Clune, J.; Lipson, H.; Hopcroft, J. Convergent Learning: Do different neural networks learn the same representations? arXiv preprint arXiv:1511.07543, 2015.
[24]   Latham, P. E. Associative memory in realistic neuronal networks. In: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, 237-244, 2001.
[25]   Norton, J. D. Science and certainty. Synthese Vol. 99, No. 1, 3-22, 1994.
[26]   Duchi, J. C.; Hazan, E.; Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research Vol. 12, 2121-2159, 2011.
[27]   Kingma, D. P.; Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[28]   Krizhevsky, A. Learning multiple layers of features from tiny images. Master's thesis. University of Toronto, 2009.
[29]   Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[30]   He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Identity mappings in deep residual networks. In: Computer Vision - ECCV 2016. Lecture Notes in Computer Science, Vol. 9908. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 630-645, 2016.
[31]   Zagoruyko, S.; Komodakis, N. Wide residual networks. arXiv preprint arXiv:1605.07146, 2016.
[32]   Ren, S. Q.; He, K. M.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 6, 1137-1149, 2017.
[33]   Lin, T. Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. L. Microsoft COCO: Common objects in context. In: Computer Vision - ECCV 2014. Lecture Notes in Computer Science, Vol. 8693. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 740-755, 2014.
[34]   Furlanello, T.; Lipton, Z.; Tschannen, M.; Itti, L.; Anandkumar, A. Born-again neural networks. In: Proceedings of the 35th International Conference on Machine Learning, 1602-1611, 2018.
[35]   Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the International Conference on Machine Learning, 448-456, 2015.