Abstract
Classifying traffic signs is an indispensable part of Advanced Driver Assistance Systems. The classification model must classify images accurately while consuming as few CPU cycles as possible, so that the CPU is released immediately for other tasks. In this paper, we first propose a new ConvNet architecture. Then, we propose a new method for creating an optimal ensemble of ConvNets with the highest possible accuracy and the lowest number of ConvNets. Our experiments show that the ensemble of our proposed ConvNets (itself constructed using our method) reduces the number of arithmetic operations by \(88\,\%\) and \(73\,\%\) compared with two state-of-the-art ensembles of ConvNets. In addition, our ensemble is \(0.1\,\%\) more accurate than one of the state-of-the-art ensembles and only \(0.04\,\%\) less accurate than the other when tested on the same dataset. Moreover, the ensemble of our compact ConvNets reduces the number of multiplications by \(95\,\%\) and \(88\,\%\), yet the classification accuracy drops by only \(0.2\,\%\) and \(0.4\,\%\) compared with these two ensembles. We also evaluate the cross-dataset performance of our ConvNet and analyze its transferability power in different layers. We show that our network is easily scalable to new datasets with many more traffic sign classes and that it only needs to fine-tune the weights starting from the last convolution layer. We also assess our ConvNet through different visualization techniques. Finally, we propose a new method for finding the minimum additive noise which causes the network to incorrectly classify the image by the minimum difference compared with the highest score in the loss vector.















Notes
The ConvNet architecture and its trained models are available at https://github.com/pcnn/traffic-sign-recognition.
The percentage of samples that are always within the top 2 classification scores.
We calculated the number of multiplications of a ConvNet by taking into account the multiplications for convolving the filters of each layer with the N-channel input from the previous layer, the multiplications required for computing the activations of each layer, and the multiplications imposed by normalization layers. We showed in Sect. 3 that the tanh function utilized in Ciresan et al. (2012) can be efficiently computed using 10 multiplications. The ReLU activation used in Jin et al. (2014) does not need any multiplications, and the Leaky ReLU units in our ConvNet compute their results using only 1 multiplication. Finally, considering that the pow(float, float) function needs only 1 multiplication and 64 shift operations (http://tinyurl.com/yehg932), the normalization layer in Jin et al. (2014) requires \(k\times k+3\) multiplications per element of the feature map.
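The counting rule in the note above can be illustrated with a minimal sketch. It assumes a plain convolution layer described only by its input channels, filter count, square kernel size and output size. The `ConvLayer` class, its field names and the example layer sizes are hypothetical and are not taken from the paper; pooling and fully connected layers are left out.

```python
from dataclasses import dataclass

# Multiplications per activation unit, as stated in the note above.
ACTIVATION_MULTS = {"tanh": 10, "relu": 0, "leaky_relu": 1}


@dataclass
class ConvLayer:
    """Hypothetical layer description used only for this illustration."""
    in_channels: int      # N channels coming from the previous layer
    num_filters: int      # number of filters in this layer
    kernel: int           # filters are assumed to be kernel x kernel
    out_h: int            # height of each output feature map
    out_w: int            # width of each output feature map
    activation: str       # key into ACTIVATION_MULTS
    norm_window: int = 0  # k of the normalization layer, 0 if there is none


def layer_multiplications(layer: ConvLayer) -> int:
    """Count multiplications for convolution, activation and normalization."""
    elements = layer.num_filters * layer.out_h * layer.out_w
    # Each output element needs kernel * kernel * in_channels multiplications.
    conv = elements * layer.kernel * layer.kernel * layer.in_channels
    act = elements * ACTIVATION_MULTS[layer.activation]
    # Normalization costs k * k + 3 multiplications per feature-map element.
    norm = elements * (layer.norm_window ** 2 + 3) if layer.norm_window else 0
    return conv + act + norm


def network_multiplications(layers) -> int:
    """Sum the per-layer counts over a whole network."""
    return sum(layer_multiplications(layer) for layer in layers)


if __name__ == "__main__":
    # Made-up layer sizes, chosen only to keep the arithmetic easy to follow.
    toy = ConvLayer(in_channels=3, num_filters=2, kernel=3,
                    out_h=4, out_w=4, activation="leaky_relu")
    print(layer_multiplications(toy))
```

With these made-up sizes, the layer has 32 output elements, so the convolution contributes 32 × 27 = 864 multiplications and the Leaky ReLU another 32. Swapping the activation for tanh raises the activation cost to 320, which shows why the choice of activation matters in such a count.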
References
Aghdam, H. H., Heravi, E. J., & Puig, D. (2015). A unified framework for coarse-to-fine recognition of traffic signs using Bayesian network and visual attributes. In: 10th international conference on computer vision theory and applications (VISAPP) (pp. 87–96). doi:10.5220/0005303500870096
Baró, X., Escalera, S., Vitrià, J., Pujol, O., & Radeva, P. (2009). Traffic sign recognition using evolutionary adaboost detection and forest-ECOC classification. IEEE Transactions on Intelligent Transportation Systems, 10(1), 113–126. doi:10.1109/TITS.2008.2011702.
Ciresan, D., Meier, U., & Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. In 2012 IEEE conference on computer vision and pattern recognition (pp. 3642–3649). IEEE. doi:10.1109/CVPR.2012.6248110, arXiv:1202.2745v1
Coates, A., & Ng, A. Y. (2012). Learning feature representations with K-means. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 7700, 561–580. doi:10.1007/978-3-642-35289-8_30
Donahue, J., Jia, Y., Vinyals, O., Hoffman, J., Zhang, N., Tzeng, E., & Darrell, T. (2014). DeCAF: A deep convolutional activation feature for generic visual recognition. In: International conference on machine learning (pp. 647–655). arXiv:1310.1531.
Dosovitskiy, A., & Brox, T. (2015). Inverting convolutional networks with convolutional networks (pp. 1–15). arXiv preprint arXiv:1506.02753
Fleyeh, H., & Davami, E. (2011). Eigen-based traffic sign recognition. IET Intelligent Transport Systems, 5(3), 190. doi:10.1049/iet-its.2010.0159.
Gao, X. W., Podladchikova, L., Shaposhnikov, D., Hong, K., & Shevtsova, N. (2006). Recognition of traffic signs based on their colour and shape features extracted using human vision models. Journal of Visual Communication and Image Representation, 17(4), 675–685. doi:10.1016/j.jvcir.2005.10.003.
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. doi:10.1109/CVPR.2014.81, arXiv:1311.2524.
Greenhalgh, J., & Mirmehdi, M. (2012). Real-time detection and recognition of road traffic signs. IEEE Transactions on Intelligent Transportation Systems, 13(4), 1498–1506. doi:10.1109/tits.2012.2208909.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. arXiv preprint arXiv:1502.01852
Hinton, G. (2014). Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research (JMLR), 15, 1929–1958.
Hsu, S. H., & Huang, C. L. (2001). Road sign detection and recognition using matching pursuit method. Image and Vision Computing, 19(3), 119–129. doi:10.1016/S0262-8856(00)00050-0.
Huang, G., Mao, K. Z., Siew, C., & Huang, D. (2013). A hierarchical method for traffic sign classification with support vector machines. In: The 2013 international joint conference on neural networks (IJCNN) (pp. 1–6). IEEE. doi:10.1109/IJCNN.2013.6706803
Jin, J., Fu, K., & Zhang, C. (2014). Traffic sign recognition with hinge loss trained convolutional neural networks. IEEE Transactions on Intelligent Transportation Systems, 15(5), 1991–2000. doi:10.1109/TITS.2014.2308281.
Krizhevsky, A., Sutskever, I., & Hinton, G. (2012). Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems (pp. 1097–1105). Curran Associates, Inc.
Larsson, F., & Felsberg, M. (2011). Using Fourier descriptors and spatial models for traffic sign recognition. In: Image analysis lecture notes in computer science (Vol 6688, pp. 238–249). Springer. doi:10.1007/978-3-642-21227-7_23
Liu, H., Liu, Y., & Sun, F. (2014). Traffic sign recognition using group sparse coding. Information Sciences, 266, 75–89. doi:10.1016/j.ins.2014.01.010.
Lu, K., Ding, Z., & Ge, S. (2012). Sparse-representation-based graph embedding for traffic sign recognition. IEEE Transactions on Intelligent Transportation Systems, 13(4), 1515–1524. doi:10.1109/TITS.2012.2220965.
Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In International conference on machine learning (ICML) workshop on deep learning (Vol 30)
Mahendran, A., & Vedaldi, A. (2015). Understanding deep image representations by inverting them. In Computer vision and pattern recognition (pp. 5188–5196). IEEE, Boston. doi:10.1109/CVPR.2015.7299155, arXiv:1412.0035
Maldonado-Bascon, S., Lafuente-Arroyo, S., Gil-Jimenez, P., Gomez-Moreno, H., & Lopez-Ferreras, F. (2007). Road-sign detection and recognition based on support vector machines. IEEE Transactions on Intelligent Transportation Systems, 8(2), 264–278. doi:10.1109/TITS.2007.895311.
Maldonado Bascón, S., Acevedo Rodríguez, J., Lafuente Arroyo, S., Fernández Caballero, A., & López-Ferreras, F. (2010). An optimization on pictogram identification for the road-sign recognition task using SVMs. Computer Vision and Image Understanding, 114(3), 373–383. doi:10.1016/j.cviu.2009.12.002.
Mathias, M., Timofte, R., Benenson, R., & Van Gool, L. (2013). Traffic sign recognition: How far are we from the solution? In International joint conference on neural networks. doi:10.1109/IJCNN.2013.6707049.
Møgelmose, A., Trivedi, M. M., & Moeslund, T. B. (2012). Vision-based traffic sign detection and analysis for intelligent driver assistance systems: Perspectives and survey. IEEE Transactions on Intelligent Transportation Systems, 13(4), 1484–1497. doi:10.1109/TITS.2012.2209421.
Moiseev, B., Konev, A., Chigorin, A., & Konushin, A. (2013). Evaluation of traffic sign recognition methods trained on synthetically generated data. In: 15th international conference on advanced concepts for intelligent vision systems (ACIVS) (pp. 576–583). Springer, Poznań. doi:10.1007/978-3-319-02895-8_52
Paclík, P., Novovičová, J., Pudil, P., & Somol, P. (2000). Road sign classification using Laplace kernel classifier. Pattern Recognition Letters, 21(13–14), 1165–1173. doi:10.1016/S0167-8655(00)00078-7.
Piccioli, G., De Micheli, E., Parodi, P., & Campani, M. (1996). Robust method for road sign detection and recognition. Image and Vision Computing, 14(3), 209–223. doi:10.1016/0262-8856(95)01057-2.
Ruta, A., Li, Y., & Liu, X. (2010). Robust class similarity measure for traffic sign recognition. IEEE Transactions on Intelligent Transportation Systems, 11(4), 846–855. doi:10.1109/TITS.2010.2051427.
Sermanet, P., & Lecun, Y. (2011). Traffic sign recognition with multi-scale convolutional networks. In Proceedings of the international joint conference on neural networks (pp. 2809–2813). doi:10.1109/IJCNN.2011.6033589
Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., & LeCun, Y. (2013). OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229, pp. 1–15
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR) (pp. 1–13). arXiv:1409.1556v5.
Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, pp. 1–8.
Stallkamp, J., Schlipsing, M., Salmen, J., & Igel, C. (2012). Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32, 323–332. doi:10.1016/j.neunet.2012.02.016.
Sun, Z. L., Wang, H., Lau, W. S., Seet, G., & Wang, D. (2014). Application of BW-ELM model on traffic sign recognition. Neurocomputing, 128, 153–159. doi:10.1016/j.neucom.2012.11.057.
Szegedy, C., Reed, S., Sermanet, P., Vanhoucke, V., & Rabinovich, A. (2014a). Going deeper with convolutions. In: arXiv preprint arXiv:1409.4842, pp. 1–12.
Szegedy, C., Zaremba, W., & Sutskever, I. (2014b). Intriguing properties of neural networks. arXiv:1312.6199v4
Tibshirani, R. (1994). Regression shrinkage and selection via the lasso. doi:10.2307/2346178.
Timofte, R., & Van Gool, L. (2011). Sparse representation based projections. In: 22nd British Machine Vision Conference (pp. 61.1–61.12). BMVA Press. doi:10.5244/C.25.61
Timofte, R., Zimmermann, K., & Van Gool, L. (2011). Multi-view traffic sign detection, recognition, and 3D localisation. Machine Vision and Applications, (November), 1–15. doi:10.1007/s00138-011-0391-3.
Van Der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605. doi:10.1007/s10479-011-0841-3.
Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., & Gong, Y. (2010). Locality-constrained linear coding for image classification. In IEEE computer vision and pattern recognition (CVPR) (pp. 3360–3367). doi:10.1109/CVPR.2010.5540018.
Yosinski, J., Clune, J., Bengio, Y., & Lipson, H. (2014). How transferable are features in deep neural networks? Neural Information Processing Systems (NIPS), 27. arXiv:1411.1792v1.
Yuan, X., Hao, X., Chen, H., & Wei, X. (2014). Robust traffic sign recognition based on color global and local oriented edge magnitude patterns. IEEE Transactions on Intelligent Transportation Systems, 15(4), 1466–1474. doi:10.1109/TITS.2014.2298912.
Zaklouta, F., & Stanciulescu, B. (2012). Real-time traffic-sign recognition using tree classifiers. IEEE Transactions on Intelligent Transportation Systems, 13(4), 1507–1514. doi:10.1109/TITS.2012.2225618.
Zaklouta, F., & Stanciulescu, B. (2014). Real-time traffic sign recognition in three stages. Robotics and Autonomous Systems, 62(1), 16–24. doi:10.1016/j.robot.2012.07.019.
Zaklouta, F., Stanciulescu, B., & Hamdoun, O. (2011). Traffic sign classification using K-d trees and random forests. In Proceedings of the international joint conference on neural networks (pp. 2151–2155). doi:10.1109/IJCNN.2011.6033494.
Zeiler, M., & Fergus, R. (2014). Visualizing and understanding convolutional networks. European Conference on Computer Vision (ECCV), 8689, 818–833. doi:10.1007/978-3-319-10590-1_53, arXiv:1311.2901.
Zeng, Y., Xu, X., Fang, Y., & Zhao, K. (2015). Traffic sign recognition using deep convolutional networks and extreme learning machine. In Intelligence science and big data engineering. Image and video data engineering (IScIDE) (pp. 272–280). Springer. doi:10.1007/978-3-319-23989-7_28.
Acknowledgments
The authors are grateful for the support granted by Generalitat de Catalunya’s Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) through FI-DGR 2015 and Martí Franquès 2015 fellowships.
Additional information
Communicated by Hiroshi Ishikawa, Takeshi Masuda, Yasuyo Kita and Katsushi Ikeuchi.
Cite this article
Aghdam, H.H., Heravi, E.J. & Puig, D. A Practical and Highly Optimized Convolutional Neural Network for Classifying Traffic Signs in Real-Time. Int J Comput Vis 122, 246–269 (2017). https://doi.org/10.1007/s11263-016-0955-9
DOI: https://doi.org/10.1007/s11263-016-0955-9