Abstract
Tracking the head in a video stream is a recurring theme in the computer vision literature, and it presents the research community with many challenging and interesting problems. Head pose estimation from monocular cameras is often treated as an add-on performed after face tracking has been completed: the resulting 2D data are passed to a simpler algorithm that fits them to a static 3D model to obtain the 3D pose estimate. This work describes the 2.5D constrained local model, which combines a deformable 3D shape point model with 2D texture information to estimate the pose parameters directly, avoiding the need for an additional optimization step. It does so through an analytical derivation of a Jacobian matrix that describes how changes in the model parameters change the shape in the image under a full-perspective camera model. In addition, the model has very low computational complexity and can run in real time on modern mobile devices such as tablets and laptops. The point distribution model of the face is built in a novel way that minimizes the effect of changes in facial expression on the estimated head pose, making the solution more robust. Finally, the texture information is trained via local neural fields, a deep learning approach that uses small discriminative patches to exploit spatial relationships between pixels and produce strong peaks at the optimal locations.
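To make the central idea concrete (differentiating image-plane landmark positions with respect to the model parameters through a full-perspective camera), here is a minimal sketch. Everything in it is an assumption for illustration, not the paper's formulation: the shape model is taken to be a mean shape plus a linear basis weighted by non-rigid parameters `q`, the rotation uses a hypothetical Euler-angle convention, the pinhole intrinsics `fx, fy, cx, cy` and the function names are hypothetical, and the Jacobian is approximated by central finite differences rather than derived analytically as the abstract describes.

```python
import numpy as np


def rotation_from_euler(pitch, yaw, roll):
    """Return R = Rx(pitch) @ Ry(yaw) @ Rz(roll); this convention is an assumption."""
    cp, sp = np.cos(pitch), np.sin(pitch)
    cyw, syw = np.cos(yaw), np.sin(yaw)
    cr, sr = np.cos(roll), np.sin(roll)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    ry = np.array([[cyw, 0.0, syw], [0.0, 1.0, 0.0], [-syw, 0.0, cyw]])
    rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    return rx @ ry @ rz


def project(params, mean_shape, basis, fx, fy, cx, cy):
    """Project the deformed 3D shape with a pinhole (full-perspective) camera.

    params = [pitch, yaw, roll, tx, ty, tz, q_1, ..., q_m] (hypothetical layout)
    mean_shape: (n, 3) mean landmarks; basis: (m, n, 3) deformation modes.
    Returns an (n, 2) array of image coordinates.
    """
    pitch, yaw, roll, tx, ty, tz = params[:6]
    q = params[6:]
    shape = mean_shape + np.tensordot(q, basis, axes=1)  # non-rigid deformation
    rot = rotation_from_euler(pitch, yaw, roll)
    cam = shape @ rot.T + np.array([tx, ty, tz])  # rigid transform into camera frame
    u = fx * cam[:, 0] / cam[:, 2] + cx  # perspective division by depth Z
    v = fy * cam[:, 1] / cam[:, 2] + cy
    return np.stack([u, v], axis=1)


def numerical_jacobian(params, *model, eps=1e-6):
    """Central-difference Jacobian of the flattened image points w.r.t. params.

    Rows are ordered [u_0, v_0, u_1, v_1, ...]; result has shape (2n, len(params)).
    """
    params = np.asarray(params, dtype=float)
    base = project(params, *model).ravel()
    jac = np.zeros((base.size, params.size))
    for k in range(params.size):
        step = np.zeros_like(params)
        step[k] = eps
        plus = project(params + step, *model).ravel()
        minus = project(params - step, *model).ravel()
        jac[:, k] = (plus - minus) / (2.0 * eps)
    return jac


# Usage with random toy data (5 landmarks, 2 deformation modes).
rng = np.random.default_rng(0)
mean_shape = rng.normal(size=(5, 3))
basis = rng.normal(scale=0.1, size=(2, 5, 3))
params = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 10.0, 0.3, -0.1])
J = numerical_jacobian(params, mean_shape, basis, 500.0, 500.0, 320.0, 240.0)
print(J.shape)  # (10, 8)
```

Central differences cost two projections per parameter, which is one reason an analytical Jacobian is attractive for real-time use; a numerical version like this is mainly useful as a check on an analytical derivation.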











Notes
\(\left\| \varvec{q} \right\|_{\varvec{\varLambda}^{-1}}^{2}\) is shorthand for the squared Mahalanobis distance \(\varvec{q}^{T} \varvec{\varLambda}^{-1} \varvec{q}\).
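As a quick numerical check of this notation, the sketch below evaluates \(\varvec{q}^{T} \varvec{\varLambda}^{-1} \varvec{q}\) by solving a linear system instead of forming the explicit inverse, which is the usual numerically safer choice. The function name and example values are hypothetical, and treating \(\varvec{\varLambda}\) as a diagonal matrix of PCA mode variances is an assumption (common for point distribution models, but not stated in this note).

```python
import numpy as np


def squared_mahalanobis(q, cov):
    """Return q^T cov^{-1} q without forming cov^{-1} explicitly."""
    q = np.asarray(q, dtype=float)
    return float(q @ np.linalg.solve(cov, q))


# Illustrative values: cov is assumed to be a diagonal matrix of mode variances.
eigenvalues = np.array([4.0, 1.0, 0.25])
q = np.array([2.0, 1.0, 0.5])
print(squared_mahalanobis(q, np.diag(eigenvalues)))  # 3.0
# For a diagonal cov the same value is a variance-weighted sum of squares.
print(np.sum(q**2 / eigenvalues))  # 3.0
```

With a diagonal \(\varvec{\varLambda}\), each parameter is penalized in proportion to how unusual it is relative to its own variance, so large deviations along low-variance modes cost the most.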
Cite this article
Ackland, S., Chiclana, F., Istance, H. et al. Real-Time 3D Head Pose Tracking Through 2.5D Constrained Local Models with Local Neural Fields. Int J Comput Vis 127, 579–598 (2019). https://doi.org/10.1007/s11263-019-01152-w
DOI: https://doi.org/10.1007/s11263-019-01152-w