Real-Time 3D Head Pose Tracking Through 2.5D Constrained Local Models with Local Neural Fields

Ackland, Stephen; Chiclana, Francisco; Istance, Howell; Coupland, Simon

doi:10.1007/s11263-019-01152-w

Published: 04 March 2019

Volume 127, pages 579–598 (2019)
International Journal of Computer Vision

Stephen Ackland (ORCID: orcid.org/0000-0001-9568-2531)¹, Francisco Chiclana¹, Howell Istance² & Simon Coupland¹

Abstract

Tracking the head in a video stream is a recurring theme in the computer vision literature, and it gives the research community many challenging and interesting problems. Head pose estimation from monocular cameras is often treated as a follow-on step performed after face tracking. The resulting 2D data are typically passed to a simpler algorithm that fits them to a static 3D model to obtain the 3D pose estimate. This work describes the 2.5D constrained local model, which combines a deformable 3D shape point model with 2D texture information to estimate the pose parameters directly, avoiding the need for an additional optimization stage. It achieves this through an analytically derived Jacobian matrix that describes how changes in the model parameters change the shape in the image under a full-perspective camera model. The model also has very low computational complexity and can run in real time on modern mobile devices such as tablets and laptops. The point distribution model of the face is constructed in a way designed to minimize the effect of changes in facial expression on the estimated head pose, which makes the solution more robust. Finally, the texture information is learned using local neural fields, a deep learning approach that uses small discriminative patches to exploit spatial relationships between pixels and produce strong peaks at the optimal locations.
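The abstract describes an analytically derived Jacobian that links changes in the model parameters to changes in the projected image shape under a full-perspective camera. This section does not give the paper's parameterization, so the code below is a generic illustration and not the authors' formulation. It makes the following assumptions:

- The 3D shape is a mean shape plus linear modes weighted by $q$.
- Rotation is updated by a small axis-angle increment composed on the left.
- Translation is expressed in camera coordinates.
- The camera is an ideal pinhole with intrinsics fx, fy, cx, cy and no lens distortion.

All function names and toy values are hypothetical. As a sanity check, the analytic Jacobian is compared against central finite differences.

```python
import math

# Hypothetical sketch (not the paper's code): a 3D point distribution model
# viewed through a full-perspective pinhole camera, with an analytic Jacobian
# of the projected 2D points w.r.t. pose and shape parameters.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def rodrigues(w):
    """Rotation matrix for axis-angle w: I + sin(t) K + (1 - cos(t)) K^2."""
    theta = math.sqrt(sum(c * c for c in w))
    if theta < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (c / theta for c in w)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[(1.0 if i == j else 0.0) + s * K[i][j] + c * K2[i][j]
             for j in range(3)] for i in range(3)]

def shape_points(mean, modes, q):
    """Model-space 3D shape: mean shape plus a linear combination of modes."""
    pts = [list(p) for p in mean]
    for qk, mode in zip(q, modes):
        for i, d in enumerate(mode):
            for a in range(3):
                pts[i][a] += qk * d[a]
    return pts

def project(R, t, pts, fx, fy, cx, cy):
    """Full-perspective projection of R @ s + t for every model point s."""
    uv = []
    for s in pts:
        X, Y, Z = (p + ti for p, ti in zip(matvec(R, s), t))
        uv.append((fx * X / Z + cx, fy * Y / Z + cy))
    return uv

def jacobian(R, t, mean, modes, q, fx, fy):
    """Analytic (2n x (6 + m)) Jacobian, columns [dw (3), dt (3), dq (m)].

    dw is a small rotation composed on the left (R <- R(dw) R), taken at
    dw = 0, so d(R s)/dw = -[P]_x with P = R s.
    """
    rows = []
    for i, s in enumerate(shape_points(mean, modes, q)):
        P = matvec(R, s)
        X, Y, Z = P[0] + t[0], P[1] + t[1], P[2] + t[2]
        # Derivative of (u, v) w.r.t. the camera-space point (X, Y, Z).
        Jp = [[fx / Z, 0.0, -fx * X / Z ** 2],
              [0.0, fy / Z, -fy * Y / Z ** 2]]
        neg_skew_P = [[0.0, P[2], -P[1]], [-P[2], 0.0, P[0]], [P[1], -P[0], 0.0]]
        J_w = matmul(Jp, neg_skew_P)
        J_q = [matvec(Jp, matvec(R, mode[i])) for mode in modes]
        for r in range(2):
            rows.append(J_w[r] + Jp[r] + [jq[r] for jq in J_q])
    return rows

def numeric_jacobian(R, t, mean, modes, q, fx, fy, cx, cy, eps=1e-6):
    """Central finite differences using the same parameter update as jacobian()."""
    def f(p):
        Rp = matmul(rodrigues(p[0:3]), R)
        tp = [t[a] + p[3 + a] for a in range(3)]
        qp = [q[k] + p[6 + k] for k in range(len(q))]
        pts = shape_points(mean, modes, qp)
        return [c for uv in project(Rp, tp, pts, fx, fy, cx, cy) for c in uv]
    n = 6 + len(q)
    cols = []
    for j in range(n):
        plus, minus = [0.0] * n, [0.0] * n
        plus[j], minus[j] = eps, -eps
        cols.append([(a - b) / (2 * eps) for a, b in zip(f(plus), f(minus))])
    return [list(r) for r in zip(*cols)]

# Toy example (made-up values): 4 points in metres, one shape mode.
mean = [(-0.03, 0.03, 0.0), (0.03, 0.03, 0.0), (0.0, 0.0, 0.02), (0.0, -0.04, 0.0)]
modes = [[(0.0, 0.001, 0.0), (0.0, 0.001, 0.0), (0.0, 0.0, 0.001), (0.0, -0.002, 0.0)]]
q = [1.5]
R = rodrigues([0.1, -0.2, 0.05])
t = [0.02, -0.01, 0.6]
fx = fy = 500.0
cx, cy = 320.0, 240.0

J_analytic = jacobian(R, t, mean, modes, q, fx, fy)
J_numeric = numeric_jacobian(R, t, mean, modes, q, fx, fy, cx, cy)
max_err = max(abs(a - b) for ra, rb in zip(J_analytic, J_numeric) for a, b in zip(ra, rb))
print(len(J_analytic), len(J_analytic[0]), max_err)
```

Composing the rotation increment on the left keeps the rotation columns simple: they are $-[P]_\times$ chained through the projection derivative. This is one common choice for iterative fitting. A different parameterization, such as Euler angles, would change those three columns.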

Notes

$\lVert \boldsymbol{q} \rVert_{\boldsymbol{\Lambda}^{-1}}^{2}$ is shorthand for the squared Mahalanobis distance $\boldsymbol{q}^{T} \boldsymbol{\Lambda}^{-1} \boldsymbol{q}$.
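The squared Mahalanobis distance $\boldsymbol{q}^{T} \boldsymbol{\Lambda}^{-1} \boldsymbol{q}$ can be evaluated without forming $\boldsymbol{\Lambda}^{-1}$ explicitly. Factor $\boldsymbol{\Lambda} = LL^{T}$ (Cholesky) and solve $Ly = \boldsymbol{q}$; then the distance equals $\lVert y \rVert^{2}$.

The function name is hypothetical, and the sketch assumes $\boldsymbol{\Lambda}$ is symmetric positive definite. If $\boldsymbol{\Lambda}$ is diagonal, the result reduces to $\sum_i q_i^2/\lambda_i$. That would be the case for a matrix of PCA eigenvalues, but this section does not state how $\boldsymbol{\Lambda}$ is built.

```python
import math

def squared_mahalanobis(q, Lam):
    """Return q^T Lam^{-1} q for a symmetric positive-definite matrix Lam.

    Factor Lam = L L^T (Cholesky), solve L y = q by forward substitution,
    and return ||y||^2, which equals q^T Lam^{-1} q.
    """
    n = len(q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Lam[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("Lam must be positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    # Forward substitution: L y = q.
    y = [0.0] * n
    for i in range(n):
        y[i] = (q[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    return sum(v * v for v in y)

# Diagonal case: 2^2/4 + 3^2/9
print(squared_mahalanobis([2.0, 3.0], [[4.0, 0.0], [0.0, 9.0]]))  # 2.0
```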

Author information

Authors and Affiliations

De Montfort University, Leicester, UK
Stephen Ackland, Francisco Chiclana & Simon Coupland
University of Tampere, Tampere, Finland
Howell Istance

Corresponding author

Correspondence to Stephen Ackland.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ackland, S., Chiclana, F., Istance, H. et al. Real-Time 3D Head Pose Tracking Through 2.5D Constrained Local Models with Local Neural Fields. Int J Comput Vis 127, 579–598 (2019). https://doi.org/10.1007/s11263-019-01152-w

Received: 22 February 2018
Accepted: 17 January 2019
Published: 04 March 2019
Issue Date: 01 June 2019
DOI: https://doi.org/10.1007/s11263-019-01152-w
