Real-Time 3D Head Pose Tracking Through 2.5D Constrained Local Models with Local Neural Fields

International Journal of Computer Vision

Abstract

Head tracking in video streams is a recurring theme in the computer vision literature and supplies the research community with many challenging and interesting problems. Head pose estimation from monocular cameras is often treated as a follow-on application once face tracking has been performed: the resulting 2D data are passed through a simpler algorithm that fits them to a static 3D model to determine the 3D pose estimate. This work describes the 2.5D constrained local model, which combines a deformable 3D shape point model with 2D texture information to estimate the pose parameters directly, avoiding the need for additional optimization strategies. It achieves this through an analytical derivation of a Jacobian matrix that describes how changes in the model parameters change the shape in the image under a full-perspective camera model. In addition, the model has very low computational complexity and can run in real time on modern mobile devices such as tablets and laptops. The point distribution model of the face is built so as to minimize the effect of changes in facial expression on the estimated head pose, making the solution more robust. Finally, the texture information is trained via local neural fields, a deep learning approach that uses small discriminative patches to exploit spatial relationships between pixels and provide strong peaks at the optimal locations.
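
To make the idea of an analytical Jacobian under a full-perspective camera concrete, the following is a minimal sketch rather than the paper's derivation. It assumes a simple pinhole camera with a single focal length and a principal point, and it differentiates only with respect to the 3D position of one model point in camera coordinates. Because a head translation is added directly to every point, the same 2x3 matrix is also the Jacobian with respect to translation. The function names and the finite-difference check are illustrative assumptions, not taken from the article.

```python
def project(point, focal, center):
    """Full-perspective (pinhole) projection of a camera-frame 3D point to pixels."""
    x, y, z = point
    cx, cy = center
    return (focal * x / z + cx, focal * y / z + cy)


def projection_jacobian(point, focal):
    """Analytical 2x3 Jacobian d(u, v) / d(x, y, z) of `project`."""
    x, y, z = point
    return [
        [focal / z, 0.0, -focal * x / (z * z)],
        [0.0, focal / z, -focal * y / (z * z)],
    ]


def numerical_jacobian(point, focal, center, eps=1e-6):
    """Central finite-difference Jacobian, used only to sanity-check the analytical one."""
    columns = []
    for i in range(3):
        plus = list(point)
        minus = list(point)
        plus[i] += eps
        minus[i] -= eps
        u_plus, v_plus = project(plus, focal, center)
        u_minus, v_minus = project(minus, focal, center)
        columns.append(((u_plus - u_minus) / (2 * eps),
                        (v_plus - v_minus) / (2 * eps)))
    # Transpose the three (du, dv) columns into two rows of three entries.
    return [[columns[j][i] for j in range(3)] for i in range(2)]
```

Comparing `projection_jacobian` against `numerical_jacobian` at a few points is a cheap way to catch sign or indexing mistakes before extending the derivation to rotation and shape parameters.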

Notes

  1. \(\left\Vert \boldsymbol{q} \right\Vert_{\boldsymbol{\Lambda}^{-1}}^{2}\) is shorthand for the squared Mahalanobis distance \(\boldsymbol{q}^{T} \boldsymbol{\Lambda}^{-1} \boldsymbol{q}\).
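
As a small worked illustration of this shorthand, the sketch below evaluates the squared Mahalanobis distance under the assumption that the weighting matrix is diagonal, for example a matrix of per-parameter variances. The note itself does not state that the matrix is diagonal, so that restriction and the function name are assumptions made only to keep the example short.

```python
def squared_mahalanobis_diag(q, variances):
    """Return q^T diag(variances)^{-1} q for a diagonal weighting matrix.

    `variances` holds the diagonal entries (assumed strictly positive).
    """
    if len(q) != len(variances):
        raise ValueError("q and variances must have the same length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")
    # Each parameter is penalised in proportion to its squared value
    # divided by how much that parameter is expected to vary.
    return sum(qi * qi / v for qi, v in zip(q, variances))
```

With a diagonal matrix the quadratic form reduces to a sum of squared parameters, each scaled by the inverse of its variance, so parameters that vary widely are penalised less than tightly constrained ones.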

Author information

Corresponding author

Correspondence to Stephen Ackland.

About this article

Cite this article

Ackland, S., Chiclana, F., Istance, H. et al. Real-Time 3D Head Pose Tracking Through 2.5D Constrained Local Models with Local Neural Fields. Int J Comput Vis 127, 579–598 (2019). https://doi.org/10.1007/s11263-019-01152-w

  • DOI: https://doi.org/10.1007/s11263-019-01152-w
