Localization, Tracking, and Modeling of Lips for Audio-Visual Speech Recognition

Vogt, M.; Sommerau, M.; Mamier, G.; Levi, P.

doi:10.1007/978-3-642-60893-3_50

Part of the book series: Informatik aktuell (INFORMAT)

Abstract

Approaches to audio-visual speech recognition require robust, active visual tracking of the lip region as well as comprehensive extraction of visual lip features. The approach presented here works under natural recording conditions. Smooth lip tracking is achieved with a kinetically modeled pan-tilt head and a neural controller. Lip features are extracted by means of a dedicated language that activates different configurations of a lip model depending on the situation. This language-based approach yields a highly flexible system for designing and testing different lip models.

Author information

Authors and Affiliations

Praktische Informatik – Bildverstehen, Universität Stuttgart, Fakultät Informatik, IPVR, Breitwiesenstr. 20-22, D-70565, Stuttgart, Germany
M. Vogt, M. Sommerau, G. Mamier & P. Levi

Editor information

Editors and Affiliations

Institut für Nachrichtentechnik, Technische Universität Braunschweig, Schleinitzstrasse 22, D-38092, Braunschweig, Germany
Erwin Paulus
Institut für Robotik und Prozeßinformatik, Technische Universität Braunschweig, Hamburger Strasse 267, D-38114, Braunschweig, Germany
Friedrich M. Wahl

About this paper

Cite this paper

Vogt, M., Sommerau, M., Mamier, G., Levi, P. (1997). Lokalisierung, Verfolgung und Modellierung von Lippen zur audio-visuellen Spracherkennung. In: Paulus, E., Wahl, F.M. (eds) Mustererkennung 1997. Informatik aktuell. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-60893-3_50

DOI: https://doi.org/10.1007/978-3-642-60893-3_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63426-3
Online ISBN: 978-3-642-60893-3
eBook Packages: Springer Book Archive
