Automatic Metadata Generation Through Analysis of Narration Within Instructional Videos

  • Transactional Processing Systems
Journal of Medical Systems

Abstract

Current assistive living solutions based on activity recognition have adopted relatively rigid models of inhabitant activities. The use of these models introduces some deficiencies. To address this, a goal-oriented solution has been proposed, in which goal models offer a flexible method of modelling inhabitant activity. These goal models can dynamically produce a large number of varying action plans that may be used to guide inhabitants. Providing illustrative, video-based instruction for these numerous action plans would require a number of video clips to be associated with each variation. To address this, rich metadata may be used to automatically match appropriate video clips from a video repository to each specific, dynamically generated activity plan. This study introduces a mechanism for automatically generating suitable rich metadata, representing the actions depicted within video clips, to facilitate such video matching. The performance of this mechanism was evaluated using eighteen video files; during this evaluation, metadata was automatically generated with a high level of accuracy.
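The matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' mechanism: it assumes that each clip's metadata is a plain set of action terms and that each plan step is also a set of terms, and it ranks clips by Jaccard overlap. The `Clip` class, the `overlap` and `match_plan` functions, and the file names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    """A video clip plus the action terms in its metadata (hypothetical schema)."""
    name: str
    actions: frozenset


def overlap(a, b):
    """Jaccard overlap between two sets of terms; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_plan(plan, repository):
    """For each plan step, pick the clip whose metadata overlaps it most.

    Returns one entry per step: the best clip name, or None when no clip
    shares any term with the step.
    """
    matches = []
    for step in plan:
        terms = frozenset(step)
        best = max(repository, key=lambda clip: overlap(terms, clip.actions), default=None)
        if best is None or overlap(terms, best.actions) == 0.0:
            matches.append(None)
        else:
            matches.append(best.name)
    return matches


# Illustrative data only; not taken from the study.
repository = [
    Clip("fill_kettle.mp4", frozenset({"fill", "kettle", "water"})),
    Clip("boil_kettle.mp4", frozenset({"boil", "kettle"})),
    Clip("pour_milk.mp4", frozenset({"pour", "milk", "cup"})),
]
plan = [{"fill", "kettle"}, {"boil", "kettle"}, {"add", "sugar"}]
result = match_plan(plan, repository)
print(result)
```

In this sketch a step with no overlapping clip maps to `None`, which makes gaps in the repository visible instead of silently returning an unrelated clip.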

Notes

  1. Personal IADL Assistant, PIA – EU AAL Funded Research Project (AAL-2012-5-033), available at: http://www.pia-project.org/

  2. Near Field Communication – A short range contactless communication technology

Acknowledgments

This work has been conducted in the context of the EU AAL PIA project (AAL-2012-5-033). The authors gratefully acknowledge the contributions from all members of the PIA consortium.

Author information

Corresponding author

Correspondence to Joseph Rafferty.

Additional information

This article is part of the Topical Collection on Transactional Processing Systems

About this article

Cite this article

Rafferty, J., Nugent, C., Liu, J. et al. Automatic Metadata Generation Through Analysis of Narration Within Instructional Videos. J Med Syst 39, 94 (2015). https://doi.org/10.1007/s10916-015-0295-2

  • DOI: https://doi.org/10.1007/s10916-015-0295-2
