Pruning Terminology Extracted from a Specialized Corpus for CV Ontology Acquisition

Roche, Mathieu; Kodratoff, Yves

doi:10.1007/11915072_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4278)

Included in the following conference series:

OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"

Abstract

This paper presents an experimental study on extracting a terminology from a corpus of curricula vitae (CVs). The terminology is intended for ontology acquisition. The choice of pruning rate for the terminology is crucial to the quality of the acquired ontology. We investigate this pruning rate using several evaluation measures: precision, recall, F-measure, and the ROC curve.
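
The evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that candidate terms are already ranked by some score, that an expert has labelled each one relevant or irrelevant, and that pruning simply keeps the top `k` candidates. The function names `prf_at_cutoff` and `roc_auc` and the toy labels are hypothetical.

```python
from typing import List, Tuple


def prf_at_cutoff(labels: List[bool], k: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure when only the top-k candidates are kept.

    `labels` is the expert judgement (True = relevant term) for each
    candidate, listed in ranked order, best candidate first (assumption).
    """
    kept = labels[:k]
    true_positives = sum(kept)
    total_relevant = sum(labels)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def roc_auc(labels: List[bool]) -> float:
    """Area under the ROC curve of a ranked list.

    Computed as the fraction of (relevant, irrelevant) pairs in which the
    relevant term is ranked above the irrelevant one.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("AUC needs at least one relevant and one irrelevant term")
    relevant_seen = 0
    well_ordered_pairs = 0
    for is_relevant in labels:
        if is_relevant:
            relevant_seen += 1
        else:
            # Every relevant term seen so far outranks this irrelevant one.
            well_ordered_pairs += relevant_seen
    return well_ordered_pairs / (positives * negatives)


# Toy ranked list of eight candidate terms (hypothetical judgements).
ranked_labels = [True, True, False, True, False, False, True, False]

for k in (2, 4, 8):
    p, r, f = prf_at_cutoff(ranked_labels, k)
    print(f"keep top {k}: precision={p:.2f} recall={r:.2f} F={f:.2f}")

print(f"AUC = {roc_auc(ranked_labels):.2f}")
```

In this sketch, precision, recall and F-measure each depend on the cutoff `k`, whereas the area under the ROC curve summarizes the quality of the whole ranking in a single number.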

An erratum to this chapter can be found at http://dx.doi.org/10.1007/11915072_109.

Author information

Authors and Affiliations

LIRMM – UMR 5506, Université Montpellier 2, 34392, Montpellier Cedex 5, France
Mathieu Roche
LRI – UMR 8623, Université Paris-Sud, 91405, Orsay Cedex, France
Yves Kodratoff

Editor information

Editors and Affiliations

STARLab, Vrije Universiteit Brussel (VUB), Bldg G/10, Pleinlaan 2, 1050, Brussels, Belgium
Robert Meersman
School of Computer Science and Information Technology, RMIT University, Bld 10.10, 376-392 Swanston Street, 3001, Melbourne, VIC, Australia
Zahir Tari
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo S/N, 28660, Boadilla del Monte, Madrid, Spain
Pilar Herrero

About this paper

Cite this paper

Roche, M., Kodratoff, Y. (2006). Pruning Terminology Extracted from a Specialized Corpus for CV Ontology Acquisition. In: Meersman, R., Tari, Z., Herrero, P. (eds) On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops. OTM 2006. Lecture Notes in Computer Science, vol 4278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11915072_13

DOI: https://doi.org/10.1007/11915072_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48273-4
Online ISBN: 978-3-540-48276-5
eBook Packages: Computer Science, Computer Science (R0)
