Abstract
This paper presents an experimental study of terminology extraction from a corpus of curricula vitae (CVs). The extracted terminology is intended for ontology acquisition. The choice of pruning rate for the terminology is crucial to the quality of the acquired ontology. We investigate this pruning rate using several evaluation measures: precision, recall, F-measure, and the ROC curve.
An erratum to this chapter can be found at http://dx.doi.org/10.1007/11915072_109.
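To make the evaluation measures named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a list of candidate terms already sorted from best-ranked to worst-ranked, each with an expert label saying whether it is relevant; pruning keeps only the top `keep` candidates. The function names, the `labels` list, and the example data are hypothetical, and the ranking is assumed to contain no ties.

```python
from typing import List, Tuple


def prf_at_pruning(labels: List[bool], keep: int) -> Tuple[float, float, float]:
    """Precision, recall and F-measure when only the top `keep` terms are retained.

    `labels` is ordered from best-ranked to worst-ranked candidate term;
    True means an expert judged the term relevant (hypothetical input format).
    """
    kept = labels[:keep]
    true_positives = sum(kept)
    total_relevant = sum(labels)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def ranking_auc(labels: List[bool]) -> float:
    """Area under the ROC curve of the ranking.

    Computed as the fraction of (relevant, irrelevant) pairs in which the
    relevant term is ranked above the irrelevant one. Assumes no tied ranks.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("AUC needs at least one relevant and one irrelevant term")
    correct_pairs = 0
    negatives_below = 0
    # Walk from the bottom of the ranking upwards, counting for each relevant
    # term how many irrelevant terms are ranked below it.
    for is_relevant in reversed(labels):
        if is_relevant:
            correct_pairs += negatives_below
        else:
            negatives_below += 1
    return correct_pairs / (positives * negatives)


# Hypothetical ranking of six candidate terms, best first.
example_labels = [True, True, False, True, False, False]
precision, recall, f_measure = prf_at_pruning(example_labels, keep=3)
auc = ranking_auc(example_labels)
```

Sweeping `keep` from 1 to the length of the list shows how precision and recall trade off as the pruning rate changes, whereas the AUC summarizes the quality of the whole ranking independently of any single cutoff.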
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Roche, M., Kodratoff, Y. (2006). Pruning Terminology Extracted from a Specialized Corpus for CV Ontology Acquisition. In: Meersman, R., Tari, Z., Herrero, P. (eds) On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops. OTM 2006. Lecture Notes in Computer Science, vol 4278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11915072_13
DOI: https://doi.org/10.1007/11915072_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48273-4
Online ISBN: 978-3-540-48276-5
eBook Packages: Computer Science (R0)