
Pruning Terminology Extracted from a Specialized Corpus for CV Ontology Acquisition

  • Conference paper
On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops (OTM 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4278)


Abstract

This paper presents an experimental study on extracting a terminology from a corpus of curricula vitae (CVs). The terminology is intended for ontology acquisition. The choice of pruning rate for the terminology is crucial to the quality of the acquired ontology. We investigate this pruning rate using several evaluation measures: precision, recall, F-measure, and the ROC curve.
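As an illustration of the evaluation measures named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that candidate terms have already been ranked, that each candidate carries an expert judgment (relevant or not), and that pruning amounts to keeping only the top `keep` candidates. The function names, the `keep` parameter, and the sample ranking are hypothetical.

```python
def precision_recall_f(ranked_labels, keep):
    """Precision, recall and F-measure when only the top `keep` candidates survive pruning.

    ranked_labels: list of booleans, best-ranked candidate first
                   (True = judged relevant). Hypothetical input format.
    """
    kept = ranked_labels[:keep]
    true_positives = sum(kept)
    total_relevant = sum(ranked_labels)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / total_relevant if total_relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def area_under_roc(ranked_labels):
    """Area under the ROC curve of a ranking.

    Computed as the fraction of (relevant, irrelevant) pairs in which
    the relevant candidate is ranked above the irrelevant one.
    """
    positives = sum(ranked_labels)
    negatives = len(ranked_labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("AUC is undefined without both classes")
    relevant_seen = 0
    well_ordered_pairs = 0
    for is_relevant in ranked_labels:
        if is_relevant:
            relevant_seen += 1
        else:
            # Every relevant candidate seen so far outranks this irrelevant one.
            well_ordered_pairs += relevant_seen
    return well_ordered_pairs / (positives * negatives)


# Hypothetical ranking of six candidate terms, best first.
ranking = [True, True, False, True, False, False]
for keep in (2, 3, 6):
    print(keep, precision_recall_f(ranking, keep))
print("AUC:", area_under_roc(ranking))
```

Precision, recall and F-measure depend on the chosen cut-off, whereas the area under the ROC curve summarizes the whole ranking in a single number, which is one reason to examine them together when choosing a pruning rate.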

An erratum to this chapter can be found at http://dx.doi.org/10.1007/11915072_109.




Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Roche, M., Kodratoff, Y. (2006). Pruning Terminology Extracted from a Specialized Corpus for CV Ontology Acquisition. In: Meersman, R., Tari, Z., Herrero, P. (eds) On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops. OTM 2006. Lecture Notes in Computer Science, vol 4278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11915072_13


  • DOI: https://doi.org/10.1007/11915072_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-48273-4

  • Online ISBN: 978-3-540-48276-5

  • eBook Packages: Computer Science, Computer Science (R0)
