Abstract
Paraphrases are a key feature in many natural language processing applications, and their extraction and generation are important tasks. Given two comparable corpora in the same language and the same domain that display two different discourse types (lay and specialized), specific paraphrases can be identified that provide a dimension along which these discourse types can be contrasted. Detecting such paraphrases in comparable corpora is the goal of the present work. Paraphrases are generally identified by means of lexical and/or structural patterns. In this chapter, we present two methods to extract paraphrases across lay and specialized French monolingual comparable corpora. The first method uses lexical patterns designed according to intuition and linguistic studies, while the second is empirical and based on n-gram matching. The two methods appear to be complementary: the n-gram method confirms the initial lexical patterns and identifies additional ones. In addition, differences in the direction in which paraphrase patterns apply highlight differences between specialized and lay discourse.
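To make the idea of n-gram matching across two corpora concrete, here is a minimal sketch in Python. It is not the chapter's actual procedure, which the abstract does not detail. It assumes that two n-grams are paraphrase candidates when they share the same content-word stems but differ on the surface. The stop-word list, the prefix-truncation "stemmer", and all function names are hypothetical choices made for illustration only.

```python
from collections import defaultdict

# Illustration-only assumptions: a tiny French stop-word list and a crude
# prefix-truncation stemmer standing in for real lemmatization.
STOP = {"de", "la", "le", "les", "du", "des", "un", "une", "l", "d"}


def crude_stem(word, size=5):
    """Truncate a word to its first `size` characters."""
    return word[:size]


def ngrams(tokens, n_min=2, n_max=4):
    """Yield every token n-gram with n_min <= n <= n_max."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def signature(ngram):
    """Content-word signature: the set of crude stems of non-stop-words."""
    return frozenset(crude_stem(w) for w in ngram if w not in STOP)


def index(sentences):
    """Map each signature with at least two stems to the n-grams bearing it."""
    table = defaultdict(set)
    for sentence in sentences:
        for gram in ngrams(sentence.lower().split()):
            sig = signature(gram)
            if len(sig) >= 2:
                table[sig].add(gram)
    return table


def candidate_pairs(specialized, lay):
    """Pair n-grams across corpora that share a signature but differ in form."""
    spec_idx, lay_idx = index(specialized), index(lay)
    pairs = set()
    for sig in spec_idx.keys() & lay_idx.keys():
        for s in spec_idx[sig]:
            for l in lay_idx[sig]:
                if s != l:
                    pairs.add((" ".join(s), " ".join(l)))
    return pairs


pairs = candidate_pairs(
    ["traitement de la maladie"],  # specialized-style nominal phrasing
    ["traiter la maladie"],        # lay-style verbal phrasing
)
print(pairs)
```

In this toy input, the nominal and verbal phrasings share the crude stems of their two content words, so they come out as a candidate pair. A realistic setting would need proper tokenization, lemmatization or morphological resources, and frequency filtering, none of which is shown here.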
Appendix
This appendix contains example patterns: Table 10 provides examples of the patterns presented in Table 4, while Table 11 shows examples of the patterns presented in Table 7.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Deléger, L., Cartoni, B., Zweigenbaum, P. (2013). Paraphrase Detection in Monolingual Specialized/Lay Comparable Corpora. In: Sharoff, S., Rapp, R., Zweigenbaum, P., Fung, P. (eds) Building and Using Comparable Corpora. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20128-8_12
DOI: https://doi.org/10.1007/978-3-642-20128-8_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20127-1
Online ISBN: 978-3-642-20128-8
eBook Packages: Computer Science (R0)