Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives

Nanda, Rohan; Siragusa, Giovanni; Di Caro, Luigi; Boella, Guido; Grossio, Lorenzo; Gerbaudo, Marco; Costamagna, Francesco

doi:10.1007/s10506-018-9236-y

Published: 26 October 2018

Artificial Intelligence and Law, Volume 27, pages 199–225 (2019)

Rohan Nanda ORCID: orcid.org/0000-0001-7124-4799¹,
Giovanni Siragusa¹,
Luigi Di Caro¹,
Guido Boella¹,
Lorenzo Grossio²,
Marco Gerbaudo² &
Francesco Costamagna²

Abstract

The automated identification of national implementations (NIMs) of European directives by text similarity techniques has shown promising preliminary results. Previous works have proposed and utilized unsupervised lexical and semantic similarity techniques based on vector space models, latent semantic analysis and topic models. However, these techniques were evaluated on a small multilingual corpus of directives and NIMs. In this paper, we utilize word and paragraph embedding models learned by shallow neural networks from a multilingual legal corpus of European directives and national legislation (from Ireland, Luxembourg and Italy) to develop unsupervised semantic similarity systems to identify transpositions. We evaluate these models and compare their results with the previous unsupervised methods on a multilingual test corpus of 43 Directives and their corresponding NIMs. We also develop supervised machine learning models to identify transpositions and compare their performance with different feature sets.
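The unsupervised idea described in the abstract can be illustrated with a minimal sketch. It assumes that each provision is represented by the average of its word vectors and that candidates are ranked by cosine similarity; the abstract does not confirm that the paper's systems work exactly this way. The vectors in `TOY_VECTORS` and the function names are hypothetical and stand in for embeddings that would be learned from a legal corpus.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-made for illustration only.
# A real system would load embeddings trained on directives and national law.
TOY_VECTORS = {
    "employer": [1.0, 0.0, 0.0],
    "worker": [0.9, 0.1, 0.0],
    "safety": [0.0, 1.0, 0.0],
    "health": [0.1, 0.9, 0.0],
    "tax": [0.0, 0.0, 1.0],
    "income": [0.0, 0.1, 0.9],
}


def provision_vector(text, vectors=TOY_VECTORS, dim=3):
    """Average the vectors of the known tokens; zero vector if none is known."""
    known = [vectors[t] for t in text.lower().split() if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_candidates(directive_text, national_provisions):
    """Return (provision, similarity) pairs, most similar first."""
    query = provision_vector(directive_text)
    scored = [(p, cosine(query, provision_vector(p))) for p in national_provisions]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for provision, score in rank_candidates(
        "employer safety", ["income tax", "worker health"]
    ):
        print(f"{score:.3f}  {provision}")
```

In this toy setting the provision about worker health ranks above the one about income tax, even though it shares no words with the query, which is the kind of semantic match that purely lexical similarity can miss.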

Notes

http://www.europarl.europa.eu/sides/getAllAnswers.do?reference=E-2010-9931&language=SL.
http://eurovoc.europa.eu.
https://spacy.io/.
The output vector is computed by multiplying the embedding vector by the hidden layer.
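
The last note can be read as a matrix-vector product, sketched below under stated assumptions. The note does not say which model it refers to or what the dimensions are, so this sketch treats "the hidden layer" as a weight matrix with one row per vocabulary item, and the softmax step is an added assumption borrowed from common word-embedding models. All names and values are hypothetical.

```python
import math


def output_scores(embedding, output_weights):
    """Multiply the weight matrix by the embedding: one dot product per row."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in output_weights]


def softmax(scores):
    """Turn raw scores into probabilities (assumed step, not stated in the note)."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


if __name__ == "__main__":
    embedding = [1.0, 2.0]                                # hypothetical 2-d embedding
    weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # hypothetical 3-word vocabulary
    scores = output_scores(embedding, weights)
    print(scores, softmax(scores))
```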

Acknowledgements

The research presented in this paper was conducted as part of Ph.D. research at the University of Turin, within the Erasmus Mundus Joint International Doctoral (Ph.D.) programme in Law, Science and Technology. This work has been partially supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant agreement no. 690974 for the project “MIREL: MIning and REasoning with Legal texts”.

Author information

Authors and Affiliations

Department of Computer Science, University of Turin, Corso Svizzera, 185, 10149, Turin, Italy
Rohan Nanda, Giovanni Siragusa, Luigi Di Caro & Guido Boella
Department of Law, University of Turin, Lungo Dora Siena 100/A, 10153, Turin, Italy
Lorenzo Grossio, Marco Gerbaudo & Francesco Costamagna

Corresponding author

Correspondence to Rohan Nanda.

About this article

Cite this article

Nanda, R., Siragusa, G., Di Caro, L. et al. Unsupervised and supervised text similarity systems for automated identification of national implementing measures of European directives. Artif Intell Law 27, 199–225 (2019). https://doi.org/10.1007/s10506-018-9236-y

Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s10506-018-9236-y
