Abstract
We present a reference model for finding (prima facie) evidence of discrimination in datasets of historical decision records in socially sensitive tasks, including access to credit, mortgage, insurance, labor market and other benefits. We formalize the process of direct and indirect discrimination discovery in a rule-based framework by modelling protected-by-law groups, such as minorities or disadvantaged segments, and the contexts where discrimination occurs. Classification rules extracted from the historical records unveil contexts of unlawful discrimination, where the degree of burden over protected-by-law groups is evaluated by formalizing existing norms and regulations in terms of quantitative measures. The measures are defined as functions of the contingency table of a classification rule, and their statistical significance is assessed by relying on a large body of statistical inference methods for proportions. Key legal concepts and reasoning are then used to drive the analysis of the set of classification rules, with the aim of discovering patterns of discrimination, either direct or indirect. Analyses of affirmative action, favoritism and argumentation against discrimination allegations are also modelled in the proposed framework. Finally, we present an implementation, called LP2DD, of the overall reference model that integrates induction, through data mining classification rule extraction, and deduction, through a computational logic implementation of the analytical tools. The LP2DD system is put to work on the analysis of a dataset of credit decision records.

Notes
\(\frac{\mathit{conf}({\bf A, B \rightarrow C})}{\mathit{conf}({\bf B \rightarrow C})} = \frac{\mathit{supp}({\bf A, B, C})\,\mathit{supp}({\bf B})}{\mathit{supp}({\bf A, B})\,\mathit{supp}({\bf B, C})} = \frac{\mathit{conf}({\bf B, C \rightarrow A})}{\mathit{conf}({\bf B \rightarrow A})}.\)
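The identity above can be checked numerically. The following is a minimal sketch, assuming the usual definitions of support (fraction of records containing an itemset) and confidence (support of premise and consequence together, divided by support of the premise). The toy transactions and the item names "a", "b", "c" are hypothetical placeholders and are not taken from the article.

```python
from fractions import Fraction

# Hypothetical toy transactions; "a", "b", "c" stand in for itemsets A, B, C.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"b", "c"},
    {"b"},
    {"a", "b", "c"},
    {"c"},
]


def supp(items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in TRANSACTIONS if items <= t)
    return Fraction(hits, len(TRANSACTIONS))


def conf(premise, consequence):
    """Confidence of premise -> consequence (assumed standard definition)."""
    return supp(set(premise) | set(consequence)) / supp(premise)


# The three expressions of the identity, computed with exact arithmetic.
left = conf({"a", "b"}, {"c"}) / conf({"b"}, {"c"})
middle = (supp({"a", "b", "c"}) * supp({"b"})) / (
    supp({"a", "b"}) * supp({"b", "c"})
)
right = conf({"b", "c"}, {"a"}) / conf({"b"}, {"a"})

print(left, middle, right)
```

Exact fractions are used so that the three sides can be compared for equality without floating-point tolerance.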
We use the name “a-protection” instead of “α-protection” in order not to generate confusion later on, when confidence intervals at the confidence level of 100(1 − α)% are introduced.
For a rule X → A, there are \(2^{|X|}\) rules A, B → D obtained by splitting X into D and B.
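The count in this note can be illustrated by enumerating the splits directly. This is a sketch under the assumption that a split assigns each item of X to exactly one of D and B, with either part allowed to be empty. The helper name `splits` and the items "p", "q", "r" are hypothetical.

```python
from itertools import combinations


def splits(x):
    """Yield every pair (D, B) that partitions x into two disjoint parts."""
    items = frozenset(x)
    ordered = sorted(items)
    for size in range(len(ordered) + 1):
        for chosen in combinations(ordered, size):
            d = frozenset(chosen)
            yield d, items - d


X = {"p", "q", "r"}  # hypothetical premise with |X| = 3
pairs = list(splits(X))

for d, b in pairs:
    print(sorted(d), sorted(b))
print(len(pairs))
```

Each item independently goes to D or to B, which is where the exponential count comes from.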
With reference to Fig. 2, consider a rule c with \(a_1 = x\), \(n_1 = x + 1\), \(a_2 = 1\), \(n_2 = y\), for x, y natural numbers. Fixing \(x = ms |{\mathcal D}|\) to satisfy the minimum support requirement, we have slift(c) = (x y)/(x + 1) ≥ y/2, which is unbounded as y grows. The reasoning is analogous for the odds lift, which is olift(c) = x(y − 1).
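The growth of both measures with y can be seen in a short computation. Fig. 2 is not reproduced here, so the sketch below assumes the definitions implied by the closed forms in this note: slift as the ratio of the proportions \(a_1/n_1\) and \(a_2/n_2\), and olift as the ratio of the corresponding odds. The value chosen for x is hypothetical.

```python
from fractions import Fraction


def slift(a1, n1, a2, n2):
    """Ratio of the proportions a1/n1 and a2/n2 (assumed definition)."""
    return Fraction(a1, n1) / Fraction(a2, n2)


def olift(a1, n1, a2, n2):
    """Ratio of the odds a1/(n1 - a1) and a2/(n2 - a2) (assumed definition)."""
    return Fraction(a1, n1 - a1) / Fraction(a2, n2 - a2)


x = 10  # hypothetical stand-in for ms * |D|

for y in (10, 100, 1000):
    s = slift(x, x + 1, 1, y)
    o = olift(x, x + 1, 1, y)
    # Closed forms from the note: slift = x*y/(x+1) >= y/2, olift = x*(y-1).
    assert s == Fraction(x * y, x + 1) and s >= Fraction(y, 2)
    assert o == x * (y - 1)
    print(y, float(s), int(o))
```

With x held fixed, both values increase without limit as y increases, which is the point the note makes.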
Cite this article
Ruggieri, S., Pedreschi, D. & Turini, F. Integrating induction and deduction for finding evidence of discrimination. Artif Intell Law 18, 1–43 (2010). https://doi.org/10.1007/s10506-010-9089-5
DOI: https://doi.org/10.1007/s10506-010-9089-5