Abstract
In conceiving of autonomous agents able to employ adaptive cooperative behaviours, we identify the need to assess the equivalence of agent behaviour under conditions of external change. Reinforcement-learning algorithms rely on input from the environment as the sole means of informing, and so reifying, internal state. This paper investigates the assumption that isomorphic representations of an environment will lead to equivalent behaviour. To test this equivalence assumption we analyse the variance between behavioural profiles in a set of agents using fourteen foundational reinforcement-learning algorithms across four isomorphic representations of the classical Prisoner’s Dilemma gameform. A behavioural profile is the aggregated episode-mean distribution of the game outcomes CC, CD, DC, and DD generated by the symmetric self-play repeated stage game across a two-axis sweep of input parameters, the principal learning rate \(\alpha\) and the discount factor \(\gamma\), yielding 100 observations of the frequency of the four game outcomes per algorithm, per gameform representation. Equivalence is indicated by low variance between any two behavioural profiles generated by a single algorithm. Despite the representations being theoretically equivalent, analysis reveals significant variance in the behavioural profiles of the tested algorithms at both the aggregate and the individual-outcome scale. Given this result, we infer that the isomorphic representations tested in this study are not necessarily equivalent with respect to the reachable space they induce for any particular algorithm, which in turn can lead to unexpected agent behaviour. We therefore conclude that structure-preserving operations applied to environmental reward signals may introduce a vector for algorithmic bias.
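To make the experimental pipeline concrete, the following is a minimal sketch, not the authors' implementation (see the repository linked below for that). It assumes a single stateless ε-greedy Q-learner per player in symmetric self-play, 200-step episodes, canonical Prisoner's Dilemma payoffs (T, R, P, S) = (5, 3, 1, 0), and a positive affine transform standing in for one of the paper's four isomorphic gameform representations; the grid, episode length, and the names behavioural_profile and profile_variance are illustrative, not taken from the study's code.

    import itertools
    import random

    # Canonical PD payoffs to the row player, indexed by (row, col)
    # actions; action 0 = Cooperate, 1 = Defect.  T > R > P > S.
    BASE = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

    def affine(payoffs, a, b):
        """Positive affine transform a*x + b (a > 0): order-preserving,
        so the transformed game is isomorphic to the original."""
        return {k: a * v + b for k, v in payoffs.items()}

    def run_episode(payoffs, alpha, gamma, steps=200, eps=0.1):
        """Symmetric self-play: two independent, stateless Q-learners.
        Returns the episode frequencies of CC, CD, DC, and DD."""
        rng = random.Random(0)
        q = [[0.0, 0.0], [0.0, 0.0]]          # q[player][action]
        counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
        for _ in range(steps):
            acts = []
            for p in range(2):
                if rng.random() < eps:
                    acts.append(rng.randrange(2))
                else:
                    acts.append(0 if q[p][0] >= q[p][1] else 1)
            a0, a1 = acts
            counts[(a0, a1)] += 1
            rewards = (payoffs[(a0, a1)], payoffs[(a1, a0)])
            for p, a in enumerate(acts):
                target = rewards[p] + gamma * max(q[p])
                q[p][a] += alpha * (target - q[p][a])
        return {k: v / steps for k, v in counts.items()}

    def behavioural_profile(payoffs, grid=10):
        """Sweep alpha and gamma over a grid x grid lattice
        (100 points), mirroring the paper's two-axis sweep."""
        profile = {}
        for i, j in itertools.product(range(grid), repeat=2):
            alpha, gamma = (i + 1) / grid, j / grid
            profile[(alpha, gamma)] = run_episode(payoffs, alpha, gamma)
        return profile

    def profile_variance(p1, p2):
        """Mean squared difference in outcome frequencies across the
        sweep; a low value indicates behavioural equivalence."""
        diffs = [(p1[k][o] - p2[k][o]) ** 2 for k in p1 for o in p1[k]]
        return sum(diffs) / len(diffs)

    if __name__ == "__main__":
        original = behavioural_profile(BASE)
        shifted = behavioural_profile(affine(BASE, a=2, b=10))
        print("variance between profiles:",
              profile_variance(original, shifted))

Because a positive affine transform preserves the preference ordering over outcomes, game theory treats the two payoff matrices as the same gameform; the paper's finding is that the induced learning dynamics, and hence the measured behavioural profiles, need not coincide.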
Code Availability
A repository of code used in this study, and further supplementary material, is available at https://github.com/simoncstanton/equivalence_study.
Acknowledgements
We would like to acknowledge the use of the high-performance computing facilities provided by the Tasmanian Partnership for Advanced Computing (TPAC), funded and hosted by the University of Tasmania. This research is supported by an Australian Government Research Training Program (RTP) Scholarship.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Stanton, S.C., Dermoudy, J., Ollington, R. (2022). Representation-Induced Algorithmic Bias. In: Long, G., Yu, X., Wang, S. (eds.) AI 2021: Advances in Artificial Intelligence. AI 2022. Lecture Notes in Computer Science, vol. 13151. Springer, Cham. https://doi.org/10.1007/978-3-030-97546-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-97545-6
Online ISBN: 978-3-030-97546-3
eBook Packages: Computer Science, Computer Science (R0)