
An empirical study of the impact of modern code review practices on software quality


Abstract

Software code review, i.e., the practice of having other team members critique changes to a software system, is a well-established best practice in both open source and proprietary software domains. Prior work has shown that formal code inspections tend to improve the quality of delivered software. However, the formal code inspection process mandates strict review criteria (e.g., in-person meetings and reviewer checklists) to ensure a base level of review quality, while the modern, lightweight code reviewing process does not. Although recent work explores the modern code review process, little is known about the relationship between modern code review practices and long-term software quality. Hence, in this paper, we study the relationship between post-release defects (a popular proxy for long-term software quality) and: (1) code review coverage, i.e., the proportion of changes that have been code reviewed; (2) code review participation, i.e., the degree of reviewer involvement in the code review process; and (3) code reviewer expertise, i.e., the level of domain-specific expertise of the code reviewers. Through a case study of the Qt, VTK, and ITK projects, we find that code review coverage, participation, and expertise share a significant link with software quality. Our results thus empirically confirm the intuition that poorly-reviewed code has a negative impact on software quality in large systems using modern reviewing tools.
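To make the first of these metrics concrete, the sketch below computes per-component review coverage in R from a toy change table; the data frame and its columns are hypothetical stand-ins, not the study's actual data or scripts.

```r
# Hypothetical change data: each row is one change, recording the
# component it touches and whether the change was code reviewed.
changes <- data.frame(
  component = c("core", "core", "gui", "gui", "gui"),
  reviewed  = c(TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Review coverage per component: mean() of a logical vector gives the
# proportion of TRUE values, i.e., the proportion of reviewed changes.
coverage <- aggregate(reviewed ~ component, data = changes, FUN = mean)
print(coverage)
#>   component  reviewed
#> 1      core 0.5000000
#> 2       gui 0.6666667
```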



Notes

  1. https://code.google.com/p/gerrit/

  2. http://qt.digia.com/

  3. http://vtk.org/

  4. http://itk.org/

  5. http://sailhome.cs.queensu.ca/replication/reviewing_quality_ext/

  6. http://www.scitools.com/documents/metricsList.php?#Cyclomatic

  7. http://sailhome.cs.queensu.ca/replication/reviewing_quality_ext/

  8. http://sailhome.cs.queensu.ca/replication/reviewing_quality_ext/

  9. https://www.kernel.org/doc/Documentation/SubmittingPatches

References

  • Bacchelli A, Bird C (2013) Expectations, Outcomes, and Challenges of Modern Code Review. In: Proceedings of the 35th Int’l Conference on Software Engineering (ICSE), pp 712–721

  • Baysal O, Kononenko O, Holmes R, Godfrey MW (2013) The Influence of Non-technical Factors on Code Review. In: Proceedings of the 20th Working Conference on Reverse Engineering (WCRE), pp 122–131

  • Beller M, Bacchelli A, Zaidman A, Juergens E (2014) Modern Code Reviews in Open-Source Projects: Which Problems Do They Fix? In: Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), pp 202–211

  • Bettenburg N, Hassan AE, Adams B, German DM (2014) Management of community contributions: A case study on the Android and Linux software ecosystems. Empirical Software Engineering, To appear

  • Bird C, Nagappan N, Murphy B, Gall H, Devanbu P (2011) Don’t Touch My Code! Examining the Effects of Ownership on Software Quality. In: Proceedings of the 8th joint meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE), pp 4–14

  • Chambers JM, Hastie TJ (eds) (1992) Statistical Models in S, Wadsworth and Brooks/Cole, chap 4

  • Efron B (1986) How Biased is the Apparent Error Rate of a Prediction Rule? J Am Stat Assoc 81(394):461–470


  • Fagan ME (1976) Design and Code Inspections to Reduce Errors in Program Development. IBM Syst J 15(3):182–211


  • Graves TL, Karr AF, Marron JS, Siy H (2000) Predicting Fault Incidence using Software Change History. Trans Softw Eng (TSE) 26(7):653–661


  • Hamasaki K, Kula RG, Yoshida N, Cruz AEC, Fujiwara K, Iida H (2013) Who Does What during a Code Review? Datasets of OSS Peer Review Repositories

  • Harrell FE Jr (2002) Regression Modeling Strategies, 1st edn. Springer

  • Harrell FE Jr (2014) rms: Regression Modeling Strategies. http://biostat.mc.vanderbilt.edu/rms, R package version 4.2-1

  • Harrell FE Jr, Lee KL, Califf RM, Pryor DB, Rosati RA (1984) Regression modelling strategies for improved prognostic prediction. Stat Med 3(2):143–152


  • Harrell FE Jr, Lee KL, Matchar DB, Reichert TA (1985) Regression models for prognostic prediction: advantages, problems, and suggested solutions. Cancer Treatment Reports 69(10):1071–1077


  • Hassan AE (2008) Automated Classification of Change Messages in Open Source Projects. In: Proceedings of the 23rd Int’l Symposium on Applied Computing (SAC), pp 837–841

  • Hassan AE (2009) Predicting Faults Using the Complexity of Code Changes. In: Proceedings of the 31st Int’l Conference on Software Engineering (ICSE), pp 78–88

  • Hastie T, Tibshirani R, Friedman J (2009) Elements of Statistical Learning. 2nd edn. Springer

  • Herraiz I, German DM, Gonzalez-Barahona JM, Robles G (2008) Towards a Simplification of the Bug Report form in Eclipse. In: Proceedings of the 5th Working Conference on Mining Software Repositories (MSR), pp 145–148

  • Jiang Y, Adams B, German DM (2013) Will My Patch Make It? And How Fast?: Case Study on the Linux Kernel. In: Proceedings of the 10th Working Conference on Mining Software Repositories (MSR), pp 101–110

  • Kamei Y, Matsumoto S, Monden A, Matsumoto K, Adams B, Hassan AE (2010) Revisiting Common Bug Prediction Findings Using Effort-Aware Models. In: Proceedings of the 26th Int’l Conference on Software Maintenance (ICSM), pp 1–10

  • Kamei Y, Shihab E, Adams B, Hassan AE, Mockus A, Sinha A, Ubayashi N (2013) A Large-Scale Empirical Study of Just-in-Time Quality Assurance. Trans Softw Eng (TSE) 39(6):757–773


  • Kemerer CF, Paulk MC (2009) The Impact of Design and Code Reviews on Software Quality: An Empirical Study Based on PSP Data. Trans Softw Eng (TSE) 35(4):534–550


  • Kim S, Whitehead EJ Jr, Zhang Y (2008) Classifying software changes: Clean or buggy? Trans Softw Eng (TSE) 34(2):181–196


  • Koru AG, Zhang D, Emam KE, Liu H (2009) An Investigation into the Functional Form of the Size-Defect Relationship for Software Modules. Trans Softw Eng (TSE) 35(2):293–304


  • Mäntylä MV, Lassenius C (2009) What Types of Defects Are Really Discovered in Code Reviews? Trans Softw Eng (TSE) 35(3):430–448


  • Matsumoto S, Kamei Y, Monden A, Matsumoto K, Nakamura M (2010) An analysis of developer metrics for fault prediction. In: Proceedings of the 6th Int’l Conference on Predictive Models in Software Engineering (PROMISE), pp 18:1–18:9

  • McCabe TJ (1976) A Complexity Measure. In: Proceedings of the 2nd Int’l Conference on Software Engineering (ICSE), p 407

  • McIntosh S, Kamei Y, Adams B, Hassan AE (2014) The Impact of Code Review Coverage and Code Review Participation on Software Quality: A Case Study of the Qt, VTK, and ITK Projects. In: Proceedings of the 11th Working Conference on Mining Software Repositories (MSR), pp 192–201

  • Menzies T, Stefano JSD, Chapman M, McGill K (2002) Metrics That Matter. In: Proceedings of the 27th Annual NASA Goddard/IEEE Software Engineering Workshop, pp 51–57

  • Mockus A, Votta LG (2000) Identifying Reasons for Software Changes Using Historic Databases. In: Proceedings of the 16th Int’l Conference on Software Maintenance (ICSM), pp 120–130

  • Mockus A, Weiss DM (2000) Predicting Risk of Software Changes. Bell Labs Tech J 5(2):169–180


  • Mockus A, Fielding RT, Herbsleb JD (2002) Two Case Studies of Open Source Software Development: Apache and Mozilla. Trans Softw Eng Methodol (TOSEM) 11(3):309–346


  • Mukadam M, Bird C, Rigby PC (2013) Gerrit Software Code Review Data from Android. In: Proceedings of the 10th Working Conference on Mining Software Repositories (MSR), pp 45–48

  • Nagappan N, Ball T (2005) Use of relative code churn measures to predict system defect density. In: Proceedings of the 27th Int’l Conference on Software Engineering (ICSE), pp 284–292

  • Nagappan N, Ball T (2007) Using Software Dependencies and Churn Metrics to Predict Field Failures: An Empirical Case Study. In: Proceedings of the 1st Int’l Symposium on Empirical Software Engineering and Measurement (ESEM), pp 364–373

  • Nagappan N, Ball T, Zeller A (2006) Mining metrics to predict component failures. In: Proceedings of the 28th Int’l Conference on Software Engineering (ICSE), pp 452–461

  • Porter A, Siy H, Mockus A, Votta L (1998) Understanding the Sources of Variation in Software Inspections. Trans Softw Eng Methodol (TOSEM) 7(1):41–79


  • R Core Team (2013) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

  • Rahman F, Devanbu P (2011) Ownership, Experience and Defects: A Fine-Grained Study of Authorship. In: Proceedings of the 33rd Int’l Conference on Software Engineering (ICSE), pp 491–500

  • Rahman F, Devanbu P (2013) How, and why, process metrics are better. In: Proceedings of the 35th Int’l Conference on Software Engineering (ICSE), pp 432–441

  • Rigby PC, Bird C (2013) Convergent Contemporary Software Peer Review Practices. In: Proceedings of the 9th joint meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE), pp 202–212

  • Rigby PC, Storey MA (2011) Understanding Broadcast Based Peer Review on Open Source Software Projects. In: Proceedings of the 33rd Int’l Conference on Software Engineering (ICSE), pp 541–550

  • Rigby PC, German DM, Storey MA (2008) Open Source Software Peer Review Practices: A Case Study of the Apache Server. In: Proceedings of the 30th Int’l Conference on Software Engineering (ICSE), pp 541–550

  • Rigby PC, German DM, Cohen L, Storey MA (2014) Peer Review on Open Source Software Projects: Parameters, Statistical Models, and Theory. To appear

  • Sarle WS (1990) The VARCLUS Procedure. In: SAS/STAT User’s Guide, 4th edn, SAS Institute Inc.

  • Shannon CE (1948) A Mathematical Theory of Communication. The Bell System Technical Journal 27:379–423, 623–656


  • Shihab E, Mockus A, Kamei Y, Adams B, Hassan AE (2011) High-Impact Defects: A Study of Breakage and Surprise Defects. In: Proceedings of the 8th joint meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE), pp 300–310

  • Tanaka T, Sakamoto K, Kusumoto S, Matsumoto K, Kikuno T (1995) Improvement of Software Process by Process Description and Benefit Estimation. In: Proceedings of the 17th Int’l Conference on Software Engineering (ICSE), pp 123–132


Acknowledgments

The authors would like to acknowledge Frank Harrell Jr. for insightful discussions and for his assistance with the configuration and debugging of the rms R package. The authors would also like to thank the anonymous reviewers for their fruitful comments on an earlier version of this work (McIntosh et al. 2014).

This research was partially supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and JSPS KAKENHI Grant Numbers 24680003 and 25540026.

Author information


Corresponding author

Correspondence to Shane McIntosh.

Additional information

Communicated by: Sung Kim and Martin Pinzger

Appendix A: Example Scripts

In this appendix, we include Figs. 14 and 15, which show how our model construction and analysis steps were implemented.

Fig. 14: Example R script showing our model construction approach
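The figure itself is not reproduced in this preview, so what follows is a minimal sketch, in the spirit of the construction script it describes, of fitting an rms model. It is not the authors' actual script: the input file, the response (post_defects), the predictors (size, churn, review_coverage, participation), and the spline knot counts are all hypothetical placeholders.

```r
# Minimal sketch of rms-based model construction (hypothetical names).
library(rms)  # Harrell's Regression Modeling Strategies package

data <- read.csv("component_metrics.csv")  # hypothetical input file

# rms functions require a datadist object that summarizes the
# distributions of the predictors.
dd <- datadist(data)
options(datadist = "dd")

# Fit an ordinary least squares model, relaxing linearity with
# restricted cubic splines (rcs). x = TRUE and y = TRUE retain the
# design matrix and response so the fit can be bootstrap-validated.
fit <- ols(post_defects ~ rcs(size, 3) + rcs(churn, 3) +
             rcs(review_coverage, 3) + rcs(participation, 3),
           data = data, x = TRUE, y = TRUE)
```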

Fig. 15: Example R script showing our model analysis approach
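Again in place of the figure, here is a hedged sketch of the analysis side, continuing from the hypothetical `fit` object above; B = 1000 bootstrap iterations is an illustrative choice, not necessarily the value used in the paper.

```r
# Minimal sketch of rms-based model analysis (continuing the example).

# Estimate the optimism of the apparent fit (e.g., R^2) via bootstrap
# resampling (cf. Efron 1986).
validate(fit, B = 1000)

# Report a Wald chi-squared statistic per model term; larger values
# indicate a larger contribution to the model's fit.
anova(fit)

# Plot the estimated relationship between one predictor and the
# response, holding the other predictors at typical values.
plot(Predict(fit, review_coverage))
```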


Cite this article

McIntosh, S., Kamei, Y., Adams, B. et al. An empirical study of the impact of modern code review practices on software quality. Empir Software Eng 21, 2146–2189 (2016). https://doi.org/10.1007/s10664-015-9381-9
