Using Data Mining for Static Code Analysis of C

  • Conference paper
Advanced Data Mining and Applications (ADMA 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7713)

Abstract

Static analysis of source code is one way to find bugs and other problems in large software projects, and many approaches to it have been proposed. We previously proposed a novel way of performing static analysis: instead of methods based on semantic or logic analysis, we apply machine learning directly to the problem. This has several benefits. Learning by example makes it trivial for programmers to adapt the tool (a problem with many other approaches), and learning systems can also generalise, finding problematic source code constructs that do not exactly match what the programmer initially had in mind. Given the general interest in code quality and the availability of large open source code bases as test and development data, we believe this problem should be of interest to the wider data mining community. In this work we extend our previous approach: we investigate a new way of doing feature selection and test the suitability of many different learning algorithms on a selection of problems adapted from large, publicly available open source projects. Many algorithms were much more successful than our previous proof-of-concept and deliver practical levels of performance. This is clearly an interesting and minable problem.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tribus, H., Morrigl, I., Axelsson, S. (2012). Using Data Mining for Static Code Analysis of C. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_50

  • DOI: https://doi.org/10.1007/978-3-642-35527-1_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35526-4

  • Online ISBN: 978-3-642-35527-1

  • eBook Packages: Computer Science, Computer Science (R0)
