Abstract
Static analysis of source code is one way to find bugs and other problems in large software projects. Many approaches to static analysis have been proposed, most based on semantic or logic analysis. We previously proposed a novel alternative: applying machine learning directly to the problem. This has several benefits. Learning by example makes programmer adaptation trivial (a problem with many other approaches), and learning systems can also generalise, finding problematic source code constructs that are not exactly as the programmer initially specified. Given the general interest in code quality and the availability of large open source code bases as test and development data, we believe this problem should be of interest to the larger data mining community. In this work we extend our previous approach by investigating a new way of performing feature selection and testing the suitability of many different learning algorithms, on a selection of problems adapted from large, publicly available open source projects. Many algorithms were much more successful than our previous proof-of-concept, and deliver practical levels of performance. This is clearly an interesting and minable problem.
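The learn-by-example idea described in the abstract can be illustrated with a minimal sketch: represent C code snippets as bags of lexical tokens and classify a new snippet by nearest labelled centroid. This is not the authors' actual pipeline or feature set; the snippets, labels, and the toy classifier below are invented purely to show the shape of the approach.

```python
import re
from collections import Counter

def tokens(code):
    """Split C code into crude lexical tokens: identifiers/keywords
    and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", code)

def centroid(snippets):
    """Average bag-of-tokens vector over a set of labelled snippets."""
    total = Counter()
    for s in snippets:
        total.update(tokens(s))
    return {t: c / len(snippets) for t, c in total.items()}

def classify(snippet, centroids):
    """Assign the label whose centroid has the largest dot product with
    the snippet's token counts (unnormalised -- a toy measure only)."""
    counts = Counter(tokens(snippet))
    def score(label):
        return sum(c * centroids[label].get(t, 0.0)
                   for t, c in counts.items())
    return max(centroids, key=score)

# Invented training examples: classic unsafe C idioms vs. bounded ones.
centroids = {
    "suspect": centroid(["strcpy(dst, src);",
                         "gets(line);",
                         "sprintf(buf, fmt, x);"]),
    "ok": centroid(["strncpy(dst, src, sizeof dst);",
                    "fgets(line, sizeof line, stdin);",
                    "snprintf(buf, sizeof buf, fmt, x);"]),
}

print(classify("gets(cmd);", centroids))                      # suspect
print(classify("fgets(cmd, sizeof cmd, stdin);", centroids))  # ok
```

The point of the sketch is the abstract's claim about generalisation: because classification is driven by learned token statistics rather than exact pattern matching, variants the programmer never wrote down can still score closer to the "suspect" examples.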
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Tribus, H., Morrigl, I., Axelsson, S. (2012). Using Data Mining for Static Code Analysis of C. In: Zhou, S., Zhang, S., Karypis, G. (eds) Advanced Data Mining and Applications. ADMA 2012. Lecture Notes in Computer Science, vol 7713. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35527-1_50
DOI: https://doi.org/10.1007/978-3-642-35527-1_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35526-4
Online ISBN: 978-3-642-35527-1
eBook Packages: Computer Science (R0)