Abstract
C4.5 and naive Bayes (NB) are two of the top 10 data mining algorithms, thanks to their simplicity, effectiveness, and efficiency. It is well known that NB performs very well on some domains but poorly on others that involve correlated features; C4.5, on the other hand, typically works better than NB on such domains. To combine their advantages and avoid their disadvantages, many approaches, such as model insertion and model combination, have been proposed. Model insertion approaches, such as NBTree, insert NB into each leaf of the built decision tree. Model combination approaches, such as C4.5-NB, build C4.5 and NB on a training dataset independently and then combine their predictions for an unseen instance. In this paper, we take a new view and propose a discriminative model selection approach. Specifically, at training time, C4.5 and NB are built on a training dataset independently, and the more reliable of the two is recorded for each training instance. At test time, for each test instance, we first find its nearest neighbor and then use the model recorded as more reliable for that neighbor to predict the test instance's class label. We denote the proposed algorithm by C4.5\(\Vert \)NB. C4.5\(\Vert \)NB retains the interpretability of C4.5 and NB, yet significantly outperforms C4.5, NB, NBTree, and C4.5-NB.
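The selection scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's DecisionTreeClassifier as a stand-in for C4.5 and GaussianNB for naive Bayes, and approximates per-instance "reliability" by the predicted probability of the true class.

```python
# Hypothetical sketch of the C4.5||NB model-selection idea.
# Assumptions (not from the paper): scikit-learn estimators replace
# C4.5 and NB, and reliability = predicted probability of the true class.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Training: fit both models independently on the same training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
nb = GaussianNB().fit(X_tr, y_tr)

# For each training instance, record which model is more reliable,
# here measured by the probability each model assigns to the true class.
p_tree = tree.predict_proba(X_tr)[np.arange(len(y_tr)), y_tr]
p_nb = nb.predict_proba(X_tr)[np.arange(len(y_tr)), y_tr]
use_tree = p_tree >= p_nb  # True -> trust the tree for this instance

# Test: find each test instance's nearest training neighbor and apply
# the model recorded as more reliable for that neighbor.
nn = NearestNeighbors(n_neighbors=1).fit(X_tr)
idx = nn.kneighbors(X_te, return_distance=False).ravel()
pred = np.where(use_tree[idx], tree.predict(X_te), nb.predict(X_te))

accuracy = (pred == y_te).mean()
```

The per-instance model choice is what makes the approach "discriminative": unlike C4.5-NB, which combines both models' outputs for every instance, each test instance is routed to exactly one of the two base models.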
References
Wu, X., Kumar, V., Quinlan, J.R.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)
Ratanamahatana, C.A., Gunopulos, D.: Feature selection for the naive bayesian classifier using decision trees. Appl. Artif. Intell. 17, 475–487 (2003)
Kohavi, R.: Scaling up the accuracy of naive-bayes classifiers: a decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 202–207. ACM (1996)
Quinlan, J.R.: C4.5: Programs for Machine Learning, 1st edn. Morgan Kaufmann, San Mateo (1993)
Provost, F., Domingos, P.: Tree induction for probability-based ranking. Mach. Learn. 52, 199–215 (2003)
Jiang, L., Li, C.: Scaling up the accuracy of decision-tree classifiers: a naive-Bayes combination. J. Comput. 6(7), 1325–1331 (2011)
Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann, Burlington (2011)
Frank, A., Asuncion, A.: UCI machine learning repository. Department of Information and Computer Science, University of California, Irvine (2010)
Alcalá-Fdez, J., Fernandez, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Multiple-Valued Logic Soft Comput. 17(2–3), 255–287 (2011)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (61203287), the Program for New Century Excellent Talents in University (NCET-12-0953), and the Chenguang Program of Science and Technology of Wuhan (2015070404010202).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Zhang, L., Jiang, L., Li, C. (2016). C4.5 or Naive Bayes: A Discriminative Model Selection Approach. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science(), vol 9886. Springer, Cham. https://doi.org/10.1007/978-3-319-44778-0_49
Print ISBN: 978-3-319-44777-3
Online ISBN: 978-3-319-44778-0