Abstract
C4.5 and naive Bayes (NB) are two of the top 10 data mining algorithms, thanks to their simplicity, effectiveness, and efficiency. It is well known that NB performs very well on some domains but poorly on others that involve correlated features; C4.5, on the other hand, typically works better than NB on such domains. To combine their advantages and avoid their disadvantages, many approaches, such as model insertion and model combination, have been proposed. Model insertion approaches, such as NBTree, insert NB into each leaf of the built decision tree. Model combination approaches, such as C4.5-NB, build C4.5 and NB on a training dataset independently and then combine their predictions for an unseen instance. In this paper, we take a new view and propose a discriminative model selection approach. Specifically, at training time, C4.5 and NB are built on a training dataset independently, and the more reliable of the two is recorded for each training instance. At test time, for each test instance, we first find its nearest neighbor and then use the model recorded as more reliable for that neighbor to predict the test instance's class label. We denote the proposed algorithm by C4.5\(\Vert \)NB. C4.5\(\Vert \)NB retains the interpretability of C4.5 and NB, yet significantly outperforms C4.5, NB, NBTree, and C4.5-NB.
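The selection scheme described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn's DecisionTreeClassifier as a stand-in for C4.5 and GaussianNB for naive Bayes, and approximates per-instance "reliability" by the predicted probability of the true class.

```python
# Hypothetical sketch of the C4.5||NB model-selection idea.
# Assumptions (not from the paper): scikit-learn estimators replace
# C4.5 and NB, and reliability = predicted probability of the true class.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Training: fit both models independently on the same training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
nb = GaussianNB().fit(X_tr, y_tr)

# For each training instance, record which model is more reliable,
# here measured by the probability each model assigns to the true class.
p_tree = tree.predict_proba(X_tr)[np.arange(len(y_tr)), y_tr]
p_nb = nb.predict_proba(X_tr)[np.arange(len(y_tr)), y_tr]
use_tree = p_tree >= p_nb  # True -> trust the tree for this instance

# Test: find each test instance's nearest training neighbor and apply
# the model recorded as more reliable for that neighbor.
nn = NearestNeighbors(n_neighbors=1).fit(X_tr)
idx = nn.kneighbors(X_te, return_distance=False).ravel()
pred = np.where(use_tree[idx], tree.predict(X_te), nb.predict(X_te))

accuracy = (pred == y_te).mean()
```

The per-instance model choice is what makes the approach "discriminative": unlike C4.5-NB, which combines both models' outputs for every instance, each test instance is routed to exactly one of the two base models.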
References
Wu, X., Kumar, V., Quinlan, J.R.: Top 10 algorithms in data mining. Knowl. Inf. Syst. 14(1), 1–37 (2008)
Ratanamahatana, C.A., Gunopulos, D.: Feature selection for the naive bayesian classifier using decision trees. Appl. Artif. Intell. 17, 475–487 (2003)
Kohavi, R.: Scaling up the accuracy of naive-bayes classifiers: a decision-tree hybrid. In: Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 202–207. ACM (1996)
Quinlan, J.R.: C4.5: Programs for Machine Learning, 1st edn. Morgan Kaufmann, San Mateo (1993)
Provost, F., Domingos, P.: Tree induction for probability-based ranking. Mach. Learn. 52, 199–215 (2003)
Jiang, L., Li, C.: Scaling up the accuracy of decision-tree classifiers: a naive-Bayes combination. J. Comput. 6(7), 1325–1331 (2011)
Witten, I.H., Frank, E., Hall, M.A.: Data Mining: Practical Machine Learning Tools and Techniques, 3rd edn. Morgan Kaufmann, Burlington (2011)
Frank, A., Asuncion, A.: UCI machine learning repository. Department of Information and Computer Science, University of California, Irvine (2010)
Alcalá-Fdez, J., Fernandez, A., Luengo, J., Derrac, J., García, S., Sánchez, L., Herrera, F.: Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J. Multiple-Valued Logic Soft Comput. 17(2–3), 255–287 (2011)
Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (61203287), the Program for New Century Excellent Talents in University (NCET-12-0953), and the Chenguang Program of Science and Technology of Wuhan (2015070404010202).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Zhang, L., Jiang, L., Li, C. (2016). C4.5 or Naive Bayes: A Discriminative Model Selection Approach. In: Villa, A., Masulli, P., Pons Rivero, A. (eds) Artificial Neural Networks and Machine Learning – ICANN 2016. ICANN 2016. Lecture Notes in Computer Science(), vol 9886. Springer, Cham. https://doi.org/10.1007/978-3-319-44778-0_49
Print ISBN: 978-3-319-44777-3
Online ISBN: 978-3-319-44778-0