Abstract
In many real-world data mining applications, accurate class probability estimations are often required to make optimal decisions. For example, in direct marketing, we often need to deploy different promotion strategies to customers with different likelihood (probability) of buying some products. When our learning task is to build a model with accurate class probability estimations, C4.4 is the most popular one for achieving this task because of its efficiency and effect. In this paper, we present a locally weighted version of C4.4 to scale up its class probability estimation performance by combining locally weighted learning with C4.4. We call our improved algorithm locally weighted C4.4, simply LWC4.4. We experimentally tested LWC4.4 using the whole 36 UCI data sets selected by Weka, and compared it to other related algorithms: C4.4, NB, KNN, NBTree, and LWNB. The experimental results show that LWC4.4 significantly outperforms all the other algorithms in term of conditional log likelihood, simply CLL. Thus, our work provides an effective algorithm to produce accurate class probability estimation.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Grossman, D., Domingos, P.: Learning Bayesian Network Classifiers by Maximizing Conditional Likelihood. In: Proceedings of the Twenty-First International Conference on Machine Learning, Banff, Canada, pp. 361–368. ACM Press, New York (2004)
Guo, Y., Greiner, R.: Discriminative Model Selection for Belief Net Structures. In: Proceedings of the Twentieth National Conference on Artificial Intelligence, pp. 770–776. AAAI Press (2005)
Provost, F.J., Domingos, P.: Tree Induction for Probability-Based Ranking. Machine Learning 52(3), 199–215 (2003)
Frank, E., Hall, M., Pfahringer, B.: Locally Weighted Naive Bayes. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence, pp. 249–256. Morgan Kaufmann, San Francisco (2003)
Atkeson, C.G., Moore, A.W., Schaal, S.: Locally Weighted Learning. Artificial Intelligence Review 11(1-5), 11–73 (1997)
Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA (1993)
Provost, F., Fawcett, T., Kohavi, R.: The case against accuracy estimation for comparing induction algorithms. In: Proceedings of the Fifteenth International Conference on Machine Learning, pp. 445–453. Morgan Kaufmann, San Francisco (1998)
Friedman, Geiger, Goldszmidt: Bayesian Network Classifiers. Machine Learning 29, 131–163 (1997)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Francisco, CA (1988)
Chickering, D.M.: Learning Bayesian networks is NP-Complete. In: Fisher, D., Lenz, H. (eds.) Learning from Data: Artificial Intelligence and Statistics V, pp. 121–130. Springer, Heidelberg (1996)
Kohavi, R.: Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. In: KDD 1996. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pp. 202–207. AAAI Press (1996)
Li, C., Jiang, L.: Using Locally Weighted Learning to Improve SMOreg for Regression. In: Yang, Q., Webb, G. (eds.) PRICAI 2006. LNCS (LNAI), vol. 4099, pp. 375–384. Springer, Heidelberg (2006)
Merz, C., Murphy, P., Aha, D.: UCI repository of machine learning databases. In Dept of ICS, University of California, Irvine (1997), http://www.ics.uci.edu/mlearn/MLRepository.html
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005), http://prdownloads.sourceforge.net/weka/datasets-UCI.jar
Bradley, A.P.: The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30, 1145–1159 (1997)
Hand, D.J., Till, R.J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45, 171–186 (2001)
Ling, C.X., Huang, J., Zhang, H.: AUC: a statistically consistent and more discriminating measure than accuracy. In: IJCAI 2003. Proceedings of the International Joint Conference on Artificial Intelligence, Morgan Kaufmann, San Francisco (2003)
Friedman, J., Kohavi, R., Yun, Y.: Lazy decision trees. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 717–724. The AAAI Press, Menlo Park, CA (1996)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jiang, L., Zhang, H., Wang, D., Cai, Z. (2007). Learning Locally Weighted C4.4 for Class Probability Estimation. In: Corruble, V., Takeda, M., Suzuki, E. (eds) Discovery Science. DS 2007. Lecture Notes in Computer Science(), vol 4755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75488-6_11
Download citation
DOI: https://doi.org/10.1007/978-3-540-75488-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-75487-9
Online ISBN: 978-3-540-75488-6
eBook Packages: Computer ScienceComputer Science (R0)