Abstract
Discretization can improve classification accuracy and yields representations that generalize well for further knowledge extraction, so researchers have developed a variety of discretization methods for preprocessing data. This paper surveys existing discretization methods used to preprocess data for data mining and evaluates their effectiveness, in terms of both the quality of the resulting discretization scheme and the resulting classification accuracy, by comparing the performance of traditional and recent discretization algorithms on six real datasets: Iris, Ionosphere, Waveform-5000, Wisconsin Breast Cancer, Hepatitis Domain and Pima Indian Diabetes. A feedforward neural network trained with a conjugate gradient algorithm is used to measure the classification accuracy obtained from the data discretized by each algorithm.
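For context, the short sketch below (not from the paper) illustrates the kind of evaluation pipeline the abstract describes: continuous attributes are discretized, and a feedforward neural network then classifies the discretized data. It uses scikit-learn's unsupervised KBinsDiscretizer as a stand-in for the surveyed discretization algorithms and an MLP trained with L-BFGS in place of the scaled conjugate gradient trainer, which scikit-learn does not provide; Iris is one of the six benchmark datasets named above.

# A minimal sketch (not from the paper) of the evaluation pipeline described
# above: discretize the continuous attributes, then train a feedforward
# neural network and report classification accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_iris(return_X_y=True)  # Iris is one of the six benchmark datasets

# Unsupervised binning strategies stand in for the surveyed algorithms.
for strategy in ("uniform", "quantile", "kmeans"):
    pipe = make_pipeline(
        KBinsDiscretizer(n_bins=4, encode="ordinal", strategy=strategy),
        # L-BFGS replaces the conjugate gradient trainer used in the paper,
        # which scikit-learn does not provide.
        MLPClassifier(hidden_layer_sizes=(10,), solver="lbfgs",
                      max_iter=2000, random_state=0),
    )
    acc = cross_val_score(pipe, X, y, cv=5).mean()
    print(f"{strategy:>8s} binning: mean 5-fold CV accuracy = {acc:.3f}")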
Copyright information
© 2013 Springer International Publishing Switzerland
Cite this paper
Augasta, M.G., Kathirvalavakumar, T. (2013). An Empirical Comparison of Discretization Methods for Neural Classifier. In: Prasath, R., Kathirvalavakumar, T. (eds) Mining Intelligence and Knowledge Exploration. Lecture Notes in Computer Science(), vol 8284. Springer, Cham. https://doi.org/10.1007/978-3-319-03844-5_5