Abstract
An imbalanced distribution of samples between churners and non-churners can strongly affect churn prediction results in the telecommunication services field. One way to address this is an over-sampling approach based on PCA regression. However, PCA regression may not generate good churn samples if the dataset is not linearly separable. To overcome this problem, we employed the Genetic K-means Algorithm to cluster the dataset into locally optimal small datasets. The experiments were carried out on a real-world telecommunication dataset and assessed on a churn prediction task. They showed that the Genetic K-means Algorithm can improve the prediction results obtained with PCA regression and performs as well as SMOTE.
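The abstract names the Genetic K-means Algorithm but does not describe it. The sketch below is a hedged illustration of a generic GKA, loosely following Krishna and Murty's outline: each chromosome is a string of cluster labels, selection is roulette-wheel on sigma-truncated fitness of the negated total within-cluster variation (TWCV), mutation is distance-based, and a one-step K-means operator replaces crossover. It is not the chapter's implementation. All function names, parameter defaults (`pop_size`, `pm`, `cm`, `c`, `generations`) and the toy data are assumptions. The abstract also does not state how the resulting clusters feed PCA-regression over-sampling, so that step is omitted.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids(data: List[Point], labels: List[int], k: int) -> List[Optional[Tuple[float, ...]]]:
    """Cluster means; None marks an empty cluster."""
    dim = len(data[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, lab in zip(data, labels):
        counts[lab] += 1
        for d in range(dim):
            sums[lab][d] += p[d]
    return [tuple(s / n for s in sums[j]) if n else None for j, n in enumerate(counts)]


def twcv(data, labels, k) -> float:
    """Total within-cluster variation, the objective GKA minimises."""
    cents = centroids(data, labels, k)
    return sum(sq_dist(p, cents[lab]) for p, lab in zip(data, labels))


def kmeans_operator(data, labels, k):
    """One K-means step: move every point to its nearest non-empty centroid."""
    cents = centroids(data, labels, k)
    live = [j for j in range(k) if cents[j] is not None]
    return [min(live, key=lambda j: sq_dist(p, cents[j])) for p in data]


def mutate(data, labels, k, pm, cm, rng):
    """Distance-based mutation: nearer clusters are likelier new labels."""
    cents = centroids(data, labels, k)
    out = list(labels)
    for i, p in enumerate(data):
        if rng.random() >= pm:
            continue
        dists = [math.sqrt(sq_dist(p, c)) if c is not None else None for c in cents]
        dmax = max(d for d in dists if d is not None)
        # An empty cluster is weighted like the farthest one (cm > 1 keeps it > 0).
        weights = [cm * dmax - (dmax if d is None else d) for d in dists]
        if sum(weights) > 0:
            out[i] = rng.choices(range(k), weights=weights)[0]
    return out


def select(pop, scores, c, rng):
    """Roulette-wheel selection on sigma-truncated fitness of -TWCV."""
    vals = [-s for s in scores]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    fit = [max(0.0, v - (mean - c * std)) for v in vals]
    if sum(fit) == 0:  # all strings equally fit: keep the population as is
        return [list(ind) for ind in pop]
    return [list(ind) for ind in rng.choices(pop, weights=fit, k=len(pop))]


def genetic_kmeans(data, k, pop_size=10, generations=30, pm=0.05, cm=1.5, c=2.0, seed=0):
    """Return (labels, TWCV) of the best partition seen.

    Parameter defaults are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    pop = [[rng.randrange(k) for _ in data] for _ in range(pop_size)]
    best, best_score = list(pop[0]), math.inf
    for gen in range(generations + 1):
        scores = [twcv(data, ind, k) for ind in pop]
        for ind, s in zip(pop, scores):
            if s < best_score:
                best, best_score = list(ind), s
        if gen == generations:
            break
        pop = select(pop, scores, c, rng)
        pop = [kmeans_operator(data, mutate(data, ind, k, pm, cm, rng), k) for ind in pop]
    return best, best_score


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D points.
    pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 5.2), (5.2, 4.9)]
    labels, score = genetic_kmeans(pts, k=2)
    print(labels, round(score, 4))
```

Using the K-means operator in place of crossover lets each generation move quickly toward a locally optimal partition, while mutation keeps exploring other partitions. In a churn setting, the resulting small clusters would then serve as the input to the over-sampling step, which is not shown here.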
References
Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(2), 121–167 (1998)
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P.: SMOTE: Synthetic Minority Over-sampling Technique. Journal of Artificial Intelligence Research 16, 321–357 (2002)
Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression. Wiley, New York (1989)
Kohonen, T.: Self-Organizing Maps. Series in Information Sciences, vol. 30. Springer, Heidelberg (2001)
Quinlan, J.R.: Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research 4, 77–90 (1996)
Zhang, J., Yang, Y., Lades, M.: Face Recognition: Eigenface, Elastic Matching, and Neural Nets. Proceedings of the IEEE, pp. 1423–1435 (1997)
Luo, B., Peiji, S., Juan, L.: Customer Churn Prediction Based on the Decision Tree in Personal Handyphone System Service. In: International Conference on Service Systems and Service Management, pp. 1–5 (2007)
Au, W., Chan, C.C., Yao, X.: A novel evolutionary data mining algorithm with applications to churn prediction. IEEE Transactions on Evolutionary Computation 7, 532–545 (2003)
Bradley, A.P.: The Use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30, 1145–1159 (1997)
Bäck, T.: Evolutionary Algorithms in Theory and Practice, ch. 2. Oxford University Press, Oxford (1996)
Chawla, N.V., Japkowicz, N., Kotcz, A.: Editorial: special issue on learning from imbalanced data sets. SIGKDD Explor. Newsl. 6, 1–6 (2004)
Domingos, P.: MetaCost: A general method for making classifiers cost sensitive. In: The 5th International Conference on Knowledge Discovery and Data Mining, pp. 155–164 (1999)
Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Kluwer Academic Publishers, Dordrecht (1989)
Jolliffe, I.T.: Principal Component Analysis. Springer, New York (1986)
Huang, B.Q., Kechadi, M.T., Buckley, B.: Customer Churn Prediction for Broadband Internet Services. In: Pedersen, T.B., Mohania, M.K., Tjoa, A.M. (eds.) Data Warehousing and Knowledge Discovery. LNCS, vol. 5691, pp. 229–243. Springer, Heidelberg (2009)
Wei, C., Chiu, I.: Turning telecommunications call details to churn prediction: a data mining approach. Expert Systems with Applications 23, 103–112 (2002)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Huang, B., Satoh, T., Huang, Y., Kechadi, M.T., Buckley, B. (2010). Using Genetic K-Means Algorithm for PCA Regression Data in Customer Churn Prediction. In: Cao, L., Zhong, J., Feng, Y. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17313-4_22
DOI: https://doi.org/10.1007/978-3-642-17313-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17312-7
Online ISBN: 978-3-642-17313-4
eBook Packages: Computer Science, Computer Science (R0)