Using Genetic K-Means Algorithm for PCA Regression Data in Customer Churn Prediction

  • Conference paper
Advanced Data Mining and Applications (ADMA 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6441)

Abstract

An imbalanced distribution of samples between churners and non-churners can strongly affect churn prediction results in the telecommunication services field. One way to address this is over-sampling by PCA regression. However, PCA regression may not generate good churn samples if the classes in the dataset can only be separated by a nonlinear discriminant. To overcome this problem, we employed the Genetic K-means Algorithm to cluster the dataset and find locally optimal small datasets. The experiments were carried out on a real-world telecommunication dataset and assessed on a churn prediction task. They showed that the Genetic K-means Algorithm can improve the prediction results of PCA regression and performs as well as SMOTE.
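The abstract does not spell out the algorithms, so the following is only a minimal sketch under stated assumptions of a "cluster first, then over-sample churners per cluster" pipeline. The genetic K-means part is loosely modeled on the common design (a population of cluster-label strings, selection, distance-biased mutation, and a single K-means reassignment step in place of crossover); binary tournament selection and the mutation weight `c_m` are assumptions. The paper's exact PCA regression over-sampler is not described here, so `pca_oversample` is a hypothetical stand-in that draws Gaussian scores on the leading principal components and projects them back to feature space. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical helper names throughout; none of these come from the paper.


def sq_dists(X, C):
    """Squared Euclidean distance from every sample (rows of X) to every centroid."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def centroids(X, labels, k, rng):
    """Cluster means; an empty cluster is re-seeded with a random sample."""
    C = np.empty((k, X.shape[1]))
    for j in range(k):
        members = X[labels == j]
        C[j] = members.mean(axis=0) if len(members) else X[rng.integers(len(X))]
    return C


def twcv(X, labels, k, rng):
    """Total within-cluster variation: the quantity the GA minimizes."""
    C = centroids(X, labels, k, rng)
    return float(((X - C[labels]) ** 2).sum())


def genetic_kmeans(X, k, pop_size=10, generations=30, p_mut=0.05, c_m=1.5, seed=0):
    """GA over label strings: selection, distance-biased mutation, one K-means step."""
    rng = np.random.default_rng(seed)
    n = len(X)
    pop = rng.integers(k, size=(pop_size, n))  # row = one cluster label per sample
    costs = np.array([twcv(X, ind, k, rng) for ind in pop])
    best, best_cost = pop[costs.argmin()].copy(), costs.min()
    for _ in range(generations):
        # Binary tournament selection (assumed; a roulette wheel is also common).
        a, b = rng.integers(pop_size, size=(2, pop_size))
        pop = np.where((costs[a] <= costs[b])[:, None], pop[a], pop[b])
        for ind in pop:  # rows are views, so the edits below update `pop`
            # Mutation: a sample moves to cluster j with weight c_m*d_max - d_j,
            # so nearer centroids are likelier targets.
            d = sq_dists(X, centroids(X, ind, k, rng))
            for i in np.flatnonzero(rng.random(n) < p_mut):
                w = c_m * d[i].max() - d[i] + 1e-12
                ind[i] = rng.choice(k, p=w / w.sum())
            # K-means operator (used instead of crossover): reassign to nearest centroid.
            ind[:] = sq_dists(X, centroids(X, ind, k, rng)).argmin(axis=1)
        costs = np.array([twcv(X, ind, k, rng) for ind in pop])
        if costs.min() < best_cost:  # keep the best partition seen so far
            best, best_cost = pop[costs.argmin()].copy(), costs.min()
    return best, float(best_cost)


def pca_oversample(Xm, n_new, rng, n_components=2):
    """Stand-in PCA over-sampler (assumption, not the paper's PCA regression):
    draw Gaussian scores on the leading principal components and project back."""
    mu = Xm.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xm - mu, full_matrices=False)
    V = Vt[:n_components]  # at most min(n_samples, n_features) rows survive
    scores = (Xm - mu) @ V.T
    new_scores = rng.normal(0.0, scores.std(axis=0), size=(n_new, len(V)))
    return mu + new_scores @ V


def cluster_then_oversample(X, y, k=3, seed=0):
    """Cluster with the GA sketch, then over-sample churners (y == 1) cluster by cluster."""
    labels, _ = genetic_kmeans(X, k, seed=seed)
    rng = np.random.default_rng(seed)
    n_min = int((y == 1).sum())
    deficit = int((y == 0).sum()) - n_min  # synthetic churners needed for balance
    parts = [np.empty((0, X.shape[1]))]
    if n_min == 0 or deficit <= 0:
        return parts[0]
    for j in range(k):
        Xm = X[(labels == j) & (y == 1)]
        n_new = deficit * len(Xm) // n_min  # share proportional to cluster size
        if len(Xm) >= 2 and n_new > 0:
            parts.append(pca_oversample(Xm, n_new, rng))
    return np.vstack(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (100, 3)),     # toy non-churners
                   rng.normal(4, 0.5, (10, 3)),    # toy churner group A
                   rng.normal(-4, 0.5, (10, 3))])  # toy churner group B
    y = np.r_[np.zeros(100, dtype=int), np.ones(20, dtype=int)]
    X_syn = cluster_then_oversample(X, y, k=3)
    print(X_syn.shape)  # at most (80, 3)
```

The K-means operator stands in for crossover because label strings are only meaningful up to a permutation of cluster ids, so splicing two parents rarely yields a sensible partition. Fitting the over-sampler separately inside each cluster is what lets a linear method like PCA act locally on data whose classes are not linearly separable overall; the toy data above is illustrative only and is not the paper's telecommunication dataset.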

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Huang, B., Satoh, T., Huang, Y., Kechadi, M.T., Buckley, B. (2010). Using Genetic K-Means Algorithm for PCA Regression Data in Customer Churn Prediction. In: Cao, L., Zhong, J., Feng, Y. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6441. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17313-4_22

  • DOI: https://doi.org/10.1007/978-3-642-17313-4_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17312-7

  • Online ISBN: 978-3-642-17313-4

  • eBook Packages: Computer Science, Computer Science (R0)
