Abstract
In a world where data is captured on a large scale, the major challenge for data mining algorithms is scaling up to large datasets. There are two main approaches to inducing classification rules: the divide and conquer approach, also known as top down induction of decision trees, and the separate and conquer approach. A considerable amount of work has been done on scaling up the divide and conquer approach, but very little on scaling up the separate and conquer approach. In this work we describe a parallel framework that allows the parallelisation of a certain family of separate and conquer algorithms, the Prism family. Parallelisation lets the Prism family of algorithms harness the additional computing resources of a network of computers, so that the induction of classification rules scales better on large datasets. Our framework also incorporates a pre-pruning facility for parallel Prism algorithms.
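To make the separate and conquer idea concrete, the following is a minimal serial sketch in the spirit of the basic Prism algorithm (Cendrowska, 1987). It is not the parallel framework described in the abstract and includes no pre-pruning. The function name `induce_rules_for_class`, the dict-based row format, the coverage tie-break, and the simplified handling of clashes are all assumptions made for illustration.

```python
from collections import Counter


def induce_rules_for_class(rows, target, cls):
    """Separate-and-conquer sketch for one class (hypothetical helper).

    rows:   list of dicts mapping attribute name -> categorical value,
            with the class label stored under the key `target`.
    Returns a list of rules, each a dict {attribute: value}.
    """
    remaining = list(rows)
    rules = []
    # Keep inducing rules while instances of `cls` are still uncovered.
    while any(r[target] == cls for r in remaining):
        covered = remaining
        rule = {}
        # Conquer: add terms until the rule covers only instances of `cls`.
        while any(r[target] != cls for r in covered):
            best = None  # (precision, coverage, attribute, value)
            for attr in covered[0]:
                if attr == target or attr in rule:
                    continue
                totals = Counter(r[attr] for r in covered)
                hits = Counter(r[attr] for r in covered if r[target] == cls)
                for value, n_hit in hits.items():
                    # Precision estimates p(cls | attr = value) on `covered`.
                    cand = (n_hit / totals[value], n_hit, attr, value)
                    if best is None or cand[:2] > best[:2]:
                        best = cand
            if best is None:
                # Assumption: no attributes left (a clash), so the impure
                # rule is accepted as-is in this simplified sketch.
                break
            _, _, attr, value = best
            rule[attr] = value
            covered = [r for r in covered if r[attr] == value]
        rules.append(rule)
        # Separate: remove every instance the new rule covers.
        remaining = [
            r for r in remaining
            if not all(r[a] == v for a, v in rule.items())
        ]
    return rules


# Toy data, invented for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
rules_yes = induce_rules_for_class(rows, "play", "yes")
```

Calling the function once per class on the full training set yields one modular rule set per class. Unlike a decision tree, the rules need not share a common root attribute.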
© 2010 Springer-Verlag London
About this paper
Cite this paper
Stahl, F., Bramer, M., Adda, M. (2010). Parallel Rule Induction with Information Theoretic Pre-Pruning. In: Bramer, M., Ellis, R., Petridis, M. (eds) Research and Development in Intelligent Systems XXVI. Springer, London. https://doi.org/10.1007/978-1-84882-983-1_11
DOI: https://doi.org/10.1007/978-1-84882-983-1_11
Publisher Name: Springer, London
Print ISBN: 978-1-84882-982-4
Online ISBN: 978-1-84882-983-1
eBook Packages: Computer Science, Computer Science (R0)