Abstract
Feature subset selection is an important step when training classifiers for Machine Learning (ML) problems. Too many input features in an ML problem may lead to the so-called "curse of dimensionality", whereby the complexity of adjusting the classifier parameters during training grows exponentially with the number of features. ML algorithms are therefore known to suffer a significant drop in prediction accuracy when faced with many unnecessary features. In this paper, we introduce a novel embedded feature selection method, called ESFS, inspired by the wrapper method SFS in that it relies on the simple principle of incrementally adding the most relevant features. Its originality lies in the use of mass functions from evidence theory, which allow the information carried by the features to be merged elegantly, in an embedded way, leading to a lower computational cost than the original SFS. This approach has been applied successfully to image categorization and has shown its effectiveness in comparison with other feature selection methods.
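To make the abstract's idea concrete, below is a minimal Python sketch of a forward-selection loop of the kind described: each candidate feature's evidence about the class is encoded as a mass function, merged with the already-selected features via Dempster's rule, and the feature that most improves the criterion is added. Everything here is an illustrative assumption rather than the authors' exact ESFS formulation: the function names (`feature_masses`, `dempster_combine`, `esfs_like_selection`), the histogram-based mass estimation with singleton focal elements, and the use of training accuracy as the selection criterion are all placeholders.

```python
"""Hedged sketch of SFS-style forward selection with an evidence-theoretic
criterion. NOT the paper's exact ESFS; mass estimation and the selection
criterion are simplified stand-ins."""
import numpy as np

def feature_masses(x, y, classes, bins=10):
    # Assumption: derive per-sample mass vectors for one feature column by
    # histogramming its values per class and normalizing each bin. Only
    # singleton hypotheses (individual classes) receive mass here.
    edges = np.histogram_bin_edges(x, bins=bins)
    idx = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)
    cls_index = {c: i for i, c in enumerate(classes)}
    hist = np.zeros((bins, len(classes)))
    for b, c in zip(idx, y):
        hist[b, cls_index[c]] += 1
    hist += 1e-6                               # smoothing: avoid zero masses
    hist /= hist.sum(axis=1, keepdims=True)    # normalize masses per bin
    return hist[idx]                           # shape (n_samples, n_classes)

def dempster_combine(m1, m2):
    # Dempster's rule restricted to singleton focal elements: elementwise
    # product of masses, then renormalization (conflict mass discarded).
    m = m1 * m2
    return m / m.sum(axis=1, keepdims=True)

def esfs_like_selection(X, y, n_select):
    # Forward selection: start empty, repeatedly add the feature whose
    # combined evidence yields the best criterion (training accuracy here).
    classes = np.unique(y)
    masses = [feature_masses(X[:, j], y, classes) for j in range(X.shape[1])]
    selected, combined = [], None
    while len(selected) < min(n_select, X.shape[1]):
        best_j, best_acc, best_m = None, -1.0, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            m = masses[j] if combined is None else dempster_combine(combined, masses[j])
            acc = np.mean(classes[m.argmax(axis=1)] == y)
            if acc > best_acc:
                best_j, best_acc, best_m = j, acc, m
        selected.append(best_j)
        combined = best_m
    return selected
```

In practice the training-accuracy criterion above would be replaced by whatever embedded evaluation the method prescribes, and the masses could be held out or cross-validated; the sketch only shows how combining mass functions lets the selection criterion be updated incrementally instead of retraining a wrapper classifier for every candidate subset.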
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Fu, H., Xiao, Z., Dellandréa, E., Dou, W., Chen, L. (2009). Image Categorization Using ESFS: A New Embedded Feature Selection Method Based on SFS. In: Blanc-Talon, J., Philips, W., Popescu, D., Scheunders, P. (eds) Advanced Concepts for Intelligent Vision Systems. ACIVS 2009. Lecture Notes in Computer Science, vol 5807. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04697-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04696-4
Online ISBN: 978-3-642-04697-1