Abstract
Feature selection is widely recognized as a challenging task in application problems that involve a large number of features and a limited number of training samples. Filters and wrappers are the most popular feature selection strategies, but recent literature shows the emergence of hybrid approaches that aim to combine the strengths of both while avoiding their drawbacks. This paper proposes a new hybrid model for feature selection that uses a filter method to weight the relevance of each feature. Top-ranked features are selected incrementally, yielding a set of nested feature spaces of relatively small size. An evolutionary wrapper then refines each space by extracting small subsets of highly predictive features. Extensive experiments on a benchmark microarray dataset demonstrate the effectiveness of the proposed approach.
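The three stages named in the abstract (filter ranking, nested feature spaces, evolutionary refinement) can be illustrated with a minimal sketch. The abstract does not specify the filter, the classifier, or the evolutionary settings, so everything concrete here is an assumption made for illustration: the signal-to-noise filter score, the leave-one-out 1-nearest-neighbour fitness, the genetic-algorithm parameters, the synthetic data, and all function names are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean, pstdev

def filter_scores(X, y):
    """Filter stage: score each feature by a two-class signal-to-noise ratio (assumed filter)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9))
    return scores

def nested_spaces(scores, sizes):
    """Take the top-k ranked features for increasing k, giving nested feature spaces."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return [ranked[:k] for k in sizes]

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (assumed wrapper classifier)."""
    if not features:
        return 0.0
    hits = 0
    for i in range(len(X)):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in features))
        hits += y[nearest] == y[i]
    return hits / len(X)

def evolve_subset(X, y, space, pop_size=20, generations=15, rng=None):
    """Wrapper stage: a simple genetic algorithm over bit masks of one feature space."""
    rng = rng or random.Random(0)

    def fitness(mask):
        chosen = [f for f, bit in zip(space, mask) if bit]
        # A small per-feature penalty favours compact subsets (assumed trade-off).
        return loo_accuracy(X, y, chosen) - 0.01 * len(chosen)

    pop = [[rng.random() < 0.3 for _ in space] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, len(space)) if len(space) > 1 else 0
            child = p[:cut] + q[cut:]           # one-point crossover
            child = [(not bit) if rng.random() < 0.05 else bit for bit in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [f for f, bit in zip(space, best) if bit]

# Synthetic two-class data: 20 samples, 30 features, only features 0 and 1 informative.
rng = random.Random(42)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(3.0 * label if j < 2 else 0.0, 1.0) for j in range(30)] for label in y]

scores = filter_scores(X, y)
spaces = nested_spaces(scores, sizes=[5, 10, 15])
results = [evolve_subset(X, y, space, rng=rng) for space in spaces]
for space, subset in zip(spaces, results):
    print(len(space), "candidate features ->", sorted(subset))
```

Each nested space is refined independently, so the subsets found in the smaller and larger spaces can be compared afterwards; how the paper itself compares or combines them is not stated in the abstract.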
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cannas, L.M., Dessì, N., Pes, B. (2011). A Hybrid Model to Favor the Selection of High Quality Features in High Dimensional Domains. In: Yin, H., Wang, W., Rayward-Smith, V. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2011. IDEAL 2011. Lecture Notes in Computer Science, vol 6936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23878-9_28
DOI: https://doi.org/10.1007/978-3-642-23878-9_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23877-2
Online ISBN: 978-3-642-23878-9
eBook Packages: Computer Science, Computer Science (R0)