Abstract
This paper presents a data-driven approach to feature selection that addresses the common problem of high-dimensional data. Unlike many existing approaches, it can handle the real-valued nature of the domain features, which it does through the use of fuzzy-rough approximations. The paper demonstrates the effectiveness of the approach by proposing an estimator of algae populations: a system that approximates the size of algae populations from given water characteristics. This estimator significantly reduces computer time and space requirements, decreases the cost of obtaining measurements and increases runtime efficiency, making it more economically viable. By retaining only the information required for the estimation task, the system offers higher accuracy than conventional estimators. Finally, the system does not alter the domain semantics, so any distilled knowledge remains human-readable. The paper describes the problem domain and the architecture and operation of the system, and presents and discusses detailed experimentation. The results show that algae estimators using a fuzzy-rough feature selection step generally produce more accurate predictions of algae populations.
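To make the idea of selecting real-valued features with fuzzy-rough approximations more concrete, the following is a minimal sketch and not the paper's formulation. It assumes a simple range-normalised fuzzy similarity relation, the Łukasiewicz implicator for the lower approximation, and a greedy forward search that stops when the dependency degree no longer improves. The function names (`similarity`, `dependency`, `select_features`) and the toy data are hypothetical and chosen for illustration only.

```python
def similarity(a, b, value_range):
    """Fuzzy similarity of two real values, in [0, 1]; 1 means identical."""
    if value_range == 0:
        return 1.0  # a constant feature cannot tell objects apart
    return max(0.0, 1.0 - abs(a - b) / value_range)


def dependency(rows, decision, subset):
    """Fuzzy-rough dependency of the decision on a feature subset (assumed form)."""
    ranges = {f: max(r[f] for r in rows) - min(r[f] for r in rows) for f in subset}
    d_range = max(decision) - min(decision)
    total = 0.0
    for i, x in enumerate(rows):
        lower = 1.0  # membership of x in the fuzzy positive region
        for j, y in enumerate(rows):
            # Similarity under the subset: the weakest per-feature similarity.
            r_p = min((similarity(x[f], y[f], ranges[f]) for f in subset), default=1.0)
            r_d = similarity(decision[i], decision[j], d_range)
            # Lukasiewicz implicator: I(a, b) = min(1, 1 - a + b).
            lower = min(lower, min(1.0, 1.0 - r_p + r_d))
        total += lower
    return total / len(rows)


def select_features(rows, decision, eps=1e-12):
    """Greedy forward selection: add the best feature until nothing improves."""
    remaining = list(range(len(rows[0])))
    selected = []
    best = dependency(rows, decision, selected)
    while remaining:
        scores = [(dependency(rows, decision, selected + [f]), f) for f in remaining]
        top_score, top_feature = max(scores, key=lambda s: s[0])
        if top_score <= best + eps:
            break  # no candidate raises the dependency degree
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


# Toy data (invented): feature 0 is informative, feature 1 is constant,
# feature 2 is an exact copy of feature 0 and therefore redundant.
ROWS = [
    [0.0, 5.0, 0.0],
    [0.1, 5.0, 0.1],
    [0.9, 5.0, 0.9],
    [1.0, 5.0, 1.0],
]
DECISION = [0.0, 0.0, 1.0, 1.0]

if __name__ == "__main__":
    print(select_features(ROWS, DECISION))
```

On this toy data the search keeps only feature 0. The constant feature adds no dependency, and the duplicate adds nothing once feature 0 is selected. Because the selected features are original measurements rather than transformed combinations, their meaning is preserved, which is the property the abstract refers to as not altering the domain semantics.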
Cite this article
Shen, Q., Jensen, R. Approximation-based feature selection and application for algae population estimation. Appl Intell 28, 167–181 (2008). https://doi.org/10.1007/s10489-007-0058-y
DOI: https://doi.org/10.1007/s10489-007-0058-y