Approximation-based feature selection and application for algae population estimation

Applied Intelligence

Abstract

This paper presents a data-driven approach to feature selection that addresses the common problem of high-dimensional data. Unlike many existing approaches, it handles the real-valued nature of the domain features directly, through the use of fuzzy-rough approximations. The paper demonstrates the effectiveness of the approach by proposing an estimator of algae populations: a system that approximates the size of algae populations from given water characteristics. The estimator significantly reduces computer time and space requirements, decreases the cost of obtaining measurements and increases runtime efficiency, making it more economically viable. By retaining only the information required for the estimation task, the system offers higher accuracy than conventional estimators. Finally, the system does not alter the domain semantics, so any distilled knowledge remains human-readable. The paper describes the problem domain and the architecture and operation of the system, and presents and discusses detailed experimentation. The results show that, in general, algae estimators using a fuzzy-rough feature selection step produce more accurate predictions of algae populations.
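To make the idea of selecting real-valued features through fuzzy-rough approximations more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified variant in which each feature induces a fuzzy similarity relation (1 minus the range-normalised distance), the decision is a crisp class label, and the fuzzy lower approximation uses the Kleene-Dienes implicator. Features are added greedily while the fuzzy-rough dependency degree keeps increasing. All function names and the toy data are hypothetical and chosen only for illustration.

```python
def similarity(column):
    """Pairwise fuzzy similarity for one real-valued feature:
    1 - |a(x) - a(y)| / range(a). Assumed relation, one of several possible."""
    span = (max(column) - min(column)) or 1.0
    return [[1.0 - abs(u - v) / span for v in column] for u in column]


def dependency(sims, labels, subset):
    """Fuzzy-rough dependency degree of crisp labels on a feature subset.

    For each object, the lower-approximation membership of its own class is
    the minimum of (1 - similarity) over all objects of a different class;
    the dependency degree is the mean of these memberships.
    """
    if not subset:
        return 0.0
    n = len(labels)
    total = 0.0
    for i in range(n):
        lower = 1.0
        for j in range(n):
            if labels[i] != labels[j]:
                # Similarity under a subset is the minimum over its features.
                r = min(sims[f][i][j] for f in subset)
                lower = min(lower, 1.0 - r)
        total += lower
    return total / n


def greedy_select(rows, labels):
    """Greedy forward selection: add the feature that raises the
    dependency degree most, and stop when no feature improves it."""
    n_features = len(rows[0])
    sims = [similarity([row[f] for row in rows]) for f in range(n_features)]
    selected, best = [], 0.0
    while len(selected) < n_features:
        candidates = [f for f in range(n_features) if f not in selected]
        scored = [(dependency(sims, labels, selected + [f]), f) for f in candidates]
        gamma, feature = max(scored, key=lambda s: s[0])
        if gamma <= best:
            break
        selected.append(feature)
        best = gamma
    return selected, best


# Toy data (invented for illustration): feature 0 separates the two classes,
# feature 1 does not.
toy_rows = [[0.0, 0.5], [0.0, 0.9], [1.0, 0.4], [1.0, 0.8]]
toy_labels = ["low", "low", "high", "high"]

selected, gamma = greedy_select(toy_rows, toy_labels)
print(selected, gamma)
```

On this toy data the sketch keeps only feature 0, because adding feature 1 does not raise the dependency degree any further. The selected features are a subset of the original ones rather than a transformation of them, which is why the meaning of each retained measurement is preserved.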


Corresponding author

Correspondence to Qiang Shen.

Cite this article

Shen, Q., Jensen, R. Approximation-based feature selection and application for algae population estimation. Appl Intell 28, 167–181 (2008). https://doi.org/10.1007/s10489-007-0058-y

  • DOI: https://doi.org/10.1007/s10489-007-0058-y