gMLC: a multi-label feature selection framework for graph classification

  • Regular Paper
Knowledge and Information Systems

Abstract

Graph classification is critically important in a wide variety of applications, e.g. drug activity prediction and toxicology analysis. Current research on graph classification focuses on single-label settings. However, in many applications, each graph can be assigned multiple labels simultaneously. Extracting good features using the multiple labels of the graphs therefore becomes an important step before graph classification. In this paper, we study the problem of multi-label feature selection for graph classification and propose a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. Unlike existing feature selection methods in vector spaces, which assume the feature set is given, we perform multi-label feature selection for graph data progressively, together with the subgraph feature mining process. We derive an evaluation criterion to estimate the dependence between subgraph features and the multiple labels of graphs. A branch-and-bound algorithm is then proposed to efficiently search for optimal subgraph features by judiciously pruning the subgraph search space using multiple labels. Empirical studies demonstrate that our feature selection approach effectively boosts multi-label graph classification performance and is more efficient because it prunes the subgraph search space using multiple labels.
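
The abstract does not spell out the evaluation criterion, so the following is only a rough sketch of what scoring subgraph features against multiple labels might look like. It assumes a Hilbert-Schmidt-style dependence score with a linear label kernel, which is an assumption rather than something stated above. The names `dependence_score`, `select_top_k`, and the toy candidates are hypothetical. The sketch scores a small, already enumerated candidate set and does not implement subgraph mining or the branch-and-bound pruning.

```python
def dependence_score(feature, labels):
    """Score one binary subgraph indicator against a multi-label matrix.

    Assumed criterion (not taken from the abstract): f^T H (Y Y^T) H f,
    where H centers the indicator vector. This equals the sum, over labels,
    of the squared centered co-occurrence between feature and label.
    """
    n = len(feature)
    mean_f = sum(feature) / n
    centered = [f - mean_f for f in feature]  # H f
    num_labels = len(labels[0])
    score = 0.0
    for q in range(num_labels):
        cov = sum(centered[i] * labels[i][q] for i in range(n))
        score += cov * cov
    return score


def select_top_k(candidates, labels, k):
    """Return the names of the k highest-scoring candidate features."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: dependence_score(item[1], labels),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Toy data: 4 graphs, 2 labels each (rows are graphs, columns are labels).
labels = [[1, 0], [1, 1], [0, 1], [0, 0]]

# Hypothetical candidates: 1 means the subgraph occurs in that graph.
candidates = {
    "subgraph_a": [1, 1, 0, 0],  # co-occurs with label 0
    "subgraph_b": [1, 1, 1, 1],  # occurs everywhere, so it is uninformative
    "subgraph_c": [0, 1, 1, 0],  # co-occurs with label 1
}

selected = select_top_k(candidates, labels, k=2)
```

In this toy setting a subgraph that occurs in every graph scores zero, since centering removes it entirely, while subgraphs that track at least one label score higher. A progressive method such as the one described above would evaluate candidates during mining rather than over a fixed set.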



Author information

Corresponding author

Correspondence to Xiangnan Kong.

About this article

Cite this article

Kong, X., Yu, P.S. gMLC: a multi-label feature selection framework for graph classification. Knowl Inf Syst 31, 281–305 (2012). https://doi.org/10.1007/s10115-011-0407-3

  • DOI: https://doi.org/10.1007/s10115-011-0407-3
