gMLC: a multi-label feature selection framework for graph classification

Kong, Xiangnan; Yu, Philip S.

doi:10.1007/s10115-011-0407-3

Regular Paper
Published: 08 May 2011

Knowledge and Information Systems, Volume 31, pages 281–305 (2012)

Xiangnan Kong¹ &
Philip S. Yu¹

Abstract

Graph classification is critically important in a wide variety of applications, e.g. drug activity prediction and toxicology analysis. Current research on graph classification focuses on single-label settings. However, in many applications, each graph can be assigned a set of multiple labels simultaneously. Extracting good features using the multiple labels of the graphs therefore becomes an important step before graph classification. In this paper, we study the problem of multi-label feature selection for graph classification and propose a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. Unlike existing feature selection methods in vector spaces, which assume the feature set is given, we perform multi-label feature selection for graph data progressively, together with the subgraph feature mining process. We derive an evaluation criterion to estimate the dependence between subgraph features and the multiple labels of graphs. We then propose a branch-and-bound algorithm that efficiently searches for optimal subgraph features by pruning the subgraph search space using the multiple labels. Empirical studies demonstrate that our feature selection approach effectively boosts multi-label graph classification performance and is more efficient because it prunes the subgraph search space using multiple labels.
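The abstract names two ingredients, a dependence criterion and a branch-and-bound search, without giving formulas. As a rough illustration only, the sketch below assumes an HSIC-style score (Hilbert-Schmidt Independence Criterion) with a linear label kernel, q(f) = fᵀ H L H f, where f is the binary occurrence vector of one subgraph over the graphs, L = Y Yᵀ, and H is the centering matrix. It also shows one valid upper bound on the score of any supergraph, obtained by clipping the negative entries of M = H L H. The function names, the kernel choice, and the toy data are assumptions made for illustration and are not taken from the abstract.

```python
import numpy as np


def dependence_matrix(Y):
    """Return M = H L H for a binary label matrix Y (n graphs x q labels).

    Assumption: a linear label kernel L = Y Y^T; other kernels are possible.
    """
    n = Y.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    L = Y @ Y.T                          # label kernel matrix
    return H @ L @ H


def dependence_score(f, M):
    """HSIC-style score f^T M f for one subgraph's occurrence vector f."""
    return float(f @ M @ f)


def upper_bound(f, M):
    """Bound on the score of any pattern whose occurrences are a subset of f's.

    A supergraph can only occur in graphs where the subgraph occurs, so its
    score sums a subset of the M[i, j] terms selected by f. Dropping the
    negative terms can therefore never underestimate that sum.
    """
    return float(f @ np.maximum(M, 0.0) @ f)


def can_prune(f, M, threshold):
    """True if no supergraph of this pattern can reach the threshold."""
    return upper_bound(f, M) < threshold


# Toy data (hypothetical): 6 graphs, 3 labels.
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [1, 1, 0],
              [0, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
M = dependence_matrix(Y)

# Occurrence vectors of two hypothetical subgraph features.
f_aligned = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # tracks label 0
f_mixed = np.array([1, 0, 0, 1, 0, 1], dtype=float)    # tracks no label

for name, f in [("aligned", f_aligned), ("mixed", f_mixed)]:
    print(name, dependence_score(f, M), upper_bound(f, M))
```

In a branch-and-bound miner built along these lines, a pattern for which `can_prune` holds against the score of the current k-th best feature could be skipped together with all of its supergraphs.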

Author information

Authors and Affiliations

Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA
Xiangnan Kong & Philip S. Yu

Corresponding author

Correspondence to Xiangnan Kong.

About this article

Cite this article

Kong, X., Yu, P.S. gMLC: a multi-label feature selection framework for graph classification. Knowl Inf Syst 31, 281–305 (2012). https://doi.org/10.1007/s10115-011-0407-3

Received: 06 January 2011
Revised: 04 April 2011
Accepted: 26 April 2011
Published: 08 May 2011
Issue Date: May 2012
DOI: https://doi.org/10.1007/s10115-011-0407-3
