Overlapping community detection in labeled graphs

Galbrun, Esther; Gionis, Aristides; Tatti, Nikolaj

doi:10.1007/s10618-014-0373-y

Published: 02 August 2014

Volume 28, pages 1586–1610, (2014)

Data Mining and Knowledge Discovery

Esther Galbrun¹,
Aristides Gionis² &
Nikolaj Tatti²

Abstract

We present a new approach to the problem of finding overlapping communities in graphs and social networks. Our approach consists of a novel problem definition and three accompanying algorithms. We are particularly interested in graphs that have labels on their vertices, although our methods are also applicable to graphs with no labels. Our goal is to find k communities so that the total edge density over all k communities is maximized. In the case of labeled graphs, we require that each community is succinctly described by a set of labels. This requirement provides a better understanding of the discovered communities. The proposed problem formulation leads to the discovery of vertex-overlapping and dense communities that cover as many graph edges as possible. We capture these properties with a simple objective function, which we solve by adapting efficient approximation algorithms for the generalized maximum-coverage problem and the densest-subgraph problem. Our proposed algorithm is a generic greedy scheme. We experiment with three variants of the scheme, obtained by varying the greedy step of finding a dense subgraph. We validate our algorithms by comparing them with other state-of-the-art community-detection methods on a variety of performance measures. Our experiments confirm that our algorithms achieve results of high quality in terms of the reported measures, and are practical in terms of performance.
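The abstract mentions a greedy step that finds a dense subgraph but does not spell it out. As an illustration only, the following is a minimal sketch of the classic greedy peeling heuristic for the densest-subgraph problem (Charikar 2000, listed in the references), which is known to give a 1/2-approximation when density is measured as edges per vertex. It is not claimed to be any of the three variants used in the article; the function name `greedy_peel` and the density convention $|E(S)|/|S|$ are assumptions made for this example.

```python
from collections import defaultdict


def greedy_peel(edges):
    """Return (vertex set, density) of the densest subgraph seen while peeling.

    Density is taken to be |E(S)| / |S| (an assumed convention).
    Self-loops are ignored and parallel edges are merged.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    alive = set(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    best_set = set(alive)
    best_density = n_edges / len(alive) if alive else 0.0

    while len(alive) > 1:
        # Peel a vertex of minimum degree in the current subgraph.
        # A linear scan keeps the sketch short; a bucket queue would be faster.
        v = min(alive, key=lambda x: len(adj[x]))
        n_edges -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
        alive.remove(v)

        density = n_edges / len(alive)
        if density > best_density:
            best_set, best_density = set(alive), density

    return best_set, best_density


if __name__ == "__main__":
    # A 4-clique on {0, 1, 2, 3} with a pendant path 3-4-5 attached.
    demo = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
    print(greedy_peel(demo))
```

On the demo graph the peeling removes the two path vertices first, after which the remaining 4-clique has the highest density seen.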

Notes

Cohen and Katzir express their approximation factor as $(\frac{2e-1}{e-1}+\epsilon )$, for every $\epsilon >0$, but we follow the convention that maximization problems have approximation factors less than 1.
http://dblp.uni-trier.de/xml/
Namely, S. Abiteboul, E. Demaine, M. Ester, C. Faloutsos, J. Han, G. Karypis, J. Kleinberg, H. Mannila, K. Mehlhorn, C. Papadimitriou, B. Shneiderman, G. Weikum and P. Yu.
http://snap.stanford.edu
http://www.lastfm.com
http://grouplens.org/datasets/hetrec-2011/
http://www.cs.helsinki.fi/u/galbrun/misc/lic/

References

Ahn YY, Bagrow JP, Lehmann S (2010) Link communities reveal multiscale complexity in networks. Nature 466:761–764
Asahiro Y, Iwama K, Tamaki H, Tokuyama T (2000) Greedily finding a dense subgraph. J Algorithms 34(2):203–221
Atkins JE, Boman EG, Hendrickson B (1998) A spectral algorithm for seriation and the consecutive ones problem. SIAM J Comput 28:297–310
Balasubramanyan R, Cohen WW (2011) Block-LDA: Jointly modeling entity-annotated text and entity-entity links. In: SIAM international conference on data mining (SDM’11), SIAM/Omnipress, pp 450–461
Charikar M (2000) Greedy approximation algorithms for finding dense components in a graph. In: International workshop on approximation, randomization, and combinatorial optimization (APPROX’00) pp 84–95
Chen W, Liu Z, Sun X, Wang Y (2010) A game-theoretic framework to identify overlapping communities in social networks. Data Min Knowl Discov 21(2):224–240
Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70:066111
Cohen R, Katzir L (2008) The generalized maximum coverage problem. Inf Process Lett 108:15–22
Coscia M, Rossetti G, Giannotti F, Pedreschi D (2012) DEMON: a local-first discovery method for overlapping communities. In: Yang Q, Agarwal D, Pei J (eds) ACM SIGKDD international conference on knowledge discovery and data mining (KDD’12), pp 615–623
Flake GW, Lawrence S, Giles CL (2000) Efficient identification of web communities. In: Ramakrishnan R, Stolfo SJ, Bayardo RJ, Parsa I (eds) ACM SIGKDD international conference on knowledge discovery and data mining (KDD’00), ACM, pp 150–160
Fortunato S (2010) Community detection in graphs. Physics Reports 486
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Nat Acad Sci USA 99:7821–7826
Gregory S (2007) An algorithm to find overlapping community structure in networks. In: Kok JN, Koronacki J, de Mántaras RL, Matwin S, Mladenic D, Skowron A (eds) European conference on principles and practice of knowledge discovery in databases (ECML/PKDD’07). Lecture Notes in Computer Science, vol 4702, pp 91–102. Springer, Berlin
Gupta R, Roughgarden T, Seshadhri C (2014) Decompositions of triangle-dense graphs. In: Naor M (ed) Innovations in theoretical computer science. (ITCS’14), ACM, pp 471–482
Karypis G, Kumar V (1998) Multilevel algorithms for multi-constraint graph partitioning. In: ACM/IEEE conference on supercomputing (SC ’98), IEEE Computer Society, pp 1–13
McAuley J, Leskovec J (2012) Learning to discover social circles in ego networks. In: Bartlett PL, Pereira FCN, Burges CJC, Bottou L, Weinberger KQ (eds) Advances in neural information processing systems (NIPS’12), pp 548–556
Miettinen P, Mielikäinen T, Gionis A, Das G, Mannila H (2008) The discrete basis problem. IEEE Trans Knowl Data Eng 20(10):1348–1362
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: Analysis and an algorithm. In: Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ (eds) Advances in neural information processing systems (NIPS’01), pp 849–856
Palla G, Derényi I, Farkas I, Vicsek T (2005) Uncovering the overlapping community structure of complex networks in nature and society. Nature 435:814–818
Pinney J, Westhead D (2006) Betweenness-based decomposition methods for social and biological networks. In: Interdisciplinary statistics and bioinformatics, pp 87–90
Pons P, Latapy M (2006) Computing communities in large networks using random walks. J Graph Algorithms Appl 10(2):284–293
Pool S, Bonchi F, van Leeuwen M (2014) Description-driven community detection. ACM Trans Intell Syst Technol 5(2):1–28
van Dongen S (2000) Graph clustering by flow simulation. Ph.D. Thesis, University of Utrecht
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416
White S, Smyth P (2005) A spectral clustering approach to finding communities in graphs. In: SIAM international conference on data mining (SDM’05), SIAM/Omnipress, pp 76–84
Xie J, Kelley S, Szymanski BK (2011) Overlapping community detection in networks: the state of the art and comparative study. arXiv:1110.5813
Yan B, Gregory S (2009) Detecting communities in networks by merging cliques. In: IEEE international conference on intelligent computing and intelligent systems (ICIS’09), pp 832–836
Yang J, Leskovec J (2013) Overlapping community detection at scale: a nonnegative matrix factorization approach. In: Leonardi S, Panconesi A, Ferragina P, Gionis A (eds) ACM international conference on web search and data mining (WSDM’13), ACM, pp 587–596
Zhou H, Lipowsky R (2004) Network Brownian motion: A new method to measure vertex–vertex proximity and to identify communities and subcommunities. In: Bubak M, Albada G, Sloot P, Dongarra J (eds) Computational science (ICCS’04). Lecture Notes in Computer Science, vol 3038, pp 1062–1069

Author information

Authors and Affiliations

Department of Computer Science, Boston University, Boston, MA, USA
Esther Galbrun
Helsinki Institute for Information Technology (HIIT) and Department of Information and Computer Science, Aalto University, Espoo, Finland
Aristides Gionis & Nikolaj Tatti

Corresponding author

Correspondence to Esther Galbrun.

Additional information

Responsible editor: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.

Appendix: Residual dense subgraph is NP-hard

Let us first define the problem of discovering a graph with high residual density.

Problem 5

(ResDenseGraph) Let $G = (V, E, w)$ be a graph with weighted edges. Find a subgraph $H = (X, R)$ such that

$$\begin{aligned} d(H) - \sum _{e \in R} w(e) \end{aligned}$$

is maximized.
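To make the objective concrete, here is a small sketch that evaluates it for a given subgraph. The excerpt does not restate the definition of the density $d(H)$; the code assumes $d(H) = 2|R| / |X|$ (average degree), which is the convention under which the bound in the proof below works out. The function name `residual_profit` and the dictionary-based weight lookup are choices made for this example.

```python
def residual_profit(X, R, w):
    """Evaluate d(H) - sum_{e in R} w(e) for a subgraph H = (X, R).

    Assumption: d(H) = 2|R| / |X|, i.e. the average degree of H.
    `w` maps each edge (as given in R) to its weight.
    """
    X = set(X)
    if not X:
        return 0.0
    if any(u not in X or v not in X for u, v in R):
        raise ValueError("every edge of R must have both endpoints in X")
    density = 2 * len(R) / len(X)
    return density - sum(w[e] for e in R)


if __name__ == "__main__":
    # Triangle with the weights used in the hardness proof for K = 3.
    triangle = [(0, 1), (0, 2), (1, 2)]
    weights = {e: 2 / (2 * 3 - 1) for e in triangle}
    print(residual_profit({0, 1, 2}, triangle, weights))
```

With these weights the triangle has density 2 and total weight 1.2, so its profit is 0.8 up to floating-point rounding.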

Proposition 1

ResDenseGraph is NP-hard.

Proof

We prove hardness by a reduction from the clique problem. Assume that we are given a graph $G = (V, E)$ and a clique size $K$. Define the weights to be $w(e) = 2 / (2K - 1)$ for every edge $e \in E$.

Let us assume that $G$ contains a clique of size $K$, say $H = (X, R)$. We will first show that $H$ has the highest profit, that is, the highest value of the objective of ResDenseGraph. To see this, let $H' = (X', R')$ be any subgraph of $G$ and let $N = {\left| X'\right| }$. If $N < K$, then the profit of $H'$ is strictly smaller than the profit of $H$. If $N = K$, then the profit of $H$ is larger than or equal to the profit of $H'$, and if the profits are equal, then $H'$ has to be a clique as well. Assume that $N > K$. Then we can upper-bound the profit by

$$\begin{aligned} (N - 1) - N(N - 1)/(2K - 1) = -\frac{(N - 1)(N - 2K + 1)}{2K - 1}. \end{aligned}$$

This bound is a parabola in $N$, attaining its apex at $N = K$. This shows that the profit of $H'$ is strictly lower than the profit of $H$.
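The bound can be checked with a short calculation. This is a sketch under the assumption that the density is $d(H') = 2{\left| R'\right| }/N$ (average degree), a definition this excerpt does not restate; $f$ is a name introduced here for the bound. With $w(e) = 2/(2K - 1)$ and ${\left| R'\right| } \le N(N - 1)/2$,

$$\begin{aligned} d(H') - \sum _{e \in R'} w(e) &= 2{\left| R'\right| }\left( \frac{1}{N} - \frac{1}{2K - 1}\right) \le (N - 1) - \frac{N(N - 1)}{2K - 1} = f(N) \quad \text{for } N \le 2K - 1, \\ f(K) - f(N) &= \frac{(K - 1)^2 - (N - 1)(2K - 1 - N)}{2K - 1} = \frac{(N - K)^2}{2K - 1}. \end{aligned}$$

Hence $f(K) = (K - 1)^2/(2K - 1)$, which is exactly the profit of a $K$-clique, and every $N \ne K$ with $N \le 2K - 1$ gives a strictly smaller bound. For $N > 2K - 1$ the factor in parentheses is negative, so the profit is at most $0$, which is below $f(K)$ whenever $K \ge 2$.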

We have shown that $G = (V, E)$ has a $K$-clique if and only if the optimal answer for ResDenseGraph is a clique of size $K$.
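The equivalence can be sanity-checked by brute force on tiny graphs. The sketch below enumerates every non-empty edge subset $R$, takes $X$ to be the set of endpoints of $R$ (adding isolated vertices would only lower the density), and reports the best profit together with all optimal edge sets. It assumes $d(H) = 2|R|/|X|$, which this excerpt does not restate, and the helper names `best_subgraphs` and `is_clique` are introduced here for illustration. Exact rational arithmetic avoids floating-point ties.

```python
from fractions import Fraction
from itertools import combinations


def best_subgraphs(edges, K):
    """Brute-force ResDenseGraph with uniform weights w(e) = 2 / (2K - 1).

    Returns (best profit, list of optimal edge subsets).
    Exponential in the number of edges; intended for toy graphs only.
    """
    w = Fraction(2, 2 * K - 1)
    best, optimal = None, []
    for r in range(1, len(edges) + 1):
        for R in combinations(edges, r):
            X = {v for e in R for v in e}
            # Assumed density: average degree 2|R| / |X|.
            profit = Fraction(2 * len(R), len(X)) - w * len(R)
            if best is None or profit > best:
                best, optimal = profit, [R]
            elif profit == best:
                optimal.append(R)
    return best, optimal


def is_clique(R, K):
    """True if the edge set R spans exactly K vertices and all K(K-1)/2 pairs."""
    X = {v for e in R for v in e}
    return len(X) == K and len(set(map(frozenset, R))) == K * (K - 1) // 2


if __name__ == "__main__":
    # Two triangles sharing the edge 1-2, plus a pendant edge 3-4.
    with_triangle = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
    best, optimal = best_subgraphs(with_triangle, K=3)
    print(best, [is_clique(R, 3) for R in optimal])

    # A 4-cycle has no triangle, so the optimum falls short of (K-1)^2/(2K-1).
    four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(best_subgraphs(four_cycle, K=3)[0])
```

For $K = 3$ the clique profit is $(K - 1)^2/(2K - 1) = 4/5$; the first graph attains it only on its two triangles, while the triangle-free 4-cycle tops out at $3/5$ with a single edge.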

About this article

Cite this article

Galbrun, E., Gionis, A. & Tatti, N. Overlapping community detection in labeled graphs. Data Min Knowl Disc 28, 1586–1610 (2014). https://doi.org/10.1007/s10618-014-0373-y

Received: 16 March 2014
Accepted: 24 June 2014
Published: 02 August 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s10618-014-0373-y
