Interesting pattern mining in multi-relational data

Spyropoulou, Eirini; De Bie, Tijl; Boley, Mario

doi:10.1007/s10618-013-0319-9

Published: 14 May 2013

Volume 28, pages 808–849, (2014)

Data Mining and Knowledge Discovery

Abstract

Mining patterns from multi-relational data is a problem attracting increasing interest within the data mining community. Traditional data mining approaches are typically developed for single-table databases and are not directly applicable to multi-relational data. Nevertheless, multi-relational data is a more truthful, and therefore often also a more powerful, representation of reality. Mining patterns of a suitably expressive syntax directly from this representation is thus a research problem of great importance. In this paper we introduce a novel approach to mining patterns in multi-relational data. We propose a new syntax for multi-relational patterns as complete connected subsets of database entities. We show that this pattern syntax is generally applicable to multi-relational data, while it reduces to the well-known tiles (Geerts et al. 2004) when the data is a simple binary or attribute-value table. We propose RMiner, a simple yet practically efficient divide and conquer algorithm to mine such patterns, which is an instantiation of an algorithmic framework for efficiently enumerating all fixed points of a suitable closure operator (Boley et al. 2010). We show how the interestingness of patterns of the proposed syntax can conveniently be quantified using a general framework for quantifying the subjective interestingness of patterns (De Bie 2011b). Finally, we illustrate the usefulness and the general applicability of our approach by discussing results on real-world and synthetic databases.
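
To make the pattern syntax more concrete, the following is a minimal sketch of a membership test for complete connected subsets. It rests on an assumed reading of the two properties, which the abstract does not spell out: a set of entities is taken to be complete if every pair of entities whose types are linked by a relationship type is actually related in the data, and connected if the entities form a single connected component under those relationships. The toy data (one binary table with rows and columns as entities) and the names `entity_type`, `related_types`, `edges`, and `is_ccs` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: each entity mapped to its entity type.
entity_type = {
    "r1": "row", "r2": "row", "r3": "row",
    "c1": "col", "c2": "col",
}
# Assumed schema: a single relationship type between rows and columns.
related_types = {frozenset({"row", "col"})}
# Relationship instances, i.e. the 1-entries of the binary table.
edges = {
    frozenset({"r1", "c1"}), frozenset({"r1", "c2"}),
    frozenset({"r2", "c1"}), frozenset({"r2", "c2"}),
    frozenset({"r3", "c1"}),
}

def is_complete(entities):
    """Every pair whose types can be related must actually be related."""
    for a, b in combinations(entities, 2):
        types = frozenset({entity_type[a], entity_type[b]})
        if types in related_types and frozenset({a, b}) not in edges:
            return False
    return True

def is_connected(entities):
    """The entities form one component under the relationship instances."""
    entities = set(entities)
    if not entities:
        return False  # choice made for this sketch only
    start = next(iter(entities))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for other in entities - seen:
            if frozenset({node, other}) in edges:
                seen.add(other)
                stack.append(other)
    return seen == entities

def is_ccs(entities):
    return is_complete(entities) and is_connected(entities)

print(is_ccs({"r1", "r2", "c1", "c2"}))  # True: an all-ones block, i.e. a tile
print(is_ccs({"r1", "r3", "c1", "c2"}))  # False: r3 and c2 are not related
print(is_ccs({"r1", "r2"}))              # False: complete but not connected
```

On a single binary table, the sets accepted by this check are exactly blocks of rows and columns whose entries are all ones, which is how the reduction to tiles mentioned in the abstract can be pictured.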

Notes

In contrast to some traditional fixpoint enumeration algorithms, such as those used in formal concept analysis, this divide and conquer approach assumes neither an underlying complete lattice nor that the fixpoint set is closed under intersection. This is important because the set system of complete connected subsets (CCSs) is not necessarily closed under intersection (due to connectivity), and two maximal CCSs (MCCSs) cannot be joined to a common supremum (due to completeness).
Strongly accessible set systems generalize greedoids such as, e.g., poset ideals [see Boley (2011, Sect. 3.5.2) and Korte and Lovász (1985)].
Please note that entities and entity types here refer to our own notions of these terms. The same notions are called objects and entities, respectively, in Nijssen et al. (2011).
Note that the quadratic space complexity of RMiner results from multiplying a linear space complexity by the maximal search tree depth, which, as we will show in Sect. 7.3, is a small constant in practice. Also, as discussed in Sect. 3.5, the time delay of RMiner depends on the density of the data set and can be reduced by particular implementation choices. Thus, even though the theoretical complexities of RMiner and of the algorithm of Makino and Uno (2004) are comparable, RMiner probably scales better in practice.
See http://www.imdb.com/
See http://www.informatik.uni-trier.de/~ley/db/
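
The divide and conquer scheme referred to in the first note can be illustrated with a small sketch. This is not RMiner itself: it is a simplified rendering of the general idea of listing the fixed points of a closure operator by branching on one element at a time, in the spirit of Boley et al. (2010). As a stand-in for the closure operator of the paper it uses the standard itemset closure on a toy binary table, a simpler setting that happens to be closed under intersection. The names `enumerate_closed`, `table`, and `banned` are hypothetical, and a non-empty table is assumed.

```python
def enumerate_closed(table):
    """List all closed itemsets with non-empty support, each exactly once.

    `table` is a non-empty list of frozensets (the rows of a binary table).
    """
    ground = frozenset().union(*table)
    results = []

    def support(items):
        return [row for row in table if items <= row]

    def closure(items):
        # Largest itemset shared by all rows that contain `items`.
        return frozenset.intersection(*support(items))

    def recurse(closed, banned):
        # Elements that may still be added while keeping non-empty support.
        candidates = [e for e in sorted(ground - closed - banned)
                      if support(closed | {e})]
        if not candidates:
            return
        e = candidates[0]
        extended = closure(closed | {e})
        # Branch 1: closed sets containing e (skipped if the closure
        # pulls in a banned element, since they are found elsewhere).
        if not (extended & banned):
            results.append(extended)
            recurse(extended, banned)
        # Branch 2: closed sets that do not contain e.
        recurse(closed, banned | {e})

    start = closure(frozenset())
    results.append(start)
    recurse(start, frozenset())
    return results

table = [frozenset("ab"), frozenset("abc"), frozenset("bc")]
closed_sets = enumerate_closed(table)
print(len(closed_sets))  # 4
```

The recursion only ever asks whether a single element can be added to the current set and what the closure of the result is; it does not rely on joining or intersecting two fixed points, which is the property the note highlights.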

References

Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases (VLDB), pp 487–499
Angles R, Gutierrez C (2008) Survey of graph database models. ACM Comput Surv 40(1):1:1–1:39
Birkhoff G (1967) Lattice theory. American Mathematical Society, Providence
Boley M (2011) The efficient discovery of interesting closed pattern collections. PhD thesis, University of Bonn, Bonn
Boley M, Horvath T, Poigné A, Wrobel S (2010) Listing closed sets of strongly accessible set systems with applications to data mining. Theor Comput Sci 411(3):691–700
Bron C, Kerbosch J (1973) Algorithm 457: finding all cliques of an undirected graph. Commun ACM 16(9):575–577
Burdick D, Calimlim M, Flannick J, Gehrke J, Yiu T (2005) Mafia: a maximal frequent itemset algorithm. IEEE Trans Knowl Data Eng 17(11):1490–1504
Calders T, Goethals B (2007) Non-derivable itemset mining. Data Min Knowl Discov 14(1):171–206
Cerf L, Besson J, Robardet C, Boulicaut JF (2009) Closed patterns meet n-ary relations. ACM Trans Knowl Discov Data 3(1):3:1–3:36
Cover TM, Thomas JA (2005) Elements of information theory. Wiley, Hoboken
De Bie T (2011a) An information theoretic framework for data mining. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 564–572
De Bie T (2011b) Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Min Knowl Discov 23(3):407–446
De Bie T, Kontonasios KN, Spyropoulou E (2010) A framework for mining interesting pattern sets. In: SIGKDD explorations, pp 92–100
De Raedt L, Zimmermann A (2007) Constraint-based pattern set mining. In: Proceedings of the SIAM international conference on data mining (SDM), pp 237–248
Dehaspe L, Toivonen H (1999) Discovery of frequent datalog patterns. Data Min Knowl Discov 3:7–36
Elmasri R, Navathe SB (2006) Fundamentals of database systems. Addison Wesley, Boston
Garriga GC, Khardon R, De Raedt L (2007) On mining closed sets in multi-relational data. In: Proceedings of the 20th international joint conference on artificial intelligence (IJCAI), pp 804–809
Geerts F, Goethals B, Mielikäinen T (2004) Tiling databases. In: Proceedings of discovery science, pp 278–289
Geng L, Hamilton HJ (2006) Interestingness measures for data mining: a survey. In: ACM computing surveys, vol 38. ACM, New York
Gionis A, Mannila H, Mielikäinen T, Tsaparas P (2007) Assessing data mining results via swap randomization. ACM Trans Knowl Discov Data 1(3):14
Goethals B, Le Page W (2008) Mining association rules of simple conjunctive queries. In: Proceedings of the SIAM international conference on data mining (SDM), Atlanta
Goethals B, Le Page W, Mampaey M (2010) Mining interesting sets and rules in relational databases. In: Proceedings of the ACM symposium on applied computing (SAC), pp 997–1001
Gupta R, Fang G, Field B, Steinbach M, Kumar V (2008) Quantitative evaluation of approximate frequent pattern mining algorithms. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 301–309
Hanhijärvi S, Ojala M, Vuokko N, Puolamäki K, Tatti N, Mannila H (2009) Tell me something I don’t know: randomization strategies for iterative data mining. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD). ACM, New York, pp 379–388
Jäschke R, Hotho A, Schmitz C, Ganter B, Stumme G (2008) Discovering shared conceptualizations in folksonomies. Web Semant 6(1):38–53
Jen TY, Laurent D, Spyratos N (2010) Computing supports of conjunctive queries on relational tables with functional dependencies. Fundam Inf 99(3):263–292
Ji M, Han J, Danilevsky M (2011) Ranking-based classification of heterogeneous information networks. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 1298–1306
Ji M, Sun Y, Danilevsky M, Han J, Gao J (2010) Graph regularized transductive classification on heterogeneous information networks. In: ECML/PKDD (1), pp 570–586
Ji L, Tan KL, Tung AKH (2006) Mining frequent closed cubes in 3d datasets. In: Proceedings of the international conference on very large data bases, VLDB endowment, VLDB, pp 811–822
Kontonasios K, Spyropoulou E, De Bie T (2012) Knowledge discovery interestingness measures based on unexpectedness. In: Wiley interdisciplinary reviews: data mining and knowledge discovery, pp 386–399
Koopman A, Siebes A (2008) Discovering relational item sets efficiently. In: Proceedings of the SIAM conference on data mining (SDM), pp 108–119
Koopman A, Siebes A (2009) Characteristic relational patterns. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 437–446
Korte B, Lovász L (1985) Relations between subclasses of greedoids. Math Methods Oper Res 29:249–267
Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 313–320
Lawler EL, Lenstra JK, Kan AHGR (1980) Generating all maximal independent sets: NP-hardness and polynomial-time algorithms. SIAM J Comput 9(3):558–565
Makino K, Uno T (2004) New algorithms for enumerating all maximal cliques. In: Scandinavian workshop on algorithm theory (SWAT), pp 260–272
Maruhashi K, Guo F, Faloutsos C (2011) Multiaspectforensics: Pattern mining on large-scale heterogeneous networks with tensor analysis. In: Proceedings of the international conference on advances in social networks analysis and mining, ASONAM ’11, pp 203–210
Ng EKK, Ng K, Fu AWC, Wang K (2002) Mining association rules from stars. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 322–329
Nijssen S, Jiménez A, Guns T (2011) Constraint-based pattern mining in multi-relational databases. In: ICDM workshops, pp 1120–1127
Nijssen S, Kok J (2003) Efficient frequent query discovery in FARMER. In: Proceedings of the European conference on principles and practice of knowledge discovery in databases (PKDD), pp 350–362
Ojala M, Garriga GC, Gionis A, Mannila H (2010) Evaluating query result significance in databases via randomizations. In: Proceedings of the SIAM conference on data mining (SDM), pp 906–917
Pardalos PM, Xue J (1994) The maximum clique problem. J Glob Optim 4:301–328
Poernomo AK, Gopalkrishnan V (2009) Towards efficient mining of proportional fault-tolerant frequent itemsets. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 697–706
Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of the SIAM conference on data mining (SDM), pp 393–404
Spyropoulou E, De Bie T (2011) Interesting multi-relational patterns. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 675–684
Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of the ACM SIGMOD international conference on management of data, pp 1–12
Sun Y, Han J, Aggarwal CC, Chawla NV (2012a) When will it happen?: relationship prediction in heterogeneous information networks. In: Proceedings of the fifth ACM international conference on Web search and data mining, WSDM ’12, pp 663–672
Sun Y, Norick B, Han J, Yan X, Yu PS, Yu X (2012b) Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. In: KDD, pp 1348–1356
Sun Y, Yu Y, Han J (2009) Ranking-based clustering of heterogeneous information networks with star network schema. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 797–806
Tang L, Wang X, Liu H (2012) Community detection via heterogeneous interaction analysis. Data Min Knowl Discov 25(1):1–33
Trabelsi C, Jelassi N, Ben Yahia S (2012) Scalable mining of frequent tri-concepts from folksonomies. In: Advances in knowledge discovery and data mining, pp 231–242
Uno T, Asai T, Uchida Y, Arimura H (2004a) An efficient algorithm for enumerating closed patterns in transaction databases. In: Discovery science, pp 16–31
Uno T, Kiyomi M, Arimura H (2004b) Lcm ver. 2: Efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the IEEE ICDM workshop on frequent itemset mining implementations (FIMI), Brighton
Voutsadakis G (2002) Polyadic concept analysis. Order 19(3):295–304
Yahia B, Hamrouni T, Nguifo EM (2006) Frequent closed itemset based algorithms: a thorough structural and analytical survey. SIGKDD Explor Newsl 8(1):93–104
Yan X, Han J (2002) gspan: Graph-based substructure pattern mining. In: Proceedings of the IEEE international conference on data mining (ICDM), pp 721–730
Yan X, Han J (2003) Closegraph: mining closed frequent graph patterns. In: Proceedings of the ACM SIGKDD international conference on knowledge discovery and data mining (KDD), pp 286–295
Zaki MJ (2000) Scalable algorithms for association mining. IEEE Trans Knowl Data Eng 12(3):372–390
Zaki M, Hsiao CJ (2005) Efficient algorithms for mining closed itemsets and their lattice structure. IEEE Trans Knowl Data Eng 17(4):462–478
Zaki MJ, Peters M, Assent I, Seidl T (2007) Clicks: an effective algorithm for mining subspace clusters in categorical datasets. Data Knowl Eng 60(1):51–70
Zaki M, Hsiao CJ (2002) CHARM: an efficient algorithm for closed itemset mining. In: Proceedings of the SIAM international conference on data mining (SDM), pp 457–473
Zaki M, Ogihara M (1998) Theoretical foundations of association rules. In: Proceedings of the ACM SIGMOD workshop on research issues in data mining and knowledge discovery, San Diego

Acknowledgments

We are grateful to Michael Mampaey for providing the Smurfig code and data and for his support in using Smurfig, to Siegfried Nijssen for his assistance in using Farmer, and to Thomas Gärtner for discussions on this work. This work was partially funded by the PASCAL 2 Network of Excellence. Eirini Spyropoulou and Tijl De Bie are supported by EPSRC Grant EP/G056447/1. Mario Boley is partially funded by DFG (German National Research Foundation) under GA 1615/2-1.

Author information

Authors and Affiliations

Intelligent Systems Laboratory, University of Bristol, Woodland Road, Bristol, UK
Eirini Spyropoulou & Tijl De Bie
Fraunhofer IAIS, Schloss Birlinghoven, 53754 Sankt Augustin, Germany
Mario Boley

Corresponding author

Correspondence to Eirini Spyropoulou.

Additional information

Responsible editor: M.J. Zaki.

About this article

Cite this article

Spyropoulou, E., De Bie, T. & Boley, M. Interesting pattern mining in multi-relational data. Data Min Knowl Disc 28, 808–849 (2014). https://doi.org/10.1007/s10618-013-0319-9

Received: 08 August 2012
Accepted: 25 April 2013
Published: 14 May 2013
Issue Date: May 2014
DOI: https://doi.org/10.1007/s10618-013-0319-9
