Abstract
A pattern is considered useful if it helps a person achieve a goal. Mining data streams for useful patterns is important in many applications. However, data streams can change their behavior over time, and a significant change that is not properly handled can seriously degrade the mining result. Most earlier studies have focused on adapting to changes in data streams. We contend that adapting to changes is not enough. The ability to detect and characterize change is also essential in many applications, for example intrusion detection, network traffic analysis, and data streams from intensive care units. Detecting changes is nontrivial. In this paper, we propose an online algorithm for change detection in utility mining. To provide a mechanism for describing the detected change quantitatively, we adopt statistical tests. We believe there is an opportunity for an immensely rewarding synergy between data mining and statistics. We evaluate different statistical significance tests, and our study shows that the Chi-square test is the most suitable for enumerated or count data (as is the case for high utility itemsets). We demonstrate the effectiveness of the proposed method by testing it on IBM QUEST market-basket data.
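As a rough illustration of how a Chi-square test can flag a change in count data, the sketch below compares counts of the same itemsets in two windows of a stream using Pearson's test of homogeneity on a 2 x k table. It is not the algorithm proposed in the paper: the function names, the window layout, and the fixed 0.05 significance level are assumptions made for this example only.

```python
from typing import Sequence, Tuple

# Chi-square critical values at significance level 0.05, keyed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(reference: Sequence[float],
                         current: Sequence[float]) -> Tuple[float, int]:
    """Return (statistic, degrees of freedom) for a 2 x k table of counts.

    `reference` and `current` hold counts of the same k itemsets in two
    windows of the stream (hypothetical layout, assumed for this sketch).
    """
    if len(reference) != len(current) or len(reference) < 2:
        raise ValueError("need two count vectors of equal length >= 2")
    row_totals = (sum(reference), sum(current))
    grand_total = sum(row_totals)
    if grand_total == 0:
        raise ValueError("all counts are zero")
    stat = 0.0
    for ref, cur in zip(reference, current):
        col_total = ref + cur
        for observed, row_total in ((ref, row_totals[0]), (cur, row_totals[1])):
            # Expected count if both windows shared one distribution.
            expected = row_total * col_total / grand_total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat, len(reference) - 1


def change_detected(reference: Sequence[float],
                    current: Sequence[float]) -> bool:
    """Flag a change when the statistic exceeds the 0.05 critical value."""
    stat, dof = chi_square_statistic(reference, current)
    return stat > CHI2_CRIT_05[dof]
```

Calling `change_detected([50, 30, 20], [20, 30, 50])` compares two windows over three itemsets; the size of the statistic relative to the critical value gives a quantitative description of how strongly the windows differ. The lookup table covers only up to five degrees of freedom, so more itemsets would need a fuller table or a statistics library.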
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ng, W., Dash, M. (2008). A Test Paradigm for Detecting Changes in Transactional Data Streams. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_17
DOI: https://doi.org/10.1007/978-3-540-78568-2_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78567-5
Online ISBN: 978-3-540-78568-2
eBook Packages: Computer Science (R0)