A Test Paradigm for Detecting Changes in Transactional Data Streams

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4947)

Abstract

A pattern is considered useful if it helps a person achieve a goal. Mining data streams for useful patterns is important in many applications. However, data streams can change their behavior over time, and when a significant change occurs, the mining result is badly harmed if the change is not handled properly. Many past studies have focused mainly on adapting to changes in data streams. We contend that adapting to changes is not enough: the ability to detect and characterize change is also essential in many applications, for example intrusion detection, network traffic analysis, and data streams from intensive care units. Detecting changes is nontrivial. In this paper, we propose an online algorithm for change detection in utility mining. To provide a mechanism for describing the detected change quantitatively, we adopt statistical tests. We believe there is an opportunity for an immensely rewarding synergy between data mining and statistics. Different statistical significance tests are evaluated, and our study shows that the Chi-square test is the most suitable for enumerated or count data (as is the case for high utility itemsets). We demonstrate the effectiveness of the proposed method by testing it on IBM QUEST market-basket data.
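To make the role of the Chi-square test on count data concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes a simplified setting in which an itemset's occurrence count in a reference window is compared with its count in the current window using a 2x2 Pearson Chi-square test with one degree of freedom. The function names, the two-window setup, and the 0.05 significance level are illustrative assumptions.

```python
import math


def chi_square_2x2(ref_hits, ref_total, cur_hits, cur_total):
    """Pearson chi-square statistic and p-value (1 degree of freedom)
    for a 2x2 table: window (reference/current) x itemset (present/absent).
    Hypothetical helper, not taken from the paper."""
    a, b = ref_hits, ref_total - ref_hits   # reference window: present, absent
    c, d = cur_hits, cur_total - cur_hits   # current window: present, absent
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the test is undefined; report no evidence.
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def change_detected(ref_hits, ref_total, cur_hits, cur_total, alpha=0.05):
    """Flag a change when the p-value falls below the assumed level alpha."""
    _, p_value = chi_square_2x2(ref_hits, ref_total, cur_hits, cur_total)
    return p_value < alpha


if __name__ == "__main__":
    # Made-up counts: itemset seen in 100 of 1000 reference transactions
    # and in 160 of 1000 current transactions.
    chi2, p = chi_square_2x2(100, 1000, 160, 1000)
    print(f"chi2={chi2:.3f}, p={p:.5f}, change={p < 0.05}")
```

With these made-up counts the statistic exceeds 3.841, the 0.05 critical value for one degree of freedom, so the sketch reports a change. The p-value also gives a quantitative description of how strong the evidence is, which is the kind of description the abstract motivates.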

Editor information

Jayant R. Haritsa, Ramamohanarao Kotagiri, Vikram Pudi

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ng, W., Dash, M. (2008). A Test Paradigm for Detecting Changes in Transactional Data Streams. In: Haritsa, J.R., Kotagiri, R., Pudi, V. (eds) Database Systems for Advanced Applications. DASFAA 2008. Lecture Notes in Computer Science, vol 4947. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78568-2_17

  • DOI: https://doi.org/10.1007/978-3-540-78568-2_17

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-78567-5

  • Online ISBN: 978-3-540-78568-2

  • eBook Packages: Computer Science, Computer Science (R0)
