CFP last date
20 December 2024
Reseach Article

An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases

by Bala Yesu Chilakalapudi, Narayana Satyala, Satyanarayana Menda
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 73 - Number 12
Year of Publication: 2013
Authors: Bala Yesu Chilakalapudi, Narayana Satyala, Satyanarayana Menda
10.5120/12791-9932

Bala Yesu Chilakalapudi, Narayana Satyala, Satyanarayana Menda . An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases. International Journal of Computer Applications. 73, 12 ( July 2013), 8-15. DOI=10.5120/12791-9932

@article{ 10.5120/12791-9932,
author = { Bala Yesu Chilakalapudi, Narayana Satyala, Satyanarayana Menda },
title = { An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases },
journal = { International Journal of Computer Applications },
issue_date = { July 2013 },
volume = { 73 },
number = { 12 },
month = { July },
year = { 2013 },
issn = { 0975-8887 },
pages = { 8-15 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume73/number12/12791-9932/ },
doi = { 10.5120/12791-9932 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:39:53.057347+05:30
%A Bala Yesu Chilakalapudi
%A Narayana Satyala
%A Satyanarayana Menda
%T An Improved Algorithm for Efficient Mining of Frequent Item Sets on Large Uncertain Databases
%J International Journal of Computer Applications
%@ 0975-8887
%V 73
%N 12
%P 8-15
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The data handled in emerging applications like location based services, sensor monitoring systems, and data integration, are often inexact in nature. In this paper, the important problem of extracting frequent item sets from a large uncertain database, interpreted under the Possible World Semantics (PWS) is presented. This issue is technically challenging, since an uncertain database contains an exponential number of possible worlds. By observing that the mining process can be modeled as a Poisson binomial distribution, an algorithm was developed, which can efficiently and accurately discover frequent item sets in a large uncertain database. The important issue of maintaining the mining result for a database that is evolving (e. g. , by inserting a tuple) can be presented. Specifically, the proposed mining algorithm can enable Probabilistic Frequent Item set (PFI) results to be refreshed. This reduces the need of re-executing the whole mining algorithm on the new database, which is often more expensive and unnecessary. The proposed algorithm can support incremental mining and provides the accurate results on mining the uncertain database. The extensive evaluation on real data set to validate the approach is performed.

References
  1. A. Veloso, W. Meira Jr. , M. de Carvalho, B. Poˆ ssas, S. Parthasarathy, and M. J. Zaki, "Mining Frequent Itemsets in Evolving Databases," Proc. Second SIAM Int'l Conf. Data Mining (SDM), 2002.
  2. C. Aggarwal, Y. Li, J. Wang, and J. Wang, "Frequent Pattern Mining with Uncertain Data, ", Proc. 15th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2009.
  3. C. Aggarwal and P. Yu, "A Survey of Uncertain Data Algorithm and Applications," IEEE Trans Knowledge and Data Eng. , vol. 21, no. 5, pp. 609-623, May 2009.
  4. R. Agrawal, T. Imieli?nski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. ACM SIGMOD Int'l Conf. Management of Data, 1993. [5 ] O. Benjelloun, A. D. Sarma, A. Halevy, and J. Widom, "ULDBs: Databases with Uncertainty and Lineage," Proc. 32ndInt'l Conf. Very Large Data Bases (VLDB), 2006.
  5. T. Bernecker, H. Kriegel, M. Renz, F. Verhein, and A. Zuefle,"Probabilistic Frequent Itemset Mining in Uncertain Databases," Proc. 15th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining (KDD), 2009.
  6. C. J. van Rijsbergen, Information Retrieval. Butterworth, 1979.
  7. L. L. Cam, "An Approximation Theorem for the Poisson Binomial Distribution," Pacific J. Math. , vol. 10, pp. 1181- 1197,1960.
  8. H. Cheng, P. Yu, and J. Han, " Frequent Itemset Mining in the Presence of Random Noise," Proc. Soft Computing for Knowledge Discovery and Data Mining, pp. 363-389, 2008.
  9. R. Cheng, D. Kalashnikov, and S. Prabhakar, "Evaluating Probabilistic Queries over Imprecise Data," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2003.
  10. D. Cheung, J. Han, V. Ng, and C. Wong, "Maintenance of Discovered Association Rules in Large Databases: An Updating Technique," Proc. 12th Int'l Conf. Data Eng. (ICDE), 1996.
  11. D. Cheung, S. D. Lee, and B. Kao, "A General Technique for Maintaining Discovered Association Rules," Proc. fifth Int'l Conf. Database Systems for Advanced Applications (DASFAA), 1997.
  12. W. Cheung and O. R. Za?¨ane, " Mining of Frequent Patterns without Candidate Generation or Support Constraint," Proc. Seventh Int'l Database Eng. and Applications Symp. (IDEAS), 2003.
  13. C. K. Chui, B. Kao, and E. Hung, "Mining Frequent Itemsets from Uncertain Data," Proc. 11th Pacific-Asia Conf. advancesin Knowledge Discovery and Data Mining (PAKDD), 2007.
  14. G. Cormode and M. Garofalakis, "Sketching Probabilistic Data Streams," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2007.
  15. N. Dalvi and D. Suciu, "Efficient Query Evaluation on Probabilistic Databases," Proc. 13th Int'l Conf. Very Large Data Bases (VLDB), 2004.
  16. A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, and W. Hong, "Model-Driven Data Acquisition in Sensor Networks," Proc. 13th Int'l Conf. Very Large Data Bases (VLDB), 2004.
  17. J. Han, J. Pei, and Y. Yin,"Mining Frequent Patterns without Candidate Generation," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2000.
  18. J. Huang, "MayBMS: A Probabilistic Database Management System," Proc. 35th ACM SIGMOD Int'l Conf. Management of Data, 2009.
  19. R. Jampani, L. Perez, M. Wu, F. Xu, C. Jermaine, and P. Haas, "MCDB: A Monte Carlo Approach to Managing Uncertain Data," Proc. ACM SIGMOD Int'l Conf. Management of Data 2008.
  20. J. Ren, S. D. Lee, X. Chen, B. Kao, R. Cheng, and D. W. Cheung" Naive Bayes Classification of Uncertain Data," Proc. IEEE Ninth Int'l Conf. Data Mining (ICDM), 2009.
  21. N. Khoussainova, M. Balazinska, and D. Suciu, "Towards Correcting Input Data Errors Probabilistically Using Integrity Constraints," Proc. Fifth ACM Int'l Workshop Data Eng. For Wireless and Mobile Access (MobiDE), 2006.
  22. H. Kriegel and M. Pfeifle, "Density-Based Clustering of Uncertain Data," Proc. ACM SIGKDD Int'l Conf. Knowledge Discovery in Data Mining (KDD), 2005.
  23. C. Kuok, A. Fu, and M. Wong, "Mining Fuzzy Association Rules in Databases," SIGMOD Record, vol. 27, no. 1, pp. 41-46, 1998. Proc. IEEE Fifth Int'l Conf. Data Mining (ICDM), 2005
  24. C. K. -S. Leung, Q. I. Khan, and T. Hoque, "Cantree: A Tree Structure for EfficientMining of Frequent Patterns,". Proc. IEEE Fifth Int'l Conf. Data Mining (ICDM), 2005.
  25. A. Lu, Y. Ke, J. Cheng, and W. Ng, "Mining Vague Association Rules," Proc. 12th Int'l Conf. Database Systems for Advanced Applications (DASFAA), 2007.
  26. M. Mutsuzaki, "Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS," Proc. Third Biennial Conf. Innovative Data Systems Research (CIDR), 2007.
  27. P. Sistla, O. Wolfson, S. Chamberlain, and S. Dao, "Querying the Uncertain Position of Moving Objects," Temporal Databases: Research and Practice, Springer Verlag, 1998.
  28. C. Stein, Computation of Expectations, Lecture Notes Monograph Series, vol. 7, Inst. of Math. Statistics, 1986.
  29. L. Sun, R. Cheng, D. W. Cheung, and J. Cheng, "Mining Uncertain Data with Probabilistic Guarantees," Proc. 16th ACM SIGKDD Int'l Conf. Knowledge Discovery and Data Mining, 2010.
  30. T. Jayram et al. , "Avatar Information Extraction System," IEEE Data Eng. Bull. , vol. 29, no. 1, pp. 40-48, Mar. 2006.
  31. S. Tsang, B. Kao, K. Y. Yip, W. -S. Ho, and S. D. Lee. , "Decision Trees for Uncertain Data," Proc. IEEE Int'l Conf. Data Eng. (ICDE), 2009.
  32. L. Wang, R. Cheng, S. D. Lee, and D. Cheung, "Accelerating Probabilistic Frequent Itemset Mining: A Approach," Proc. 19th ACM Int'l Conf. Information and Knowledge Management (CIKM), 2010.
  33. M. Yiu, N. Mamoulis, X. Dai, Y. Tao, and M. Vaitis, "Efficient Evaluation of Probabilistic Advanced Spatial Queries on existentially Uncertain Data," IEEE Trans Knowledge and Data Eng. , vol. 21, no. 9, pp. 108-122, Jan. 2009.
  34. Q. Zhang, F. Li, and K. Yi, "Finding Frequent Items in Probabilistic Data," Proc. ACM SIGMOD Int'l Conf. Management of Data, 2008.
Index Terms

Computer Science
Information Sciences

Keywords

PFI PWS S-PMF CDF