CFP last date
20 December 2024
Reseach Article

Novel Algorithms CIPFP for Mining Frequent Patterns using Counting Inference from Probabilistic Databases and Future Possibilities

by Niket Bhargava, Manoj Shukla
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 140 - Number 10
Year of Publication: 2016
Authors: Niket Bhargava, Manoj Shukla
10.5120/ijca2016909478

Niket Bhargava, Manoj Shukla . Novel Algorithms CIPFP for Mining Frequent Patterns using Counting Inference from Probabilistic Databases and Future Possibilities. International Journal of Computer Applications. 140, 10 ( April 2016), 39-54. DOI=10.5120/ijca2016909478

@article{ 10.5120/ijca2016909478,
author = { Niket Bhargava, Manoj Shukla },
title = { Novel Algorithms CIPFP for Mining Frequent Patterns using Counting Inference from Probabilistic Databases and Future Possibilities },
journal = { International Journal of Computer Applications },
issue_date = { April 2016 },
volume = { 140 },
number = { 10 },
month = { April },
year = { 2016 },
issn = { 0975-8887 },
pages = { 39-54 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume140/number10/24634-2016909478/ },
doi = { 10.5120/ijca2016909478 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:41:58.696451+05:30
%A Niket Bhargava
%A Manoj Shukla
%T Novel Algorithms CIPFP for Mining Frequent Patterns using Counting Inference from Probabilistic Databases and Future Possibilities
%J International Journal of Computer Applications
%@ 0975-8887
%V 140
%N 10
%P 39-54
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

We consider the problem of discovering frequent item sets and association rules between items in a large database of transactional databases acquired under uncertainty. A probabilistic database considered here is one in which with each transaction associated is a probability, represents the confidence that the transaction will occur with given associated certainty. In this paper, we address the problem of the efficiency of the main phase of most data mining applications: The frequent pattern extraction. This problem is mainly related to the number of operations required for counting pattern supports in the database and we propose a new method, called counting inference probabilistic frequent pattern miner in probabilistic databases, this algorithm allows to perform as few support counts as possible. It is optimized to reduce the number of database scan as well as the number of patterns for which explicit support count is required. Using this method, the support of a pattern is determined without accessing the database whenever possible, using the supports of some of its sub-patterns called key patterns. This method was implemented in the CIPFP, counting inference based probabilistic frequent pattern mining algorithm that is an optimization of the simple and efficient Apriori algorithm. The goal is to transform all key patterns into non-key patterns as early as possible as for non-key-patterns database scan is not required at all.

References
  1. R. Agrawal, C. Faloutsos, and A. Swami. Efcient similarity search in sequence databases. In Proc. of the Fourth International Conference on Foundations of Data Organization and Algorithms, Chicago, October 1993.
  2. R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami. An interval classi er for database mining applications. In Proc. of the VLDB Conference, pages 560{573, Vancouver, British Columbia, Canada, 1992.
  3. R. Agrawal, T. Imielinski, and A. Swami. Database mining: A performance perspective. IEEE Transactions on Knowledge and Data En- gineering, 5(6):914{925,\December 1993. Special Issue on Learning and Discovery in Knowledge- Based Databases.
  4. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Con- ference on Management of Data, Washington, D.C., May 1993.
  5. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. Re- search Report RJ 9839, IBM Almaden Research Center, San Jose, California, June 1994.
  6. D. S. Associates. The new direct marketing. Business One Irwin, Illinois, 1990.
  7. R. Brachman et al. Integrated support for data archeology. In AAAI-93 Workshop on Knowledge Discovery in Databases, July 1993.
  8. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classi cation and Regression Trees. Wadsworth, Belmont, 1984.
  9. P. Cheeseman et al. Autoclass: A bayesian classi cation system. In 5th Int'l Conf. on Machine Learning. Morgan Kaufman, June 1988.
  10. D. H. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, 2(2), 1987.
  11. J. Han, Y. Cai, and N. Cercone. Knowledge discovery in databases: An attribute oriented approach. In Proc. of the VLDB Conference, pages 547{559, Vancouver, British Columbia, Canada, 1992.
  12. M. Holsheimer and A. Siebes. Data mining: The search for knowledge in databases. Technical Report CS-R9406, CWI, Netherlands, 1994.
  13. M. Houtsma and A. Swami. Set-oriented mining of association rules. Research Report RJ 9567, IBM Almaden Research Center, San Jose, Cali- fornia, October 1993.
  14. R. Krishnamurthy and T. Imielinski. Practi- tioner problems in need of database research: Re- search directions in knowledge discovery. SIG- MOD RECORD, 20(3):76{78, September 1991.
  15. P. Langley, H. Simon, G. Bradshaw, andJ. Zytkow. Scienti c Discovery: Computational Explorations of the Creative Process. MIT Press, 1987.
  16. H. Mannila and K.-J. Raiha. Dependency inference. In Proc. of the VLDB Conference, pages 155{158, Brighton, England, 1987.
  17. H. Mannila, H. Toivonen, and A. I. Verkamo. E cient algorithms for discovering association rules. In KDD-94: AAAI Workshop on Knowl- edge Discovery in Databases, July 1994.
  18. S. Muggleton and C. Feng. E cient induction of logic programs. In S. Muggleton, editor, Inductive Logic Programming. Academic Press, 1992.
  19. J. Pearl. Probabilistic reasoning in intelligent systems: Networks of plausible inference, 1992.
  20. G. Piatestsky-Shapiro. Discovery, analy- sis, and presentation of strong rules. In G. Piatestsky-Shapiro, editor, Knowledge Dis- covery in Databases. AAAI/MIT Press, 1991.
  21. G. Piatestsky-Shapiro, editor. Knowledge Dis- covery in Databases. AAAI/MIT Press, 1991.
  22. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufman, 1993.
  23. Rakesh Agrawal, Ramakrishnan Srikant: “Fast Algorithms for Mining Association Rules” Proceedings of the 20th VLDB Conference Santiago, Chile, 1994
  24. Liwen Sun, Reynold Cheng, David W. Cheung, Jiefeng Cheng, “Mining Uncertain Data with Probabilistic Guarantees”, KDD’10, July 25–28, 2010, Washington, DC, USA. Copyright 2010 ACM 978-1-4503-0055-1/10/07
  25. A. Deshpande et al. Model-driven data acquisition in sensor networks. In VLDB, 2004.
  26. C. Aggarwal, Y. Li, J. Wang, and J. Wang. Frequent pattern mining with uncertain data. In KDD, 2009.
  27. C. Aggarwal and P. Yu. A survey of uncertain data algorithms and applications. IEEE Transactions on Knowledge and Data Engineering, 21(5), 2009.
  28. R. Agrawal and R. Srikant. Fast algorithms for mining association rules in large databases. Technical report, RJ 9839, IBM, 1994.
  29. R. Bayardo, Jr. Efficiently mining long patterns from databases. In SIGMOD, 1998.
  30. D. Burdick, M. Calimlim, J. Flannick, J. Gehrke, and T. Yiu. MAFIA: A maximal frequent itemset algorithm. IEEE Transactions on Knowledge and Data Engineering, 17, 2005.
  31. H. Cheng, P. Yu, and J. Han. Approximate frequent itemset mining in the presence of random noise. Soft Computing for Knowledge Discovery and Data Mining, 2008.
  32. R. Cheng, D. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In SIGMOD, 2003.
  33. C. K. Chui, B. Kao, and E. Hung. Mining frequent itemsets from uncertain data. In PAKDD, 2007.
  34. G. Cormode and M. Garofalakis. Sketching probabilistic data streams. In SIGMOD, 2007.
  35. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In VLDB, 2004.
  36. M. Garofalakis and A. Kumar. Wavelet synopses for general error metrics. ACM Transactions on Database Systems, 30(4), 2005.
  37. K. Gouda and M. J. Zaki. GenMax: An efficient algorithm for mining maximal frequent itemsets. Data Mining and Knowledge Discovery, 11(3), 2005.
  38. R. Hogg, A. Craig, and J. Mckean. Introduction to Mathematical Statistics (6th ed.). Prentice Hall, 2004.
  39. J. Huang et al. MayBMS: A Probabilistic Database Management System. In SIGMOD, 2009.
  40. N. Khoussainova, M. Balazinska, and D. Suciu.Towards correcting input data errors probabilistically using integrity constraints. In MobiDE, 2006.
  41. D. Knuth. The art of computer programming, vol. 3.Addison Wesley, 1998.
  42. H. Kriegel and M. Pfeifle. Density-based clustering of uncertain data. In KDD, 2005.
  43. C. Kuok, A. Fu, and M. Wong. Mining fuzzy association rules in databases. SIGMOD Record, 1998.
  44. A. Lu, Y. Ke, J. Cheng, and W. Ng. Mining vague association rules. In DASFAA, 2007.
  45. M. Mutsuzaki et al. Trio-one: Layering uncertainty and lineage on a conventional dbms. In CIDR, 2007.
  46. M. Yiu et al. Efficient evaluation of probabilistic advanced spatial queries on existentially uncertain data. IEEE Transactions on Knowledge and Data Engineering, 21(9), 2009.
  47. R. Motwani and P. Raghavan. Randomized algorithms. Cambridge University Press, New York, NY, USA, 1995.
  48. A. Oppenheim, R. Schafer, and J. Buck. Discrete-time signal processing (2nd ed.). Prentice Hall, 1999.
  49. P. Sistla et al. Querying the uncertain position of moving objects. In Temporal Databases: Research and Practice. Springer Verlag, 1998.
  50. J. Ren, S. Lee, X. Chen, B. Kao, R. Cheng, and D. Cheung. Na¨ıve Bayes Classification of Uncertain Data. In ICDM, 2009.
  51. T. Bernecker et al. Probabilistic frequent itemset mining in uncertain databases. In KDD, 2009.
  52. T. Jayram et al. Avatar information extraction system. IEEE Data Engineering Bulletin, 29(1), 2006.
  53. P. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Pearson Education, 2006.
  54. C. Yang and W. Najm. Examining driver behavior using data gathered from red light photo enforcement cameras. Journal of Safety Research, 38(3), 2007.
  55. Q. Zhang, F. Li, and K. Yi. Finding frequent items in probabilistic data. In SIGMOD, 2008.
Index Terms

Computer Science
Information Sciences

Keywords

Probabilistic frequent patterns probabilistic frequent rule. key-patterns non-key-patterns