CFP last date
20 December 2024
Reseach Article

Mining Constant Conditional Functional Dependencies for Improving Data Quality

by D Devi Kalyani
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 74 - Number 15
Year of Publication: 2013
Authors: D Devi Kalyani
10.5120/12960-9988

D Devi Kalyani . Mining Constant Conditional Functional Dependencies for Improving Data Quality. International Journal of Computer Applications. 74, 15 ( July 2013), 12-20. DOI=10.5120/12960-9988

@article{ 10.5120/12960-9988,
author = { D Devi Kalyani },
title = { Mining Constant Conditional Functional Dependencies for Improving Data Quality },
journal = { International Journal of Computer Applications },
issue_date = { July 2013 },
volume = { 74 },
number = { 15 },
month = { July },
year = { 2013 },
issn = { 0975-8887 },
pages = { 12-20 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume74/number15/12960-9988/ },
doi = { 10.5120/12960-9988 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:42:21.659794+05:30
%A D Devi Kalyani
%T Mining Constant Conditional Functional Dependencies for Improving Data Quality
%J International Journal of Computer Applications
%@ 0975-8887
%V 74
%N 15
%P 12-20
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper applies the data mining techniques in the area of data cleaning as effective in discovering Constant Conditional Functional Dependencies(CCFDs) from relational databases . These CCFDs are used as business rules for context dependent data validations. Conditional Functional Dependencies(CFDs) are an extension of Functional dependencies(FDs) which captures the consistency of data by supporting patterns of semantically related constants. Based on the hierarchy between FDs, CFDs and Association Rules :Union of Association Rules are CFDs, while union of CFDs are FDs. This paper proposes the algorithms used for Association Rule discovery to be reused for CCFD Mining i. e CFDs with constant patterns only . Three algorithms for CCFD mining namely CCFD-FPGrowth, CCFD-AprioriClose and CCFD-ZartMNR are provided in this paper. CCFD-FPGrowth uses FP-growth algorithm to find frequent itemsets and then generates rules as constant patterns from the set of frequent itemsets using modified Agrawal Association rule Generation algorithm. CCFD-AprioriClose uses Apriori algorithm to find frequent closed itemsets and then generates rules as constant patterns from the set of frequent closed itemsets using modified Agrawal Association rule Generation algorithm. CCFD-ZartMNR uses Zart algorithm to find closed itemsets and minimal generators and then generates minimal non-redundant rules from the set of closed itemsets. Experimental results on two real-world data sets show that this approach performs well across several dimensions such as recall, runtime and scalability.

References
  1. W. Fan, F. Geerts, X. Jia, and A. Kementsietsidis, "Conditional functional dependencies for capturing data inconsistencies," TODS, vol. 33, no. 2, june 2008.
  2. G. Cong, W. Fan, F. Geerts, X. Jia, and S. Ma, "Improving data quality:Consistency and accuracy," in VLDB, 2007
  3. Wenfei Fan , Floris Geerts , Jianzhong Li , Ming Xiong. Discovering Conditional Functional Dependencies. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 23, NO. 5, May 2011.
  4. Philip Bohannon, Wenfei Fan, Floris Geerts, Xibei Jia, and Anastasios Kementsietsidis. Conditional functional dependencies for data cleaning. In Proceedings of ICDE'07, April 15-20, Istanbul, Turkey, pages 746_755, 2007.
  5. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach Data Mining and Knowledge Discovery Journal , 8, 53–87, 2004
  6. Discovering Frequent Closed Itemsets for Association Rules, Nicolas Pasquier, Yves Bastide, Rafik Taouil, Lotfi Lakhal. ICDT '99 Proceedings of the 7th International Conference on Database Theory, Pages 398 - 416 , Springer-Verlag London, UK ©1999
  7. Paul De Bra and Jan Paredaens. Conditional dependencies for horizontal decompositions. In Proceedings of the 10th Colloquium on Automata, Languages and Programming, pages 67_82, London, UK, 1983. Springer-Verlag.
  8. Fei Chiang and Renée J. Miller. Discovering data quality rules. PVLDB,1(1):1166_1177, 2008.
  9. Fast Algorithms for Mining Association Rules, Rakesh Agrawal, RamaKrishnan Srikant, In Proc. 20th Int. Conf. Very Large Data Bases, VLDB (December--JanuaryMay~ 1994), pp. 487-499
  10. Rakesh Agrawal, Tomasz Imielinski, and Arun N. Swami. Mining association rules between sets of items in large databases. In Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, D. C. , May 26-28, 1993, pages 207_216. ACM Press, 1993.
  11. Nicolas Pasquier, Yves Bastide, Ra_k Taouil, and Lot_ Lakhal. Discovering frequent closed itemsets for association rules. In ICDT, pages 398_416, 1999.
  12. Lukasz Golab, Howard Karlo_, Flip Korn, Divesh Srivastava, and Bei Yu. On generating near-optimal tableaux for conditional functional dependencies. Proc. VLDB Endow. , 1(1):376_390, 2008.
  13. A. Maydanchik Data Quality Assessment Technics Publications,336pp, ISBN-13: 9780977140022
  14. Raoul Medina and Lhouari Nourine. A unified hierarchy for functional dependencies, conditional functional dependencies and association rules. In ICFCA, Lecture Notes in Computer Science, pages 235_248. Springer, 2009
  15. Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. Tane: An efficient algorithm for discovering functional and approximate dependencies. The Computer Journal, 42(2):100_111, 1999.
  16. Stéphane Lopes, Jean-Marc Petit, and Lot_ Lakhal. Efficient discovery of functional dependencies and armstrong relations. In EDBT 2000, volume 1777 of LNCS, pages 350_364, Konstanz, Germany, 2000. Springer.
  17. Noël Novelli and Rosine Cicchetti. Fun: An efficient algorithm for mining functional and embeddeddependencies. In Proceedings of the 8th International Conference on DatabaseTheory (ICDT'01), volume 1973 of Lecture Notes in Computer Science, pages 189_203, 2001.
  18. Catharine Wyss, Chris Giannella, and Edward Robertson. Fastfds: A heuristic-driven, depth-first algorithm for mining functional dependencies from relation instances extended abstract. Data Warehousing and Knowledge Discovery, pages 101_110, 2001.
  19. G. Birkhoff. Lattices theory. In coll. pub. XXV, Volume 25. American Mathematical Society,1967. Third edition.
  20. B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press,1994. Fourth edition.
  21. L. Szathmary, A. Napoli, and S. O. Kuznetsov. ZART: A Multifunctional Itemset Mining Algorithm. In Proc. of the 5th Intl. Conf. on Concept Lattices and Their Applications (CLA '07), pages 26{37, Montpellier, France, Oct 2007
  22. Marzena Kryszkiewicz: Representative Association Rules and Minimum Condition Maximum Consequence Association Rules. PKDD 1998: 361-369 Minimum
  23. Kryszkiewicz, M. Concise representations of association rules. In: Pattern Detection and Discovery. (2002) 92–109
  24. Pasquier, N. , Bastide, Y. , Taouil, R. , Lakhal, L. : Closed set based discovery of small covers for association rules. In: Proc. 15emes Journees Bases de Donnees Avancees, BDA. (1999) 361–381
  25. Pasquier, N. : Mining association rules using formal concept analysis. In: Proc. Of the 8th International Conf. on Conceptual Structures (ICCS '00), Shaker-Verlag (2000) 259–264
Index Terms

Computer Science
Information Sciences

Keywords

Data Cleaning Constant Conditional Functional Dependency(CCFD) Conditional Functional Dependency(CFD) Frequent Pattern Growth (FP) tree Frequent Itemsets Closed Itemsets