Research Article

Hybrid Perturbation Technique using Feature Selection Method for Privacy Preservation in Data Mining

by Praveena Priyadarsini, M. L. Valarmathi, S. Sivakumari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 58 - Number 2
Year of Publication: 2012
Authors: Praveena Priyadarsini, M. L. Valarmathi, S. Sivakumari
DOI: 10.5120/9257-3427

Praveena Priyadarsini, M. L. Valarmathi, S. Sivakumari. Hybrid Perturbation Technique using Feature Selection Method for Privacy Preservation in Data Mining. International Journal of Computer Applications. 58, 2 (November 2012), 34-41. DOI=10.5120/9257-3427

@article{ 10.5120/9257-3427,
author = { Praveena Priyadarsini and M. L. Valarmathi and S. Sivakumari },
title = { Hybrid Perturbation Technique using Feature Selection Method for Privacy Preservation in Data Mining },
journal = { International Journal of Computer Applications },
issue_date = { November 2012 },
volume = { 58 },
number = { 2 },
month = { November },
year = { 2012 },
issn = { 0975-8887 },
pages = { 34-41 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume58/number2/9257-3427/ },
doi = { 10.5120/9257-3427 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:01:32.403436+05:30
%A Praveena Priyadarsini
%A M. L. Valarmathi
%A S. Sivakumari
%T Hybrid Perturbation Technique using Feature Selection Method for Privacy Preservation in Data Mining
%J International Journal of Computer Applications
%@ 0975-8887
%V 58
%N 2
%P 34-41
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Privacy preservation in data mining is the area of data mining that seeks to safeguard sensitive information from unsolicited or unsanctioned disclosure, thereby protecting individual data records and their privacy. Data perturbation is a privacy preservation technique that adds noise to, or multiplies noise with, the original data. It performs anonymization based on the data type of the sensitive data. Generalization is a technique in which quasi-identifier values are replaced by a more general term. In this paper, privacy protection is applied to high-dimensional datasets such as Adult and Census. Attributes are ranked using the information gain feature subset selection method, and the high-ranking attributes that carry sensitive information are set as the quasi-identifiers of the datasets. A hybrid perturbation technique is used to perturb the categorical and numeric attributes of both datasets, and the utility of the datasets is measured by the accuracy of data mining functionalities. Data distortion is measured using the maintenance of Rank of Features (CK) between the original and perturbed datasets. Experimental results show that the utility of the perturbed datasets is comparable with that of the original datasets, and that the Census dataset has a CK value comparable to that of the Adult dataset.
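
The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the noise model, the generalization hierarchy, or the exact formula for CK. The sketch assumes Gaussian additive noise, a hand-written hierarchy, and a commonly used reading of CK as the fraction of attributes whose rank by mean value is unchanged after perturbation. All function names and the toy records are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels):
    """Label entropy minus the weighted label entropy after splitting on a categorical attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def add_noise(column, sigma, rng):
    """Additive perturbation (assumed Gaussian): x + e, with e drawn from N(0, sigma^2)."""
    return [x + rng.gauss(0.0, sigma) for x in column]

def generalize(column, hierarchy):
    """Generalization: replace each categorical value by its broader term, if one is listed."""
    return [hierarchy.get(v, v) for v in column]

def rank_by_mean(columns):
    """Map each attribute name to its rank when attributes are ordered by mean value."""
    means = {name: sum(col) / len(col) for name, col in columns.items()}
    ordered = sorted(means, key=means.get)
    return {name: rank for rank, name in enumerate(ordered)}

def ck(original, perturbed):
    """Assumed CK: fraction of attributes whose mean-value rank is unchanged after perturbation."""
    before, after = rank_by_mean(original), rank_by_mean(perturbed)
    return sum(before[n] == after[n] for n in before) / len(before)

# Toy records (hypothetical, not taken from the Adult or Census datasets).
education = ["Bachelors", "Masters", "HS-grad", "11th", "Masters", "HS-grad"]
income = [">50K", ">50K", "<=50K", "<=50K", ">50K", "<=50K"]
numeric = {"age": [39, 50, 38, 53, 28, 37], "hours": [40, 13, 40, 40, 40, 45]}

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Rank a categorical attribute by information gain against the class label.
gain = information_gain(education, income)

# Hybrid step: generalize the categorical attribute, add noise to numeric ones.
coarse = generalize(education, {"Bachelors": "Higher", "Masters": "Higher",
                                "HS-grad": "School", "11th": "School"})
noisy = {name: add_noise(col, sigma=1.0, rng=rng) for name, col in numeric.items()}

# Distortion check: how many attributes keep their mean-value rank.
score = ck(numeric, noisy)
```

Under this reading, a CK value close to 1 means the perturbation left the relative ordering of attribute means largely intact, which is one way to argue that the perturbed data remains useful for mining.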

Index Terms

Computer Science
Information Sciences

Keywords

Data mining, Privacy preservation, Perturbation, Generalization, Utility, Classification, Clustering, Maintenance of Rank of Features