International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 58 - Number 2
Year of Publication: 2012
Authors: Praveena Priyadarsini, M. L. Valarmathi, S. Sivakumari
DOI: 10.5120/9257-3427
Praveena Priyadarsini, M. L. Valarmathi, S. Sivakumari. Hybrid Perturbation Technique using Feature Selection Method for Privacy Preservation in Data Mining. International Journal of Computer Applications. 58, 2 (November 2012), 34-41. DOI=10.5120/9257-3427
Privacy preservation in data mining is the area of data mining that seeks to safeguard sensitive information from unsolicited or unsanctioned disclosure, thereby protecting individual data records and their privacy. Data perturbation is a privacy preservation technique that adds noise to, or multiplies noise with, the original data. It performs anonymization based on the data type of the sensitive data. Generalization is a technique in which quasi-identifier values are replaced by a more general term. In this paper, privacy protection is applied to high-dimensional datasets such as Adult and Census. The information gain feature subset selection method is used to rank the attributes. The high-ranking attributes that carry sensitive information are set as the quasi-identifiers of the datasets. A hybrid perturbation technique is used to perturb the categorical and numeric attributes of both datasets, and the utility of the datasets is measured by accuracy on data mining functionalities. Data distortion is measured by the maintenance of Rank of Features (CK) between the original and perturbed datasets. Experimental results show that the utility of the perturbed datasets is comparable with that of the original datasets and that the Census dataset has a CK value comparable to that of the Adult dataset.
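The pipeline summarized in the abstract (rank attributes by information gain, then perturb numeric and categorical attributes differently) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not state the noise distribution or the generalization hierarchy, so zero-mean Gaussian additive noise and a hand-written mapping are assumptions here. The function names (`information_gain`, `rank_attributes`, `perturb`), the toy records, and the `hierarchy` mapping are hypothetical and are not taken from the Adult or Census datasets.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy obtained by splitting on a categorical attribute."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_attributes(rows, attributes, label_key):
    """Return the attributes sorted from highest to lowest information gain."""
    labels = [row[label_key] for row in rows]
    gains = {a: information_gain([row[a] for row in rows], labels) for a in attributes}
    return sorted(attributes, key=lambda a: gains[a], reverse=True)


def perturb(rows, numeric_attrs, hierarchy, sigma=1.0, seed=0):
    """Return perturbed copies of the rows; the input rows are left unchanged.

    Numeric attributes receive additive Gaussian noise (an assumption);
    categorical attributes are generalized through the given mapping.
    """
    rng = random.Random(seed)
    perturbed = []
    for row in rows:
        new_row = dict(row)
        for attr in numeric_attrs:
            new_row[attr] = row[attr] + rng.gauss(0.0, sigma)
        for attr, mapping in hierarchy.items():
            # Values without a mapping are kept as they are.
            new_row[attr] = mapping.get(row[attr], row[attr])
        perturbed.append(new_row)
    return perturbed


# Made-up toy records, for illustration only.
rows = [
    {"education": "Bachelors", "sex": "M", "age": 39, "income": ">50K"},
    {"education": "Masters", "sex": "F", "age": 45, "income": ">50K"},
    {"education": "HS-grad", "sex": "M", "age": 23, "income": "<=50K"},
    {"education": "11th", "sex": "F", "age": 31, "income": "<=50K"},
]

# Hypothetical generalization hierarchy for one quasi-identifier.
hierarchy = {
    "education": {
        "Bachelors": "Higher",
        "Masters": "Higher",
        "HS-grad": "School",
        "11th": "School",
    }
}

ranking = rank_attributes(rows, ["sex", "education"], "income")
perturbed_rows = perturb(rows, ["age"], hierarchy, sigma=2.0)
```

In this toy data, `education` determines `income` completely while `sex` carries no information about it, so `education` ranks first. Numeric attributes would need to be discretized before being passed to `information_gain`, which treats every distinct value as its own category. Re-running `rank_attributes` on `perturbed_rows` and comparing the order with `ranking` gives a rough sense of whether the feature ranking is maintained. The abstract does not define the CK metric, so that comparison is not a reproduction of it.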