International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 157 - Number 1
Year of Publication: 2017
Authors: N. P. Nethravathi, Prasanth G. Rao, Chaitra C. Vaidya, P. Deepa Shenoy, Venugopal K. R., Indiramma M.
DOI: 10.5120/ijca2017912353
N. P. Nethravathi, Prasanth G. Rao, Chaitra C. Vaidya, P. Deepa Shenoy, Venugopal K. R., Indiramma M. Generic CBTS: Correlation based Transformation Strategy for Privacy Preserving Data Mining. International Journal of Computer Applications. 157, 1 (Jan 2017), 1-7. DOI=10.5120/ijca2017912353
Mining useful knowledge from a corpus of data has become an important application in many fields. Data Mining algorithms such as Clustering and Classification work on this data and provide crisp information for analysis. As these data reach the public domain through various channels, privacy for the owners of the data is an increasing need. Although privacy can be provided by hiding sensitive data, doing so impairs knowledge extraction by Data Mining algorithms, so an effective mechanism is required that provides privacy to the data without affecting the Data Mining results. Privacy concerns are a primary hindrance to quality data analysis. Data Mining algorithms, by contrast, focus on the mathematical nature of the information rather than on its private nature. Therefore, instead of removing or encrypting sensitive data, we propose transformation strategies that retain the statistical, semantic and heuristic nature of the data while masking the sensitive information. The proposed Correlation Based Transformation Strategy (CBTS) combines Correlation Analysis with data transformation techniques such as Singular Value Decomposition (SVD), Principal Component Analysis (PCA) and Non-Negative Matrix Factorization (NNMF) to provide the intended level of privacy preservation while still enabling data analysis. The proposed technique works for numerical, ordinal and nominal data. CBTS is evaluated on standard datasets against popular data mining techniques with significant success, and Information Entropy is also taken into account.
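As a rough illustration of the idea summarized above, the following Python sketch pairs a correlation analysis with a truncated SVD. It is not the authors' CBTS algorithm, whose details the abstract does not give. The function names `correlated_columns` and `svd_mask`, the 0.5 correlation threshold, the retained rank, and the synthetic data are all assumptions made for illustration, and NumPy is assumed to be available.

```python
import numpy as np


def correlated_columns(data, sensitive_idx, threshold=0.5):
    """Return indices of columns whose absolute Pearson correlation with
    the sensitive column meets the (assumed) threshold. The sensitive
    column itself is always included, since it correlates 1.0 with itself."""
    corr = np.corrcoef(data, rowvar=False)
    return np.flatnonzero(np.abs(corr[sensitive_idx]) >= threshold)


def svd_mask(data, cols, rank):
    """Replace the selected columns with a rank-`rank` SVD reconstruction.
    Columns are standardized first so that no attribute dominates because
    of its units, then mapped back to the original scale."""
    block = data[:, cols].astype(float)
    mean = block.mean(axis=0)
    std = block.std(axis=0)
    z = (block - mean) / std
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    # Keep only the leading components; the discarded ones carry the
    # fine detail that the transformation is meant to obscure.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    out = data.astype(float).copy()
    out[:, cols] = approx * std + mean
    return out


# Synthetic example (hypothetical attributes, not from the paper).
rng = np.random.default_rng(0)
age = rng.normal(40, 10, size=200)
income = 1000 * age + rng.normal(0, 5000, size=200)    # sensitive attribute
spend = 0.3 * income + rng.normal(0, 2000, size=200)
noise = rng.normal(0, 1, size=200)                     # unrelated attribute
data = np.column_stack([age, income, spend, noise])

cols = correlated_columns(data, sensitive_idx=1, threshold=0.5)
masked = svd_mask(data, cols, rank=2)
```

In this sketch only the attributes correlated with the sensitive one are perturbed, while the unrelated column is passed through unchanged. The broad statistical structure that clustering or classification relies on is largely retained, because only the weakest component is discarded.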