Research Article

Automatic Declassification of Textual Documents by Generalizing Sensitive Terms

by Veena Vasudevan, Ansamma John
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 100 - Number 18
Year of Publication: 2014
Authors: Veena Vasudevan, Ansamma John
DOI: 10.5120/17626-8390

Veena Vasudevan and Ansamma John. Automatic Declassification of Textual Documents by Generalizing Sensitive Terms. International Journal of Computer Applications 100, 18 (August 2014), 24-28. DOI=10.5120/17626-8390

@article{ 10.5120/17626-8390,
author = { Veena Vasudevan, Ansamma John },
title = { Automatic Declassification of Textual Documents by Generalizing Sensitive Terms },
journal = { International Journal of Computer Applications },
issue_date = { August 2014 },
volume = { 100 },
number = { 18 },
month = { August },
year = { 2014 },
issn = { 0975-8887 },
pages = { 24-28 },
numpages = { 5 },
url = { https://ijcaonline.org/archives/volume100/number18/17626-8390/ },
doi = { 10.5120/17626-8390 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Veena Vasudevan
%A Ansamma John
%T Automatic Declassification of Textual Documents by Generalizing Sensitive Terms
%J International Journal of Computer Applications
%@ 0975-8887
%V 100
%N 18
%P 24-28
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the advent of the Internet, large numbers of text documents are published and shared every day. Each of these documents contains a vast amount of information, and publicly sharing some of it can compromise privacy when the information is confidential. Before a document is published, sanitization operations are therefore performed on it to preserve privacy while retaining the document's utility. Various schemes have been developed to solve this problem, but most of them are domain specific, and most do not consider the presence of semantically correlated terms. This paper presents a generalized sanitization method that discovers sensitive information based on the concept of information content. The proposed method removes confidential information from a text document by first finding the independent sensitive terms. These sensitive terms are then used to discover correlated terms that pose a disclosure threat. Finally, a generalization algorithm generalizes the sensitive and correlated terms with high disclosure risk.
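The core idea in the abstract, scoring terms by information content and replacing highly informative (and hence potentially identifying) terms with broader concepts, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the toy corpus, the IC threshold, and the hypernym map are assumptions made for the example, and the paper's additional step of detecting semantically correlated terms is omitted here.

```python
import math
from collections import Counter

def information_content(term, term_counts, total):
    """IC(t) = -log2 p(t): the rarer a term, the more it reveals."""
    return -math.log2(term_counts[term] / total)

def sanitize(document_terms, corpus_terms, threshold, hypernyms):
    """Replace terms whose IC exceeds `threshold` with a broader term.

    `hypernyms` maps a sensitive term to a generalization; terms with
    no known generalization are redacted outright.
    """
    counts = Counter(corpus_terms)
    total = sum(counts.values())
    sanitized = []
    for term in document_terms:
        if term in counts and information_content(term, counts, total) > threshold:
            sanitized.append(hypernyms.get(term, "[REDACTED]"))
        else:
            sanitized.append(term)
    return sanitized

# Hypothetical background corpus: "leukemia" is rare, so its IC is high.
corpus = ["patient"] * 50 + ["hospital"] * 40 + ["visited"] * 30 + ["leukemia"] * 2
hypernyms = {"leukemia": "cancer"}  # assumed generalization hierarchy
doc = ["patient", "visited", "hospital", "leukemia"]
print(sanitize(doc, corpus, threshold=4.0, hypernyms=hypernyms))
# → ['patient', 'visited', 'hospital', 'cancer']
```

In a real system the probabilities would come from a large reference corpus (or web counts) and the hypernyms from a taxonomy such as WordNet, so that the replacement keeps as much utility as privacy allows.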

References
  1. A. Shamir, "How to share a secret," Commun. ACM, vol. 22, no. 11, pp. 612-613, 1979.
  2. F. Baiardi, A. Falleni, R. Granchi, F. Martinelli, M. Petrocchi, and A. Vaccarelli, "Seas, a secure e-voting protocol: Design and implementation," Comput. Security, vol. 24, no. 8, pp. 642–652, Nov. 2005.
  3. A. Friedman, R. Wolff, and A. Schuster, "Providing k-anonymity in data mining," VLDB Journal, vol. 17, no. 4, pp. 789–804, Jul. 2008.
  4. Q. Xie and U. Hengartner, "Privacy-preserving matchmaking for mobile social networking secure against malicious users," in Proc. 9th Ann. IEEE Conf. Privacy, Security and Trust, Jul. 2011, pp. 252–259.
  5. D. Chaum, "Untraceable electronic mail, return address and digital pseudonyms," Commun. ACM, vol. 24, no. 2, pp. 84–88, Feb. 1981.
  6. D. Sánchez, M. Batet, and A. Viejo, "Detecting sensitive information from textual documents: An information theoretic approach," in Proc. 9th Int. Conf. Modeling Decisions for Artificial Intelligence (MDAI), vol. 7647, Springer, 2012, pp. 173–184.
  7. D. Sánchez, M. Batet, and A. Viejo, "Automatic general-purpose sanitization of textual documents," IEEE Trans. Inf. Forensics Security, vol. 8, pp. 853–862, 2013.
  8. C. Cumby and R. Ghani, "A machine learning based system for semi-automatically redacting documents," in Proc. 23rd Innovative Applications of Artificial Intelligence Conf., 2011, pp. 1628–1635.
  9. B. Anandan, C. Clifton, W. Jiang, M. Murugesan, P. Pastrana-Camacho, and L. Si, "t-plausibility: Generalizing words to desensitize text," Trans. Data Privacy, vol. 5, pp. 505–534, 2012.
  10. D. Abril, G. Navarro-Arribas, and V. Torra, "On the declassification of confidential documents," in Proc. Modeling Decisions for Artificial Intelligence'11, 2011, pp. 235–246.
  11. DARPA, New Technologies to Support Declassification, Request for Information (RFI), Defense Advanced Research Projects Agency, Solicitation Number: DARPA-SN-10-73, 2010.
  12. S. M. Meystre, F. J. Friedlin, B. R. South, S. Shen, and M. H. Samore, "Automatic de-identification of textual documents in the electronic health record: A review of recent research," BMC Med. Res. Methodology, vol. 10, pp. 70–86, 2010.
  13. Nat. Security Agency, Redacting With Confidence: How to Safely Publish Sanitized Reports Converted From Word to PDF, Tech. Rep. I333-015R-2005, 2005.
  14. L. Sweeney, "Replacing personally-identifying information in medical records, the scrub system," in Proc. 1996 American Medical Informatics Association Ann. Symp., 1996, pp. 333–337.
  15. M. M. Douglass, G. D. Clifford, A. Reisner, W. J. Long, G. B. Moody, and R. G. Mark, "De-identification algorithm for free-text nursing notes," in Proc. Computers in Cardiology'05, 2005, pp. 331–334.
  16. V. T. Chakaravarthy, H. Gupta, P. Roy, and M. Mohania, "Efficient techniques for document sanitization," in Proc. ACM Conf. Information and Knowledge Management'08, 2008, pp. 843–852.
  17. D. Abril, G. Navarro-Arribas, and V. Torra, "On the declassification of confidential documents," in Proc. Modeling Decisions for Artificial Intelligence'11, 2011, pp. 235–246.
Index Terms

Computer Science
Information Sciences

Keywords

Document declassification, Generalization, Information content, Privacy, Term correlation, Unstructured data, Utility