Automatic Declassification of Textual Documents by Generalizing Sensitive Terms

Veena Vasudevan; Ansamma John

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 100 - Number 18

Year of Publication: 2014

Authors: Veena Vasudevan, Ansamma John

10.5120/17626-8390

Veena Vasudevan, Ansamma John. Automatic Declassification of Textual Documents by Generalizing Sensitive Terms. International Journal of Computer Applications. 100, 18 (August 2014), 24-28. DOI=10.5120/17626-8390

@article{ 10.5120/17626-8390,
author = { Veena Vasudevan and Ansamma John },
title = { Automatic Declassification of Textual Documents by Generalizing Sensitive Terms },
journal = { International Journal of Computer Applications },
issue_date = { August 2014 },
volume = { 100 },
number = { 18 },
month = { August },
year = { 2014 },
issn = { 0975-8887 },
pages = { 24-28 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume100/number18/17626-8390/ },
doi = { 10.5120/17626-8390 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T22:30:18.606746+05:30
%A Veena Vasudevan
%A Ansamma John
%T Automatic Declassification of Textual Documents by Generalizing Sensitive Terms
%J International Journal of Computer Applications
%@ 0975-8887
%V 100
%N 18
%P 24-28
%D 2014
%I Foundation of Computer Science (FCS), NY, USA

Abstract

With the advent of the internet, large numbers of text documents are published and shared every day. Each of these documents contains a vast amount of information, some of which may be confidential, and sharing it publicly can compromise privacy. Documents are therefore sanitized before publication, both to preserve privacy and to retain the utility of the document. Various schemes have been developed to solve this problem, but most are domain specific and do not consider semantically correlated terms. This paper presents a general-purpose sanitization method that discovers sensitive information based on the concept of information content. The proposed method first finds the independent sensitive terms in the text document. It then uses these sensitive terms to discover correlated terms that pose a disclosure threat. Finally, a generalization algorithm generalizes the sensitive and correlated terms that carry a high disclosure risk.
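The three steps named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give formulas, so everything here is an assumption made for illustration: information content is taken as IC(t) = -log2 p(t), term correlation is measured with pointwise mutual information (PMI), and generalization climbs a hand-written hypernym table. The frequency tables, thresholds, and function names are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical toy statistics (assumed for illustration, not from the paper).
TOTAL_DOCS = 1_000_000
# Number of documents in which each term occurs.
DOC_FREQ = {
    "patient": 50_000, "hiv": 500, "antiviral": 4_000,
    "virus": 20_000, "medication": 30_000,
}
# Number of documents in which both terms of a pair occur.
CO_FREQ = {
    frozenset(("hiv", "antiviral")): 200,
    frozenset(("hiv", "patient")): 300,
}
# Hand-written generalization table: term -> more general term.
HYPERNYM = {"hiv": "virus", "antiviral": "medication"}


def information_content(term):
    """IC(t) = -log2 p(t): rarer terms carry more information."""
    return -math.log2(DOC_FREQ[term] / TOTAL_DOCS)


def pmi(a, b):
    """Pointwise mutual information between two terms."""
    joint = CO_FREQ.get(frozenset((a, b)), 0)
    if joint == 0:
        return float("-inf")
    return math.log2(joint * TOTAL_DOCS / (DOC_FREQ[a] * DOC_FREQ[b]))


def generalize(term, ic_threshold):
    """Climb the hypernym table until a term with low enough IC is found."""
    while term in HYPERNYM:
        term = HYPERNYM[term]
        if term in DOC_FREQ and information_content(term) <= ic_threshold:
            break
    return term


def sanitize(tokens, ic_threshold, pmi_threshold):
    known = [t for t in tokens if t in DOC_FREQ]
    # Step 1: independent sensitive terms (high information content).
    sensitive = {t for t in known if information_content(t) > ic_threshold}
    # Step 2: terms strongly correlated with some sensitive term.
    correlated = {
        t for t in known
        if t not in sensitive
        and any(pmi(t, s) > pmi_threshold for s in sensitive)
    }
    # Step 3: generalize every risky term, leave the rest untouched.
    risky = sensitive | correlated
    return [generalize(t, ic_threshold) if t in risky else t for t in tokens]


tokens = ["patient", "diagnosed", "hiv", "treated", "antiviral"]
result = sanitize(tokens, ic_threshold=9.0, pmi_threshold=5.0)
# result == ["patient", "diagnosed", "virus", "treated", "medication"]
```

In this toy run "hiv" is flagged on its own because it is rare, while "antiviral" is flagged only because it co-occurs strongly with "hiv"; "patient" stays in the text because neither its information content nor its correlation exceeds the assumed thresholds.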

Index Terms

Computer Science

Information Sciences

Keywords

Document Declassification Generalization Information content Privacy Term correlation Unstructured Data utility