International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 66 - Number 23
Year of Publication: 2013
Authors: Rupali Sunil Wagh
DOI: 10.5120/11258-6501
Rupali Sunil Wagh. Knowledge Discovery from Legal Documents Dataset using Text Mining Techniques. International Journal of Computer Applications. 66, 23 (March 2013), 32-34. DOI=10.5120/11258-6501
The last few decades have witnessed an exponential increase in the use of IT, which has resulted in large amounts of data being generated, stored and searched. Data may be highly structured, stored as records in a DBMS, or totally unstructured, like blog posts or plain text documents. With so much information available as text documents, retrieving knowledge from such unstructured datasets poses new challenges to the research community. Legal document analysis is one domain that generates and uses text information in semi-structured as well as unstructured form. The process of legal reasoning and decision making depends heavily on information stored in text documents. Text Mining (TM) is defined as the process of extracting useful information from text data. Legal text documents are written in natural language. For efficient analysis of such documents, text mining, a specialized branch of machine learning, can be suitably used. Text mining, which "mines text", is closely associated with natural language processing and information retrieval. TM techniques can be used to extract relevant knowledge from stored legal documents. The extracted knowledge is used to simplify the preparation of a case base, to facilitate decision making and legal reasoning, or to identify legal arguments automatically. Research in the fields of information extraction, natural language processing, artificial intelligence and expert systems has augmented the text mining process, enhancing knowledge discovery in this domain. This paper proposes a study aimed at grouping legal documents based on their contents, without any external input, using unsupervised text mining techniques.
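To make the idea of content-based, unsupervised grouping concrete, the following is a minimal sketch only. The abstract does not name a specific algorithm, so the choices here are assumptions made for illustration: TF-IDF term weighting, cosine similarity, and average-link agglomerative clustering. The function names and the four sample sentences are hypothetical and are not taken from the paper's dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs only (a deliberately crude tokenizer).
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    # Build one sparse TF-IDF vector (dict: term -> weight) per document.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (c / len(toks)) * math.log(n / doc_freq[term])
            for term, c in counts.items()
        })
    return vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors; 0.0 if either is all zeros.
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, k):
    # Average-link agglomerative clustering: start with one cluster per
    # document and repeatedly merge the most similar pair until k remain.
    vecs = tfidf_vectors(docs)
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [cosine(vecs[a], vecs[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical sample texts, invented for this illustration.
sample_docs = [
    "The tenant breached the lease and the landlord sought eviction.",
    "The landlord terminated the lease after the tenant failed to pay rent.",
    "The patent claim was invalid because of prior art.",
    "Prior art rendered the patent application invalid.",
]
groups = cluster_documents(sample_docs, 2)
print(groups)  # [[0, 1], [2, 3]]
```

No labels or external categories are supplied; the two groups emerge only from shared vocabulary, which is the sense of "without any external input" in the abstract. A real legal corpus would likely need stop-word removal, stemming, and a more scalable clustering method than this quadratic sketch.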