Innovations and Trends in Computer and Communication Engineering
Foundation of Computer Science USA
ITCCE - Number 2
December 2014
Authors: Jyoti Mhaske
Jyoti Mhaske. Comparative Cost Analysis Of Template Extraction from Heterogeneous Web Documents. Innovations and Trends in Computer and Communication Engineering. ITCCE, 2 (December 2014), 16-18.
Automatically extracting structured information from unstructured and semi-structured machine-readable documents now plays a vital role, and the Internet is a major resource for such extraction. To achieve high publishing productivity, most websites populate common templates with their content. In recent years, template detection has received considerable attention as a way to improve search engine performance and the clustering and classification of web documents, because the irrelevant terms contributed by templates degrade the performance and accuracy of machine processing of web pages. Novel algorithms are therefore useful for extracting templates from a large number of web documents generated from heterogeneous templates: the web documents are clustered according to the similarity of their underlying template structures, so that the template for each cluster is extracted simultaneously.
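The abstract does not specify the algorithm. The following is therefore only a minimal sketch of the general idea (cluster pages by structural similarity, then keep what each cluster shares). It assumes that each page is represented as the set of its root-to-node tag paths plus its text-node paths. The greedy Jaccard-threshold clustering, the `support` rule, the 0.5 threshold, and every name (`PathCollector`, `document_paths`, `cluster_documents`, `extract_template`) are hypothetical choices for illustration. They are not the method evaluated in the paper.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sketch: greedy Jaccard clustering of pages by tag paths,
# then per-cluster template = paths shared by the cluster's pages.
# This is NOT the algorithm from the paper, only an illustration of the idea.

# HTML void elements have no end tag, so they must not stay on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PathCollector(HTMLParser):
    """Collects tag paths such as 'html/body/div' and text-node paths such
    as 'html/body/div/#Home' for a single document."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
        if tag in VOID_TAGS:
            self.stack.pop()

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.paths.add("/".join(self.stack) + "/#" + text)


def document_paths(html):
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.paths


def jaccard(a, b):
    """Size of the intersection over size of the union; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_documents(path_sets, threshold=0.5):
    """Greedy single pass: a document joins the most similar existing cluster
    (compared with that cluster's first member) if the similarity is at least
    `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for i, paths in enumerate(path_sets):
        best, best_sim = None, threshold
        for members in clusters:
            sim = jaccard(paths, path_sets[members[0]])
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


def extract_template(path_sets, members, support=1.0):
    """Keep paths occurring in at least `support` (a fraction) of the cluster's
    documents; support=1.0 keeps only paths shared by all of them."""
    counts = Counter(p for i in members for p in path_sets[i])
    needed = support * len(members)
    return {p for p, n in counts.items() if n >= needed}


docs = [
    "<html><body><div id='nav'>Home</div><p>Article one</p></body></html>",
    "<html><body><div id='nav'>Home</div><p>Article two</p></body></html>",
    "<html><body><table><tr><td>Price</td><td>10</td></tr></table></body></html>",
    "<html><body><table><tr><td>Price</td><td>25</td></tr></table></body></html>",
]
path_sets = [document_paths(d) for d in docs]
clusters = cluster_documents(path_sets)
templates = [extract_template(path_sets, c) for c in clusters]
print(clusters)  # [[0, 1], [2, 3]]
```

The two page layouts land in separate clusters. With `support=1.0`, a path survives only if every page in its cluster contains it. The shared navigation text `Home` therefore becomes part of the first template, while the per-page article text does not. A lower `support` would tolerate pages that omit some template elements, at the risk of admitting frequent but non-template content.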