Research Article

Comparative Cost Analysis Of Template Extraction from Heterogeneous Web Documents

Published in December 2014 by Jyoti Mhaske
Innovations and Trends in Computer and Communication Engineering
Foundation of Computer Science USA
ITCCE - Number 2
December 2014
Authors: Jyoti Mhaske

Automatically extracting structured information from unstructured and semi-structured machine-readable documents plays a vital role nowadays, and the Internet is the major resource for such extraction. To achieve good publishing productivity, most websites populate common templates with content. Template detection techniques have recently received a lot of attention as a way to improve the performance of search engines and the clustering and classification of web documents, because irrelevant template terms degrade the performance and accuracy of web applications that process these documents by machine. Novel algorithms are therefore useful for extracting templates from a large number of web documents generated from heterogeneous templates. The web documents are clustered based on the similarity of their underlying template structures, so that the template for each cluster is extracted simultaneously.
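
The idea of clustering documents by template structure and extracting one template per cluster can be illustrated with a small sketch. It is not the algorithm evaluated in the article: it assumes each document is represented as a set of root-to-node paths, substitutes a simple Jaccard-similarity threshold for the Minimum Description Length criterion named in the index terms, and takes a cluster's template to be the paths shared by all of its members. The names PathCollector, doc_paths, jaccard and cluster_and_extract, and the threshold value of 0.5, are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class PathCollector(HTMLParser):
    """Collects the set of root-to-node paths seen in one HTML document."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed children).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Text nodes are part of the path, so fixed template text is shared
            # across documents while page-specific content is not.
            self.paths.add("/".join(self.stack) + "/#text:" + text)


def doc_paths(html):
    """Return the path set of one document."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_and_extract(docs, threshold=0.5):
    """Greedy single-pass clustering (an assumption, not the article's method).

    Returns a list of (template_paths, member_indices) pairs, where
    template_paths holds the paths common to every member of the cluster.
    """
    clusters = []  # each entry is [template_paths, member_indices]
    for index, html in enumerate(docs):
        paths = doc_paths(html)
        best = max(clusters, key=lambda c: jaccard(c[0], paths), default=None)
        if best is not None and jaccard(best[0], paths) >= threshold:
            best[0] = best[0] & paths  # keep only what all members share
            best[1].append(index)
        else:
            clusters.append([paths, [index]])
    return [(template, members) for template, members in clusters]
```

Calling cluster_and_extract on a list of HTML strings returns one entry per detected cluster. Because the clustering is greedy and order-dependent, the result can change with document order and with the chosen threshold, which is one reason a principled criterion such as MDL is attractive.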

Index Terms

Computer Science
Information Sciences


Web Template Extraction Clustering Documents Minimum Description Length Principle.