International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 43 - Number 7
Year of Publication: 2012
Authors: Noha Negm, Passent Elkafrawy, Abdel Badea Salem
10.5120/6115-8296
Noha Negm, Passent Elkafrawy, Abdel Badea Salem. A Survey of Web Information Extraction Tools. International Journal of Computer Applications. 43, 7 (April 2012), 19-27. DOI=10.5120/6115-8296
Access to the vast number of information sources on the internet has been limited to browsing and searching, because web information sources are heterogeneous and lack structure. This has created a need for automated Web Information Extraction (IE) tools that analyze Web pages and harvest useful information from noisy content for further analysis. The goal of this survey is to provide a comprehensive review of the major Web IE tools that are used for Web text and that are based on the Document Object Model for representing web pages. This paper compares them along three dimensions: (1) the source of content extraction, (2) the techniques used, and (3) the features of the tools. It also lists the advantages and disadvantages of each tool. Based on this survey, we can decide which Web IE tool is suitable for integration into our future work in Web Text Mining.