National Conference on Advances in Computer Science and Applications (NCACSA 2012) |
Foundation of Computer Science USA |
NCACSA - Number 5 |
May 2012 |
Authors: Syed Thousif Hussain |
Syed Thousif Hussain. Extracting Images from the Web using Data Mining Technique. National Conference on Advances in Computer Science and Applications (NCACSA 2012). NCACSA, 5 (May 2012), 21-24.
The objective of this work is to gather a large number of images for a specified object class. The approach employs text, metadata, and visual features to collect many high-quality images from the web. Candidate images are obtained by a text-based web search, and the web pages and the images they contain are downloaded. The task is then to remove irrelevant images and re-rank the remainder. First, the image query page is downloaded. Second, the image URLs are extracted from the downloaded page and stored in a database, and the images are ranked based on surrounding text and metadata features. SVM and Naive Bayes classifiers are compared for this ranking. The top-ranked images are then used as training data to learn an SVM visual classifier that improves the re-ranking. The principal idea of the overall method is to combine text, metadata, and visual features in order to achieve completely automatic ranking of images.
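The first two steps described above (extracting image URLs from a downloaded page and ranking them by textual metadata) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `ImageExtractor`, `text_score`, and `rank_images` are hypothetical, the database is replaced by an in-memory list, and a simple count of query-term matches stands in for the SVM and Naive Bayes text rankers that the paper compares.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageExtractor(HTMLParser):
    """Collects the URL and textual metadata of every <img> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []  # in-memory stand-in for the database (assumption)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        self.images.append({
            # Resolve relative image paths against the page URL.
            "url": urljoin(self.base_url, src),
            "alt": attr.get("alt") or "",
            "title": attr.get("title") or "",
        })


def text_score(image, query):
    """Hypothetical score: occurrences of the query in the URL, alt and title.

    This is a crude stand-in for a learned text/metadata classifier.
    """
    q = query.lower()
    fields = (image["url"], image["alt"], image["title"])
    return sum(field.lower().count(q) for field in fields)


def rank_images(html, base_url, query):
    """Extract all images from the page and sort them by descending text score."""
    parser = ImageExtractor(base_url)
    parser.feed(html)
    return sorted(parser.images, key=lambda im: text_score(im, query), reverse=True)


if __name__ == "__main__":
    page = (
        '<html><body>'
        '<img src="/img/logo.png" alt="site logo">'
        '<img src="photos/penguin1.jpg" alt="A penguin on ice" title="Penguin">'
        '</body></html>'
    )
    for image in rank_images(page, "http://example.com/page/", "penguin"):
        print(text_score(image, "penguin"), image["url"])
```

In a fuller pipeline along the lines of the abstract, the top-ranked entries from such a text-based pass would serve as training data for a visual classifier; that stage is omitted here because it depends on image features not described in this summary.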