Research Article

A Survey of Automatic Deep Web Classification Techniques

by Umara Noor, Zahid Rashid, Azhar Rauf
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 19 - Number 6
Year of Publication: 2011
Authors: Umara Noor, Zahid Rashid, Azhar Rauf
10.5120/2362-3099

Umara Noor, Zahid Rashid, Azhar Rauf. A Survey of Automatic Deep Web Classification Techniques. International Journal of Computer Applications. 19, 6 (April 2011), 43-50. DOI=10.5120/2362-3099

@article{ 10.5120/2362-3099,
author = { Umara Noor and Zahid Rashid and Azhar Rauf },
title = { A Survey of Automatic Deep Web Classification Techniques },
journal = { International Journal of Computer Applications },
issue_date = { April 2011 },
volume = { 19 },
number = { 6 },
month = { April },
year = { 2011 },
issn = { 0975-8887 },
pages = { 43-50 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume19/number6/2362-3099/ },
doi = { 10.5120/2362-3099 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:06:19.162314+05:30
%A Umara Noor
%A Zahid Rashid
%A Azhar Rauf
%T A Survey of Automatic Deep Web Classification Techniques
%J International Journal of Computer Applications
%@ 0975-8887
%V 19
%N 6
%P 43-50
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Deep web technologies have gained considerable attention in the last few years as part of the vision for the next generation of the web, a prominent feature of which is the automation of tasks. A large part of the deep web comprises online, structured, domain-specific databases that are accessed through web query interfaces. The information contained in each of these databases relates to a particular domain. This highly relevant information is well suited to satisfying users' information needs and to large-scale deep web integration. To make this extraction and integration process easier, it is necessary to classify deep web databases into standard/non-standard category domains. There are two main types of classification techniques: manual and automatic. Because the size of the deep web is increasing at an exponential rate over time, it has become nearly impossible to classify these deep web search sources manually into their respective domains. For this purpose, several automatic deep web classification techniques have been proposed in the literature. In this paper, apart from the literature survey, we propose a framework for the analysis of automatic deep web classification techniques. The framework provides a baseline for analyzing the rudiments of automatic classification techniques based on parameters such as structured/unstructured sources, simple/advanced query forms, content representative extraction methodology, level of classification, performance evaluation criteria, and results. Furthermore, we study a number of automatic deep web classification techniques in the light of the proposed framework.

Index Terms

Computer Science
Information Sciences

Keywords

Deep web, web databases, data integration, domain concepts, survey