Research Article

A Novel Technique for Database Selection and Document Selection

by Anil Agrawal, Mohd. Husain, Raj Gaurang Tiwari, Subodh Kumar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 17 - Number 8
Year of Publication: 2011
DOI: 10.5120/2241-2865

Anil Agrawal, Mohd. Husain, Raj Gaurang Tiwari, Subodh Kumar. A Novel Technique for Database Selection and Document Selection. International Journal of Computer Applications. 17, 8 (March 2011), 22-26. DOI=10.5120/2241-2865

@article{ 10.5120/2241-2865,
author = { Anil Agrawal and Mohd. Husain and Raj Gaurang Tiwari and Subodh Kumar },
title = { A Novel Technique for Database Selection and Document Selection },
journal = { International Journal of Computer Applications },
issue_date = { March 2011 },
volume = { 17 },
number = { 8 },
month = { March },
year = { 2011 },
issn = { 0975-8887 },
pages = { 22-26 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume17/number8/2241-2865/ },
doi = { 10.5120/2241-2865 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:05:03.589288+05:30
%A Anil Agrawal
%A Mohd. Husain
%A Raj Gaurang Tiwari
%A Subodh Kumar
%T A Novel Technique for Database Selection and Document Selection
%J International Journal of Computer Applications
%@ 0975-8887
%V 17
%N 8
%P 22-26
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The Internet has become a vast information source in recent years and can be considered the world's largest digital library. To help ordinary users find desired data in this library, numerous search engines have been created. Each search engine has a corresponding database that defines the set of documents it can search. Typically, an index for all documents in the database is created and stored in the search engine. Text data on the Internet can be partitioned naturally into numerous databases. Desired data can be retrieved efficiently if we can accurately estimate the usefulness of each database, because with such information we only need to retrieve potentially useful documents from useful databases. For a given query q, the usefulness of a text database is defined as the number of documents in the database that are sufficiently relevant to q. In this paper, we propose innovative approaches for database selection and document selection.
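
The usefulness definition in the abstract can be illustrated with a minimal sketch. This is not the technique proposed in the paper, which the abstract does not describe. The sketch assumes that "sufficiently relevant" means a cosine similarity over raw term counts at or above a chosen threshold, and the names `usefulness`, `rank_databases`, and the threshold value 0.3 are hypothetical choices made for illustration only.

```python
import math
from collections import Counter


def vectorize(text):
    # Assumption: a bag-of-words vector of raw term counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def usefulness(query, documents, threshold):
    # Usefulness = number of documents whose similarity to the
    # query reaches the (assumed) relevance threshold.
    q = vectorize(query)
    return sum(1 for d in documents if cosine(q, vectorize(d)) >= threshold)


def rank_databases(query, databases, threshold=0.3):
    # Database selection: order databases by usefulness, highest first.
    scores = {name: usefulness(query, docs, threshold)
              for name, docs in databases.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data, not taken from the paper.
databases = {
    "sports": ["football match results", "tennis open final"],
    "tech": [
        "distributed query processing",
        "metasearch engine query routing",
        "database index structures",
    ],
}

ranking = rank_databases("distributed query engine", databases)
```

Here `ranking` places the "tech" database first, since two of its documents pass the threshold and none of the "sports" documents do. The sketch scans every document, which a metasearch engine would normally avoid by estimating usefulness from summary statistics kept for each database.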

Index Terms

Computer Science
Information Sciences

Keywords

Metasearch Engine, Distributed query processing, Document selection