Unsupervised Technique for Web Data Extraction: Trinity

Sayali Khodade; Nilav Mukherjee

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 115 - Number 19

Year of Publication: 2015

Authors: Sayali Khodade, Nilav Mukherjee

10.5120/20263-2668

Sayali Khodade, Nilav Mukherjee. Unsupervised Technique for Web Data Extraction: Trinity. International Journal of Computer Applications. 115, 19 (April 2015), 43-48. DOI=10.5120/20263-2668

@article{ 10.5120/20263-2668,

author = { Sayali Khodade and Nilav Mukherjee },

title = { Unsupervised Technique for Web Data Extraction: Trinity },

journal = { International Journal of Computer Applications },

issue_date = { April 2015 },

volume = { 115 },

number = { 19 },

month = { April },

year = { 2015 },

issn = { 0975-8887 },

pages = { 43-48 },

numpages = {9},

url = { https://ijcaonline.org/archives/volume115/number19/20263-2668/ },

doi = { 10.5120/20263-2668 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2024-02-06T22:55:20.833578+05:30

%A Sayali Khodade

%A Nilav Mukherjee

%T Unsupervised Technique for Web Data Extraction: Trinity

%J International Journal of Computer Applications

%@ 0975-8887

%V 115

%N 19

%P 43-48

%D 2015

%I Foundation of Computer Science (FCS), NY, USA

Abstract

A search engine is a program that retrieves specific information from a large amount of data. The technique presented in this article aims to return such results effectively and in less time. It works on two or more web documents that are generated from the same server-side template. The technique searches for shared patterns, which come from the template and do not themselves provide any relevant data, and uses each shared pattern to separate the input into three sub-parts; it then applies different ranking functions and stores the results in a database. When comparing our technique with other techniques, we can see that the input documents do not have a negative impact on its effectiveness, and that it gives results in less time and in exact form.
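The splitting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`find_shared_pattern`, `occurrences`, `split_three`), the tokenized input, and the choice of "longest token run common to all documents" as the shared pattern are assumptions made for illustration, loosely following the prefix/separator/suffix idea of the Trinity paper listed first in the references. Ranking and database storage are not covered.

```python
from typing import List, Optional, Tuple

Tokens = List[str]


def occurrences(doc: Tokens, pattern: Tokens) -> List[int]:
    """Start indexes of non-overlapping occurrences of pattern in doc."""
    hits, i, n = [], 0, len(pattern)
    while i + n <= len(doc):
        if doc[i:i + n] == pattern:
            hits.append(i)
            i += n  # skip past this match so matches never overlap
        else:
            i += 1
    return hits


def find_shared_pattern(docs: List[Tokens], min_len: int = 1) -> Optional[Tokens]:
    """Return the longest token run that occurs in every document, or None.

    Assumption: candidates are taken from the shortest document and tried
    from longest to shortest, so the first hit is a longest shared run.
    """
    base = min(docs, key=len)
    for size in range(len(base), min_len - 1, -1):
        for start in range(len(base) - size + 1):
            candidate = base[start:start + size]
            if all(occurrences(doc, candidate) for doc in docs):
                return candidate
    return None


def split_three(doc: Tokens, pattern: Tokens) -> Tuple[Tokens, List[Tokens], Tokens]:
    """Split doc into (prefix, separators, suffix) around the shared pattern.

    prefix: tokens before the first occurrence
    separators: token runs between consecutive occurrences
    suffix: tokens after the last occurrence
    Assumes the pattern occurs at least once in doc.
    """
    hits = occurrences(doc, pattern)
    n = len(pattern)
    prefix = doc[:hits[0]]
    separators = [doc[a + n:b] for a, b in zip(hits, hits[1:])]
    suffix = doc[hits[-1] + n:]
    return prefix, separators, suffix


if __name__ == "__main__":
    # Two hypothetical pages produced by the same template.
    page_a = ["Dune", "<br>", "Emma", "<br>", "Persuasion"]
    page_b = ["Ulysses", "<br>", "Ivanhoe", "<br>", "Kim"]
    shared = find_shared_pattern([page_a, page_b])
    print(shared)
    print(split_three(page_a, shared))
    print(split_three(page_b, shared))
```

In this toy input the only token run common to both pages is the template's `<br>` tag, so the three sub-parts of each page hold the varying data. Applying the same step again to each sub-part would give a tree with three children per node, which is the intuition behind the name Trinity.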

References

Hassan A. Sleiman, Trinity: On using trinary trees for unsupervised web data extraction, IEEE Trans. Knowl. Data Eng., vol. 26, no. 6, June 2014.
V. Crescenzi and G. Mecca, RoadRunner: Towards automatic data extraction from large web sites, Technical Report Rt-DIA-64-2001, D.I.A., University Roma Tre, March 2011.
V. Kadam and G. Pakle, A survey on HTML structure aware and tree based web data scraping technique, International Journal of Computer Science and Information Technologies, vol. 5, no. 2, 2014, pp. 1655-1658.
S. Rajanandini and M. Mekalai, Quality analysis in web applications to develop specification and duplication mining, Proceedings of National Conference on New Horizons in IT - NCNHIT 2013.
W. W. Cohen, M. Hurst, and L. S. Jensen, A flexible learning system for wrapping tables and lists in HTML documents, in Proc. 11th Int. Conf. WWW, 2002, pp. 232-241.
V. Crescenzi and G. Mecca, Automatic information extraction from large websites, J. ACM, vol. 51, no. 5, pp. 731-779, Sept. 2004.
D. Freitag, Information extraction from HTML: Application of general machine learning approach, in Proc. 15th Nat/10th Conf. AAAI/IAAI, Menlo Park, CA, USA, 1998, pp. 517-523.
A. Arasu and H. Garcia-Molina, Extracting structured data from web pages, in Proc. 2003 ACM SIGMOD, San Diego, CA, USA, pp. 337-348.
V. Crescenzi, G. Mecca, and P. Merialdo, RoadRunner: Towards automatic data extraction from large web sites, in Proc. 27th Int. Conf. VLDB, Rome, Italy, 2001, pp. 109-118.
A. Machanavajjhala, A. S. Iyer, P. Bohannon, and S. Merugu, Collective extraction from heterogeneous web lists, in Proc. 4th ACM Int. Conf. WSDM, Hong Kong, China, 2011, pp. 445-454.
M. Kayed and C.-H. Chang, FiVaTech: Page-level web data extraction from template pages, IEEE Trans. Knowl. Data Eng., vol. 22, no. 2, pp. 249-263, Feb. 2010.
C.-H. Chang, M. Kayed, M. R. Girgis, and K. F. Shaalan, A survey of web information extraction systems, IEEE Trans. Knowl. Data Eng., vol. 18, no. 10, pp. 1411-1428, Oct. 2006.
C.-H. Chang and S.-C. Lui, IEPAD: Information extraction based on pattern discovery, in Proc. 10th Int. Conf. WWW, Hong Kong, China, 2001, pp. 681-688.
J. L. Hong, E.-G. Siew, and S. Egerton, Information extraction for search engines using fast heuristic techniques, Data Knowl. Eng., vol. 69, no. 2, pp. 169-196, Feb. 2010.
H. A. Sleiman and R. Corchuelo, A survey on region extractors from web documents, IEEE Trans. Knowl. Data Eng., vol. 25, no. 9, pp. 1960-1981, Sept. 2012.
W. W. Cohen, M. Hurst, and L. S. Jensen, A flexible learning system for wrapping tables and lists in HTML documents, in Proc. 11th Int. Conf. WWW, 2002, pp. 232-241.

Index Terms

Computer Science

Information Sciences

Keywords

Web Data Extractor, Automatic Wrapper Generation, Wrapper, Unsupervised Technique