Research Article

A Query Classification System based on Snippet Similarity for a One-Click Search

by Tatsuya Tojima, Takashi Yukawa
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 82 - Number 1
Year of Publication: 2013
Authors: Tatsuya Tojima, Takashi Yukawa
DOI: 10.5120/14077-2146

Tatsuya Tojima, Takashi Yukawa. A Query Classification System based on Snippet Similarity for a One-Click Search. International Journal of Computer Applications. 82, 1 (November 2013), 1-8. DOI=10.5120/14077-2146

@article{ 10.5120/14077-2146,
author = { Tatsuya Tojima and Takashi Yukawa },
title = { A Query Classification System based on Snippet Similarity for a One-Click Search },
journal = { International Journal of Computer Applications },
issue_date = { November 2013 },
volume = { 82 },
number = { 1 },
month = { November },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume82/number1/14077-2146/ },
doi = { 10.5120/14077-2146 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:56:37.506187+05:30
%A Tatsuya Tojima
%A Takashi Yukawa
%T A Query Classification System based on Snippet Similarity for a One-Click Search
%J International Journal of Computer Applications
%@ 0975-8887
%V 82
%N 1
%P 1-8
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper proposes a query classification system for one-click search that uses feature vectors based on snippet similarity. The system targets the NTCIR-10 1CLICK-2 query classification subtask and uses support vector machines (SVMs) to classify Japanese and English queries into eight predefined classes. In the NTCIR-9 and NTCIR-10 tasks, most participants used complex features or rules that depend strongly on language characteristics. The authors instead propose a method whose feature vectors are built from snippet similarities. Compared with those features, the resulting vectors have fewer dimensions, generalize better, depend less on the language, and require fewer computing resources. The method achieved accuracies of 0.93 on the Japanese task and 0.91 on the English task.
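As a rough illustration of what a snippet-similarity feature vector could look like, the sketch below scores a query's search-result snippets against reference snippets for each class and emits one similarity value per class. Everything specific here is an assumption made for illustration and is not taken from the paper: the whitespace tokenizer, the choice of the Jaccard coefficient, the helper names, and the two placeholder classes with invented snippets. In the system the abstract describes, a vector of this kind would be passed to an SVM; that step is left out to keep the sketch free of external dependencies.

```python
# Illustrative sketch only. Assumptions (not from the paper): whitespace
# tokenization, Jaccard similarity, hypothetical class names and snippets.

def tokens(snippets):
    """Collect the set of lowercased, whitespace-separated tokens."""
    return {tok for s in snippets for tok in s.lower().split()}


def jaccard(a, b):
    """Jaccard coefficient of two sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_vector(query_snippets, class_snippets):
    """Return one similarity score per class, in sorted class-name order."""
    q = tokens(query_snippets)
    return [jaccard(q, tokens(class_snippets[c])) for c in sorted(class_snippets)]


# Hypothetical reference snippets for two placeholder classes.
CLASS_SNIPPETS = {
    "PERSON": ["singer songwriter album tour", "band released new album"],
    "PLACE": ["restaurant address opening hours", "shop map phone address"],
}

# Snippets returned by a search engine for the query (invented here).
vec = feature_vector(["new album and tour dates"], CLASS_SNIPPETS)
print(vec)
```

With one score per class, the vector length equals the number of classes, which is one plausible reading of why such features stay low-dimensional and need no language-specific rules beyond tokenization.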

Index Terms

Computer Science
Information Sciences

Keywords

Query Classification
Dimension Reduction
Intent
Mobile