CFP last date
20 February 2025
Reseach Article

Optimal Load Factor for Approximate Nearest Neighbor Search under Exact Euclidean Locality Sensitive Hashing

by Ruben Buaba, Abdollah Homaifar, Eric Kihn
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 69 - Number 21
Year of Publication: 2013
Authors: Ruben Buaba, Abdollah Homaifar, Eric Kihn
10.5120/12096-8258

Ruben Buaba, Abdollah Homaifar, Eric Kihn . Optimal Load Factor for Approximate Nearest Neighbor Search under Exact Euclidean Locality Sensitive Hashing. International Journal of Computer Applications. 69, 21 ( May 2013), 22-31. DOI=10.5120/12096-8258

@article{ 10.5120/12096-8258,
author = { Ruben Buaba, Abdollah Homaifar, Eric Kihn },
title = { Optimal Load Factor for Approximate Nearest Neighbor Search under Exact Euclidean Locality Sensitive Hashing },
journal = { International Journal of Computer Applications },
issue_date = { May 2013 },
volume = { 69 },
number = { 21 },
month = { May },
year = { 2013 },
issn = { 0975-8887 },
pages = { 22-31 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume69/number21/12096-8258/ },
doi = { 10.5120/12096-8258 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:30:55.321913+05:30
%A Ruben Buaba
%A Abdollah Homaifar
%A Eric Kihn
%T Optimal Load Factor for Approximate Nearest Neighbor Search under Exact Euclidean Locality Sensitive Hashing
%J International Journal of Computer Applications
%@ 0975-8887
%V 69
%N 21
%P 22-31
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Locality Sensitive Hashing (LSH) is an index-based data structure that allows spatial item retrieval over a large dataset. The performance measure, ?, has significant effect on the computational complexity and memory space requirement to create and store items in this data structure respectively. The minimization of ? at a specific approximation factor c, is dependent on the load factor, ?. Over the years,?=4has been used by researchers. In this paper, we demonstratethat the choice of?=4does not guarantee low computational complexity and low memory space of the data structure under the LSH scheme. To guarantee low computational complexity and low memory space, we propose?=5. Experiments on the Defense Meteorological Satellite Program imagery datasethave shown that?=5saves more than 75%on memory space; cuts the computational complexity by more than 70%andanswers query two times faster on the average compared to that of?=4.

References
  1. Arya, S. , Mount, D. M. , Netanyahu, N. S. , Silverman, R. , and Wu, A. Y. 1998. "An optimal algorithm for approximate nearest neighbor searching fixed dimensions," J. ACM, vol. 45, no. 6, pp. 891-923, 1998).
  2. Flickner, M. , Sawhney, H. , Niblack, W. , Ashley, J. , Qian, H. , Dom, B. , Gorkani, M. , Hafner, J. , Lee, D. , Petkovic, D. , Steele, D. , and Yanker, P. 1995. "Query by image and video content: the QBIC system," Computer, vol. 28, no. 9, pp. 23-32, 1995).
  3. Fayyad, U. M. 1996. Advances in knowledge discovery and data mining: AAAI Press.
  4. Lin, K. I. , Jagadish, H. V. , and Faloutsos, C. 1994. "The TV-tree: an index structure for high-dimensional data," The VLDB Journal, vol. 3, no. 4, pp. 517-542, 1994).
  5. Roussopoulos, N. , Kelley, S. , and Vincent, F. 1995. "Nearest neighbor queries," in Proceedings of the 1995 ACM SIGMOD international conference on Management of data, San Jose, California, United States, pp. 71-79.
  6. White, D. A. , and Jain, R. "Similarity indexing with the SS-tree," Data Engineering, 1996. Proceedings of the Twelfth International Conference on. pp. 516-523.
  7. Berchtold, S. , Keim, D. A. , and Kriegel, H. -P. 1996. "The X-tree: An Index Structure for High-Dimensional Data," in Proceedings of the 22th International Conference on Very Large Data Bases, pp. 28-39.
  8. Berchtold, S. , Böhm, C. , Keim, D. A. , and Kriegel, H. -P. 1997. "A cost model for nearest neighbor search in high-dimensional data space," in Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems, Tucson, Arizona, United States, pp. 78-86.
  9. Cleary, J. G. 1979. "Analysis of an Algorithm for Finding Nearest Neighbors in Euclidean Space," ACM Trans. Math. Softw. , vol. 5, no. 2, pp. 183-192, 1979).
  10. Bentley, J. L. , Weide, B. W. , and Yao, A. C. 1980. "Optimal Expected-Time Algorithms for Closest Point Problems," ACM Trans. Math. Softw. , vol. 6, no. 4, pp. 563-580, 1980).
  11. Friedman, J. H. , Bentley, J. L. , and Finkel, R. A. 1977. "An Algorithm for Finding Best Matches in Logarithmic Expected Time," ACM Trans. Math. Softw. , vol. 3, no. 3, pp. 209-226, 1977).
  12. Sproull, R. 1991. "Refinements to nearest-neighbor searching in k -dimensional trees," Algorithmica, vol. 6, no. 1, pp. 579-589, 1991).
  13. Arya, S. , and Mount, D. M. 1995. "Approximate range searching," in Proceedings of the eleventh annual symposium on Computational geometry, Vancouver, British Columbia, Canada, pp. 172-181.
  14. Preparata, F. P. , and Shamos, M. I. 1985. Computational Geometry: An Introduction: Springer-Verlag.
  15. Edelsbrunner, H. 2004. Algorithms in Combinatorial Geometry: Springer.
  16. de Berg, M. , Cheong, O. , van Kreveld, M. , and Overmars, M. 2008. Computational Geometry: Algorithms and Applications: Springer.
  17. Yao, A. C. , and Yao, F. F. 1985. "A general approach to d-dimensional geometric queries," in Proceedings of the seventeenth annual ACM symposium on Theory of computing, Providence, Rhode Island, United States, pp. 163-168.
  18. Clarkson, K. L. 1988. "A randomized algorithm for closest-point queries," SIAM J. Comput. , vol. 17, no. 4, pp. 830-847, 1988).
  19. Agarwal, P. K. , and Matoušek, J. 1993. "Ray shooting and parametric search," SIAM J. Comput. , vol. 22, no. 4, pp. 794-806, 1993).
  20. Meiser, S. 1993. "Point location in arrangements of hyperplanes," Inf. Comput. , vol. 106, no. 2, pp. 286-303, 1993).
  21. Gionis, A. , Indyk, P. , and Motwani, R. 1999. "Similarity Search in High Dimensions via Hashing," in Proceedings of the 25th International Conference on Very Large Data Bases, pp. 518-529.
  22. Weber, R. , Schek, H. J. , and Blott, S. 1998. "A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces," in Proceedings of the 24rd International Conference on Very Large Data Bases, pp. 194-205.
  23. Arya, S. , Mount, D. M. , Netanyahu, N. S. , Silverman, R. , and Wu, A. 1994. "An optimal algorithm for approximate nearest neighbor searching," in Proceedings of the fifth annual ACM-SIAM symposium on Discrete algorithms, Arlington, Virginia, United States, pp. 573-582.
  24. Har-Peled, S. "A Replacement for Voronoi Diagrams of Near Linear Size," 42nd IEEE symposium on Foundations of Computer Science. pp. 94-94.
  25. Indyk, P. , and Motwani, R. 1998. "Approximate nearest neighbors: towards removing the curse of dimensionality," in Proceedings of the thirtieth annual ACM symposium on Theory of computing, Dallas, Texas, United States, pp. 604-613.
  26. Kleinberg, J. M. 1997. "Two algorithms for nearest-neighbor search in high dimensions," in Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, El Paso, Texas, United States, pp. 599-608.
  27. Kushilevitz, E. , Ostrovsky, R. , and Rabani, Y. 1998. "Efficient search for approximate nearest neighbor in high dimensional spaces," in Proceedings of the thirtieth annual ACM symposium on Theory of computing, Dallas, Texas, United States, pp. 614-623.
  28. Beyer, K. S. , Goldstein, J. , Ramakrishnan, R. , and Shaft, U. 1999. "When Is ''Nearest Neighbor'' Meaningful?," in Proceedings of the 7th International Conference on Database Theory, pp. 217-235.
  29. Hinneburg, A. , Aggarwal, C. C. , and Keim, D. A. 2000. "What Is the Nearest Neighbor in High Dimensional Spaces?," in Proceedings of the 26th International Conference on Very Large Data Bases, pp. 506-515.
  30. Datar, M. , Immorlica, N. , Indyk, P. , and Mirrokni, V. S. 2004. "Locality-sensitive hashing scheme based on p-stable distributions," in Proceedings of the twentieth annual symposium on Computational geometry, Brooklyn, New York, USA, pp. 253-262.
  31. Andoni, A. , and Indyk, P. 2006. "Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions," in Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 459-468.
  32. Slaney, M. , and Casey, M. 2008. "Locality-Sensitive Hashing for Finding Nearest Neighbors [Lecture Notes]," Signal Processing Magazine, IEEE, vol. 25, no. 2, pp. 128-131, 2008).
  33. Cormen, T. H. , Leiserson, C. E. , Rivest, R. L. , and Stein, C. 2001. Introduction to Algorithms, Second Edition, p. ^pp. 224 -252: MIT Press.
  34. Nolan, J. 2007. Stable Distributions: Models for Heavy-Tailed Data: Springer Verlag.
  35. Buhler, J. 2001. "Efficient large-scale sequence comparison by locality-sensitive hashing," Bioinformatics, vol. 17, no. 5, pp. 419-428, 2001).
  36. Buhler, J. 2002. "Provably sensitive Indexing strategies for biosequence similarity search," in Proceedings of the sixth annual international conference on Computational biology, Washington, DC, USA, pp. 90-99.
  37. Buhler, J. , and Tompa, M. 2002. "Finding motifs using random projections," J Comput Biol, vol. 9, no. 2, pp. 225-242, 2002).
  38. Ouyang, Z. , Memon, N. D. , Suel, T. , and Trendafilov, D. 2002. "Cluster-Based Delta Compression of a Collection of Files," in Proceedings of the 3rd International Conference on Web Information Systems Engineering, pp. 257-268.
  39. Shivakumar, N. 1999. "Detecting digital copyright violations on the internet," Stanford University.
  40. Cheng, Y. "MACS: music audio characteristic sequence indexing for similarity retrieval," Applications of Signal Processing to Audio and Acoustics, 2001 IEEE Workshop on the. pp. 123-126.
  41. Zolotarev, V. M. 1986. One-Dimensional Stable Distributions: American Mathematical Society.
  42. Gebril, M. , Buaba, R. , Homaifar, A. , and Kihn, E. "Structural indexing of satellite images using automatic classification," Aerospace Conference, 2011 IEEE. pp. 1-7.
  43. Buaba, R. , Homaifar, A. , Gebril, M. , and Kihn, E. "Satellite image retrieval application using Locality Sensitive Hashing in L2-space," Aerospace Conference, 2011 IEEE. pp. 1-7.
  44. Buaba, R. , Homaifar, A. , Gebril, M. , Kihn, E. , and Zhizhin, M. 2011. "Satellite image retrieval using low memory locality sensitive hashing in Euclidean space," Earth Science Informatics, vol. 4, no. 1, pp. 17-28, (2011/03/01 2011).
Index Terms

Computer Science
Information Sciences

Keywords

Approximate Nearest Neighbor Exact Nearest Neighbor ApproximationFactor Performance Measure Optimal Load Factor