Generating Multi-million Data Set using GPGPU Accelerated Models

Research Article

Published in April 2017 by Ghanshyam Verma, Priyanka Tripathi

National Conference on Contemporary Computing

Foundation of Computer Science USA

NCCC2016 - Number 2

April 2017

Authors: Ghanshyam Verma, Priyanka Tripathi

Ghanshyam Verma, Priyanka Tripathi. Generating Multi-million Data Set using GPGPU Accelerated Models. National Conference on Contemporary Computing. NCCC2016, 2 (April 2017), 4-9.

@article{
author = { Ghanshyam Verma and Priyanka Tripathi },
title = { Generating Multi-million Data Set using GPGPU Accelerated Models },
journal = { National Conference on Contemporary Computing },
issue_date = { April 2017 },
volume = { NCCC2016 },
number = { 2 },
month = { April },
year = { 2017 },
issn = { 0975-8887 },
pages = { 4-9 },
numpages = 6,
url = { /proceedings/nccc2016/number2/27341-6340/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Proceeding Article
%1 National Conference on Contemporary Computing
%A Ghanshyam Verma
%A Priyanka Tripathi
%T Generating Multi-million Data Set using GPGPU Accelerated Models
%J National Conference on Contemporary Computing
%@ 0975-8887
%V NCCC2016
%N 2
%P 4-9
%D 2017
%I International Journal of Computer Applications

Abstract

Generating a synthetic data set that is both realistic and sufficiently large has long been a cumbersome task for researchers. Several models have been proposed previously, all adopting heterogeneous approaches. This work instead emphasizes reducing the time needed to compute the distribution of the data set. Uniform, Poisson and Zipf distributions are studied, and approaches based on a parallel computation model are proposed. The models were verified for speedup using a CUDA-based implementation on an NVIDIA Quadro 2000 GPU. A speedup in the range of 2x to 6x was observed across various ranges of data sets.
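
The abstract does not show the proposed models, so the following is only a minimal Python sketch of why these three distributions lend themselves to a parallel formulation. It assumes inverse-transform sampling and a per-record seeding scheme, neither of which is confirmed by the source; all function names and parameters (`unit_random`, `table_value`, `seed`, `s`, `lam`) are hypothetical. The point it illustrates is that each record depends only on its own index, so one GPU thread per record could compute it without coordinating with any other thread.

```python
import bisect
import math
import random


def unit_random(index, seed):
    """Uniform float in [0, 1) that depends only on (seed, index)."""
    # Hypothetical seeding scheme: one small generator per record index.
    return random.Random(seed * 1_000_003 + index).random()


def zipf_cdf(n_ranks, s=1.0):
    """Cumulative probabilities of a Zipf law over ranks 1..n_ranks."""
    weights = [1.0 / (k ** s) for k in range(1, n_ranks + 1)]
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w
        cdf.append(acc / total)
    cdf[-1] = 1.0  # guard against floating-point round-off
    return cdf


def poisson_cdf(lam, tail=1e-12):
    """Cumulative probabilities of Poisson(lam) for k = 0, 1, 2, ..."""
    p = math.exp(-lam)  # P(X = 0)
    if p == 0.0:
        raise ValueError("lam is too large for this simple table-based sketch")
    cdf, acc, k = [], 0.0, 0
    while True:
        acc += p
        cdf.append(min(acc, 1.0))
        if 1.0 - acc < tail:  # remaining tail mass is negligible
            break
        k += 1
        p *= lam / k  # P(X = k) from P(X = k - 1)
    cdf[-1] = 1.0
    return cdf


def uniform_value(index, seed, low, high):
    """Integer in [low, high] for record `index`."""
    return low + int(unit_random(index, seed) * (high - low + 1))


def table_value(index, seed, cdf, first_value):
    """Inverse-CDF lookup: first table entry whose cumulative mass exceeds u."""
    return first_value + bisect.bisect_right(cdf, unit_random(index, seed))


def generate(n, value_at):
    """Stand-in for a kernel launch: one independent evaluation per index."""
    return [value_at(i) for i in range(n)]


if __name__ == "__main__":
    seed = 42
    zc = zipf_cdf(1000, s=1.0)   # ranks 1..1000
    pc = poisson_cdf(4.0)        # counts 0, 1, 2, ...
    print(generate(5, lambda i: uniform_value(i, seed, 1, 100)))
    print(generate(5, lambda i: table_value(i, seed, zc, 1)))
    print(generate(5, lambda i: table_value(i, seed, pc, 0)))
```

In this sketch the cumulative tables are built once on the host and then only read, which is the part that would map to read-only device memory in a CUDA version; the list comprehension in `generate` is the part that would become the thread grid. Creating a generator per index is slow in Python and is used here only to make the independence between records explicit.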

Index Terms

Computer Science

Information Sciences

Keywords

Data Set Generation, Synthetic Dataset, Zipf, Poisson, Uniform Distribution, GPU, CUDA