Research Article

Evolution of Data Mining: From Statistical Foundations to Big Data and Deep Learning

by Rajiv Chooramun
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 4
Year of Publication: 2025
Authors: Rajiv Chooramun
DOI: 10.5120/ijca2025924837

Rajiv Chooramun. Evolution of Data Mining: From Statistical Foundations to Big Data and Deep Learning. International Journal of Computer Applications. 187, 4 (May 2025), 12-20. DOI=10.5120/ijca2025924837

@article{ 10.5120/ijca2025924837,
author = { Rajiv Chooramun },
title = { Evolution of Data Mining: From Statistical Foundations to Big Data and Deep Learning },
journal = { International Journal of Computer Applications },
issue_date = { May 2025 },
volume = { 187 },
number = { 4 },
month = { May },
year = { 2025 },
issn = { 0975-8887 },
pages = { 12-20 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number4/evolution-of-data-mining-from-statistical-foundations-to-big-data-and-deep-learning/ },
doi = { 10.5120/ijca2025924837 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-05-17T02:45:52.928807+05:30
%A Rajiv Chooramun
%T Evolution of Data Mining: From Statistical Foundations to Big Data and Deep Learning
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 4
%P 12-20
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This article traces the historical development of data mining through four phases. It begins with the emergence of statistical techniques in the 18th and 19th centuries, progresses through advances in computer technology and artificial neural networks in the mid-20th century, and continues with the establishment of foundational concepts and algorithms in the final decades of the 20th century. Finally, it addresses the incorporation of big data and deep learning technologies in the 21st century. A comprehensive literature review was conducted to explore this progression. The study examines contributions from early statistical analysis, the impact of electronic computers and database systems, the formalization of data mining concepts and algorithms during the 1990s, and recent advances driven by big data and deep learning. Each phase advanced data mining methodologies. Early statistical analysis by figures such as Bayes and Gauss provided the groundwork. The advent of electronic computers and database systems enhanced data processing capabilities. The formalization of data mining in the 1990s, marked by ‘knowledge discovery in databases’ and algorithms such as support vector machines, expanded its applications. In the 21st century, big data and deep learning have further elevated data mining, solidifying its importance in data science and diverse fields. While this review is limited by the scope of existing literature and historical context, it provides an overview of data mining’s evolution and its role in extracting valuable insights from datasets. Future research could explore emerging developments and applications in this rapidly evolving field.

Index Terms

Computer Science
Information Sciences
Data Mining
Machine Learning
Artificial Intelligence
Algorithms
Theory

Keywords

Data Mining
Artificial Intelligence
Machine Learning
Big Data
Knowledge Discovery