Research Article

Enhanced Model for Mining Software Repositories

by P.C. Nwosu, F.E. Onuodu, U.A. Okengwu
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 186 - Number 40
Year of Publication: 2024
Authors: P.C. Nwosu, F.E. Onuodu, U.A. Okengwu
10.5120/ijca2024923997

P.C. Nwosu, F.E. Onuodu, U.A. Okengwu. Enhanced Model for Mining Software Repositories. International Journal of Computer Applications. 186, 40 (Sep 2024), 41-46. DOI=10.5120/ijca2024923997

@article{ 10.5120/ijca2024923997,
author = { P.C. Nwosu and F.E. Onuodu and U.A. Okengwu },
title = { Enhanced Model for Mining Software Repositories },
journal = { International Journal of Computer Applications },
issue_date = { Sep 2024 },
volume = { 186 },
number = { 40 },
month = { Sep },
year = { 2024 },
issn = { 0975-8887 },
pages = { 41-46 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume186/number40/enhanced-model-for-mining-software-repositories/ },
doi = { 10.5120/ijca2024923997 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-09-27T00:46:19.276708+05:30
%A P.C. Nwosu
%A F.E. Onuodu
%A U.A. Okengwu
%T Enhanced Model for Mining Software Repositories
%J International Journal of Computer Applications
%@ 0975-8887
%V 186
%N 40
%P 41-46
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This study presents an enhanced model for mining software repositories using a supervised machine learning technique. The system was designed with the Object-Oriented Analysis and Design (OOAD) methodology and implemented in the PHP (Hypertext Preprocessor) scripting language to make mining source code from repositories flexible and user-friendly. The database for the model was created using MySQL. The enhanced model uses K-Nearest Neighbor (k-NN), a supervised machine learning algorithm for data classification, to eliminate voluminous comments from source code and thereby reduce its bulk. The existing and enhanced models are compared on three pre-defined parameters: the number of mining iterations, the time taken to generate the code in seconds, and the number of lines of code generated. Over seven iterations, the existing model generated a total of 56 lines of code in 2.322 seconds, while the enhanced model generated a total of 93 well-defined lines of code in 0.017 seconds. On these measures, the enhanced model outperformed the existing model in speed, accuracy and extraction of well-defined code. The model could benefit data miners, programmers, software engineers, project managers in large industrial environments and researchers, because relevant information from the study can be applied to problem-solving.
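
The comment-elimination idea can be illustrated with a minimal sketch. This is not the authors' PHP implementation: the abstract does not give the features, training data or value of k, so the three line-level features, the eight labelled training lines, the choice of k = 3 and all function names below are assumptions made purely for illustration, and Python is used only for brevity.

```python
import math
from collections import Counter

# Hypothetical feature design: the abstract does not specify the features used.
COMMENT_PREFIXES = ("//", "#", "/*", "*")
CODE_SYMBOLS = set(";{}()=$[]<>")


def features(line):
    """Map a source line to a small numeric feature vector."""
    text = line.strip()
    starts_like_comment = 1.0 if text.startswith(COMMENT_PREFIXES) else 0.0
    symbol_ratio = sum(ch in CODE_SYMBOLS for ch in text) / max(len(text), 1)
    ends_like_code = 1.0 if text.endswith((";", "{", "}")) else 0.0
    return (starts_like_comment, symbol_ratio, ends_like_code)


# Tiny hand-labelled training set (illustrative only, not data from the study).
TRAINING = [
    ("$total = $a + $b;", "code"),
    ("function add($a, $b) {", "code"),
    ("return $total;", "code"),
    ("}", "code"),
    ("// add two numbers", "comment"),
    ("# helper for totals", "comment"),
    ("/* legacy block", "comment"),
    ("* kept for reference", "comment"),
]


def knn_classify(line, training=TRAINING, k=3):
    """Label a line by majority vote among its k nearest training lines."""
    query = features(line)
    ranked = sorted(training, key=lambda ex: math.dist(query, features(ex[0])))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def strip_comments(lines, k=3):
    """Keep blank lines and lines classified as code; drop comment lines."""
    return [ln for ln in lines if not ln.strip() or knn_classify(ln, k=k) == "code"]
```

Calling `strip_comments` on a short listing that mixes `//` and `#` comment lines with PHP statements returns only the statements (and any blank lines). A toy classifier like this would misjudge trailing comments and code lines that begin with `*`; a realistic model would need a much larger labelled corpus and richer features.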

Index Terms

Computer Science
Information Sciences

Keywords

Software Repositories, Source Code, Data Mining, Machine Learning, KNN