International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 176 - Number 28
Year of Publication: 2020
Authors: Prerna Agrawal, Bhushan Trivedi
DOI: 10.5120/ijca2020920308
Prerna Agrawal, Bhushan Trivedi. Unstructured Data Collection from APK files for Malware Detection. International Journal of Computer Applications. 176, 28 (Jun 2020), 42-45. DOI=10.5120/ijca2020920308
Machine learning methods are applied extensively in malware detection to determine whether a given APK file is malware. They have been found to consume less time and fewer resources than non-machine-learning techniques. We focus on machine learning methods for detecting unknown malware. To detect malware, a researcher needs to create a dataset of their own. Our dataset generation process consists of three phases: Android File Collection, Decompilation, and Feature Mining. The Android File Collection phase, discussed in our previous paper [1], yielded 15508 malware files and 4000 benign files. Android files contain unstructured data in the form of text and XML files, which are complex to process and store. Our goal here is to decompile the collected Android files so that all the resources as well as the source code are obtained in a single instance. We aim to handle this big data, in the form of Android files, and to process it properly during decompilation. In this paper, we propose a readily available automated solution for decompiling the files that also addresses the complexity of handling and processing the big data. We also discuss our Decompilation phase and present the structure of the reverse-engineered APK file. We used an online JADX decompiler [5] to reverse engineer the APK files.
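As an illustration of what an automated Decompilation phase over a large collection of APK files might look like, the following is a minimal sketch and not the authors' actual pipeline. It assumes the JADX command-line tool is installed locally and available on the PATH as `jadx` (the abstract refers to an online JADX decompiler, so this setup is an assumption), and it relies only on the `-d` output-directory option. The function names `build_jadx_command` and `decompile_all` and the directory layout are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_jadx_command(apk_path: Path, out_root: Path, jadx_bin: str = "jadx") -> list[str]:
    """Build the JADX CLI call that decompiles one APK into its own folder.

    Each APK gets a separate output directory named after the file, so the
    decompiled sources and resources of different samples do not mix.
    """
    out_dir = out_root / apk_path.stem
    return [jadx_bin, "-d", str(out_dir), str(apk_path)]


def decompile_all(apk_dir: Path, out_root: Path) -> dict[str, bool]:
    """Run JADX on every .apk file in apk_dir.

    Returns a mapping from APK file name to True if the JADX process
    exited with code 0, and False otherwise.
    """
    results: dict[str, bool] = {}
    for apk in sorted(apk_dir.glob("*.apk")):
        cmd = build_jadx_command(apk, out_root)
        try:
            completed = subprocess.run(cmd, capture_output=True, text=True)
            results[apk.name] = completed.returncode == 0
        except FileNotFoundError:
            # Assumption violated: the jadx executable is not on the PATH.
            results[apk.name] = False
    return results


if __name__ == "__main__":
    # Hypothetical folder names; adjust to the local dataset layout.
    summary = decompile_all(Path("apks/malware"), Path("decompiled/malware"))
    failed = [name for name, ok in summary.items() if not ok]
    print(f"processed {len(summary)} files, {len(failed)} failed")
```

Writing each sample to its own output folder keeps the decompiled source code and resources of one APK together, which matches the stated goal of obtaining both in a single instance. Recording a per-file status lets failed samples be retried or excluded before feature mining.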