International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 182 - Number 50
Year of Publication: 2019
Authors: Pooja Ajwani, Harshal Arolkar
DOI: 10.5120/ijca2019918738
Pooja Ajwani, Harshal Arolkar. TubeExtractor: A Crawler and Converter for Generating Research DataSet from YouTube Videos. International Journal of Computer Applications. 182, 50 (Apr 2019), 14-17. DOI=10.5120/ijca2019918738
With the advent of the internet and e-resources, the amount of data available to users has grown exponentially. Among the many content providers, YouTube ranks as the second most popular website in the world. Because YouTube data is easily accessible, many researchers collect YouTube videos as datasets for their research. Finding the videos required for data analysis is nevertheless a cumbersome task, since YouTube hosts a vast number of videos, and researchers spend a large amount of time assembling the required dataset. To reduce this time, this paper proposes an open source application, “TubeExtractor”. TubeExtractor allows researchers to download videos and their metadata from YouTube based on parameters supplied by the researcher. It also outputs a plain text file for each downloaded video, which researchers can use for additional processing of their choice. The keywords used to download the videos are provided to the crawler in the form of a document generated by a keyphrase extractor algorithm. If the VTT (Video Text Tracks) file of a video to be downloaded is available, a plain text file is created from it using a two-step parser. TubeExtractor can thus save researchers considerable time.
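The abstract does not describe the internals of the two-step parser, so the following Python sketch is only one plausible reading of the VTT-to-plain-text conversion, not the authors' implementation. It assumes that the first step isolates the cue text (discarding the WEBVTT header, cue identifiers, and timing lines) and that the second step strips inline markup and repeated caption lines. The function names `extract_cue_lines`, `cues_to_plain_text`, and `vtt_to_text` are hypothetical.

```python
import html
import re

# A cue timing line, e.g. "00:00:01.000 --> 00:00:04.000 align:start".
TIMING = re.compile(
    r"^\s*(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
)
# Inline markup such as <c>, </c>, <v Speaker>, or <00:00:03.000>.
TAG = re.compile(r"<[^>]+>")


def extract_cue_lines(vtt_text):
    """Step 1 (assumed): keep only the text lines that follow a timing line."""
    normalized = vtt_text.replace("\r\n", "\n").strip()
    cue_lines = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if TIMING.match(line):
                # Everything after the timing line is cue payload.
                cue_lines.extend(lines[i + 1:])
                break
        # Blocks without a timing line (header, NOTE, STYLE) are skipped.
    return cue_lines


def cues_to_plain_text(cue_lines):
    """Step 2 (assumed): strip markup and drop consecutive duplicate lines."""
    cleaned = []
    for line in cue_lines:
        text = html.unescape(TAG.sub("", line))
        text = " ".join(text.split())
        if text and (not cleaned or cleaned[-1] != text):
            cleaned.append(text)
    return " ".join(cleaned)


def vtt_to_text(vtt_text):
    """Run both steps and return the transcript as a single string."""
    return cues_to_plain_text(extract_cue_lines(vtt_text))
```

Dropping consecutive duplicates is a design choice in this sketch, motivated by caption tracks that repeat a line across adjacent cues; it would need to be revisited if repeated lines are meaningful for the research task.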