International Journal of Computer Applications |
Foundation of Computer Science (FCS), NY, USA |
Volume 184 - Number 22 |
Year of Publication: 2022 |
Authors: R. Geetharamani, Eashwar P., Srikarthikeyan M.K., Varoon S.B. |
DOI: 10.5120/ijca2022922259
R. Geetharamani, Eashwar P., Srikarthikeyan M.K., Varoon S.B. Hybrid Approach for Video Interpolation. International Journal of Computer Applications. 184, 22 (Jul 2022), 16-22. DOI=10.5120/ijca2022922259
Video interpolation is a form of video processing in which intermediate frames are generated between existing ones. This research aims to synthesize several frames between two adjacent frames of the original video. Frame interpolation can be performed in either static or dynamic mode; the dynamic approach determines the number of frames to interpolate between the reference frames. The reference frames are passed to three deep learning networks, namely Synthesis, Warping, and Refinement, each of which performs a different function to tackle blurring, occlusion, and arbitrary non-linear motion. Synthesis is a kernel-based approach in which interpolation is modeled as local convolution over the reference frames, using a single UNet. Warping uses optical flow to back-warp the reference frames and generates the interpolated frame using two UNets. The Refinement module uses optical flow and the warped frames to compute weighted maps that enhance the interpolation, using two UNets and a GAN. The raw interpolated output of the deep learning networks yielded a PSNR of 38.87 dB and an SSIM of 97.22%. To further enhance the results, this research combined these deep learning approaches with post-processing: the raw interpolated frames are color corrected and fused to form a new frame. Color correction masks the difference between the interpolated frame and the ground truth over the interpolated frame. Fusion ensures that the maximum pixel value from each input frame is present in the fused frame. Voting is then applied to the color-corrected frames from the three networks and the fused frame; it follows a per-pixel strategy and selects the best pixel from each of the interpolated frames. The modules are trained and tested on the DAVIS (Densely Annotated VIdeo Segmentation), Adobe 240, and Vimeo datasets.
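The fusion and voting steps described above can be sketched as follows. This is a minimal illustration assuming frames are NumPy arrays of identical shape; the per-pixel "best" criterion used here (closest candidate to a reference frame) is a hypothetical stand-in, since the abstract does not specify how the voting selects a pixel:

```python
import numpy as np

def fuse_max(frames):
    """Per-pixel maximum fusion: the fused frame keeps the maximum
    pixel value across all candidate interpolated frames."""
    return np.max(np.stack(frames, axis=0), axis=0)

def vote_per_pixel(frames, reference):
    """Per-pixel voting over candidate frames.

    At each pixel, pick the candidate value whose absolute error
    against `reference` is smallest (hypothetical criterion)."""
    stack = np.stack(frames, axis=0)              # (N, H, W, C)
    err = np.abs(stack - reference).sum(axis=-1)  # (N, H, W) per-pixel error
    best = np.argmin(err, axis=0)                 # (H, W) winning candidate index
    h, w = best.shape
    rows = np.arange(h)[:, None]                  # broadcast row indices
    cols = np.arange(w)[None, :]                  # broadcast column indices
    return stack[best, rows, cols]                # (H, W, C) voted frame
```

In practice the candidates would be the three color-corrected network outputs plus the fused frame, so `frames` would hold four arrays.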
This hybrid interpolation technique achieved a highest PSNR of 58.98 dB and an SSIM of 99.96%, improving on the base paper's reported PSNR of 32.49 dB and SSIM of 92.7%.
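For reference, the PSNR figures quoted above can be computed as in the following minimal sketch, which assumes 8-bit frames stored as NumPy arrays (SSIM involves local windowed statistics and is omitted here for brevity):

```python
import numpy as np

def psnr(reference, test, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a ground-truth frame
    and an interpolated frame, assuming pixel values in [0, max_val]."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10((max_val ** 2) / mse)
```

Higher PSNR indicates the interpolated frame is closer to the ground truth, which is why the jump from 38.87 dB to 58.98 dB reflects the benefit of the post-processing stage.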