Video Frame Interpolation with Stereo Event and Intensity Cameras

Chao Ding1, Mingyuan Lin1, Haijian Zhang1, Jianzhuang Liu2, Lei Yu1
1Wuhan University, 2Huawei Noah's Ark Lab
Corresponding authors
Teaser: qualitative comparison of Input, RIFE, Time Lens, Ours, and GT.


Abstract

The stereo event-intensity camera setup is widely used to leverage the complementary advantages of event cameras, with their low latency, and intensity cameras, which capture accurate brightness and texture information. However, such a setup commonly suffers from cross-modality parallax that is difficult to eliminate with stereo rectification alone, especially in real-world scenes with complex motions and varying depths, introducing artifacts and distortion in existing Event-based Video Frame Interpolation (E-VFI) approaches. To tackle this problem, we propose a novel Stereo Event-based VFI (SE-VFI) network (SEVFI-Net) that generates high-quality intermediate frames and the corresponding disparities from misaligned inputs consisting of two consecutive keyframes and the event streams emitted between them. Specifically, we propose a Feature Aggregation Module (FAM) to alleviate the parallax and achieve spatial alignment in the feature domain. We then exploit the fused features for accurate optical flow and disparity estimation, and obtain better interpolation results through both flow-based and synthesis-based branches. We also build a stereo visual acquisition system composed of an event camera and an RGB-D camera to collect a new Stereo Event-Intensity Dataset (SEID) containing diverse scenes with complex motions and varying depths. Experiments on the public real-world stereo datasets DSEC and MVSEC, as well as on our SEID dataset, demonstrate that the proposed SEVFI-Net outperforms state-of-the-art methods by a large margin.



Method

Overview of the proposed SEVFI-Net
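
Conceptually, SEVFI-Net takes two keyframes and the events emitted between them, fuses their features with the FAM to compensate for cross-modality parallax, and then predicts optical flow, disparity, and the intermediate frame. The PyTorch-style sketch below illustrates this data flow only; the module structure, channel sizes, and prediction heads are simplified assumptions for illustration, not the released implementation.

import torch
import torch.nn as nn

# Illustrative sketch of the SEVFI-Net data flow described in the abstract.
# All layer choices and channel sizes are placeholders, not the paper's architecture.

class FeatureAggregationModule(nn.Module):
    """Fuses misaligned frame and event features to alleviate cross-modality parallax."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, frame_feat, event_feat):
        return self.fuse(torch.cat([frame_feat, event_feat], dim=1))


class SEVFINetSketch(nn.Module):
    """Toy stand-in: two keyframes + event voxel grid -> intermediate frame, disparity, flow."""
    def __init__(self, event_bins=5, channels=32):
        super().__init__()
        self.frame_enc = nn.Conv2d(6, channels, 3, padding=1)        # two stacked RGB keyframes
        self.event_enc = nn.Conv2d(event_bins, channels, 3, padding=1)
        self.fam = FeatureAggregationModule(channels)
        self.flow_head = nn.Conv2d(channels, 4, 3, padding=1)        # bidirectional optical flow
        self.disp_head = nn.Conv2d(channels, 1, 3, padding=1)        # disparity at the target time
        self.synth_head = nn.Conv2d(channels + 6, 3, 3, padding=1)   # synthesis-based branch

    def forward(self, frame0, frame1, events):
        frames = torch.cat([frame0, frame1], dim=1)
        f_feat = self.frame_enc(frames)
        e_feat = self.event_enc(events)
        fused = self.fam(f_feat, e_feat)              # spatial alignment in the feature domain
        flow = self.flow_head(fused)                  # flow-based branch
        disparity = self.disp_head(fused)
        frame_t = torch.sigmoid(self.synth_head(torch.cat([fused, frames], dim=1)))
        return frame_t, disparity, flow


if __name__ == "__main__":
    net = SEVFINetSketch()
    f0 = torch.rand(1, 3, 128, 128)
    f1 = torch.rand(1, 3, 128, 128)
    ev = torch.rand(1, 5, 128, 128)   # event stream voxelized into 5 temporal bins
    frame_t, disparity, flow = net(f0, f1, ev)
    print(frame_t.shape, disparity.shape, flow.shape)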

More Qualitative Results in Real-World Scenarios

Results of Video Frame Interpolation

We compare our SEVFI-Net with one open-sourced E-VFI approach, i.e., Time Lens, and four state-of-the-art F-VFI methods: DAIN, RIFE, RRIN, and Super SloMo.

Qualitative VFI comparison: Input, DAIN, RIFE, RRIN, Super SloMo, Time Lens, Ours, GT.

Results of Stereo Matching

Since no existing algorithm combines stereo matching with video frame interpolation, we first choose two representative VFI algorithms, i.e., RIFE and Time Lens, to generate intermediate frames, then feed these frames together with the event stream into two existing cross-modal stereo matching algorithms, i.e., HSM and SSIE, and compare their performance with our SEVFI-Net, as sketched below.
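
A rough sketch of this two-stage baseline protocol is given here; the interpolate and stereo_match callables are hypothetical placeholders standing in for the released RIFE / Time Lens and HSM / SSIE code, not their actual interfaces.

def two_stage_baseline(frame0, frame1, events, interpolate, stereo_match):
    """Two-stage baseline: VFI first, then cross-modal stereo matching.

    interpolate:  hypothetical wrapper around RIFE or Time Lens
    stereo_match: hypothetical wrapper around HSM or SSIE
    """
    frame_t = interpolate(frame0, frame1)       # synthesize the intermediate frame
    disparity = stereo_match(frame_t, events)   # match the synthesized frame against the events
    return frame_t, disparity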

Qualitative stereo matching comparison: Images, Events, HSM + RIFE, HSM + Time Lens, SSIE + RIFE, SSIE + Time Lens, Ours, GT.


BibTeX


@article{ding2023video,
  title={Video Frame Interpolation with Stereo Event and Intensity Cameras},
  author={Ding, Chao and Lin, Mingyuan and Zhang, Haijian and Liu, Jianzhuang and Yu, Lei},
  journal={arXiv preprint arXiv:2307.08228},
  year={2023}
}