TL;DR: How can we detect any moving target from an airborne platform using only motion cues? We develop a network that detects any salient moving object regardless of its shape and appearance and estimates its motion without any human labelling, relying purely on self-supervised features from Vision Transformers (ViTs) and optical flow.
Frame-based vision sensors on aerial platforms face bandwidth and latency challenges. In contrast, event cameras are promising thanks to their high temporal resolution and low power requirements. Existing methods for event-based motion segmentation often require manual labelling or scene-specific parameter tuning. Our approach addresses these issues by applying self-supervised transformers to event data and optical flow, eliminating the need for human annotations and reducing parameter tuning. We evaluate our method extensively on diverse datasets captured with an HD event camera on a dynamic aerial platform in urban settings, demonstrating superior performance over existing methods in handling various motion types and multiple moving objects.
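The core idea can be sketched in a few lines of PyTorch. The snippet below is a minimal illustration, not the authors' exact pipeline: it loads a publicly available self-supervised DINO ViT, derives a patch-level saliency map from an event frame, and gates it with an optical-flow magnitude threshold. The saliency rule (dissimilarity to the mean patch feature), the fusion rule, and `flow_thresh` are illustrative assumptions; the flow itself would come from a separate (e.g. event-based) flow estimator.

```python
import torch
import torch.nn.functional as F

# Pretrained self-supervised ViT (DINO, ViT-S/8) from Facebook Research's hub.
vit = torch.hub.load('facebookresearch/dino:main', 'dino_vits8')
vit.eval()

def motion_saliency(event_frame, flow, patch=8, flow_thresh=1.0):
    """event_frame: (3, H, W) event-count image normalised to [0, 1],
    with H and W divisible by `patch`.
    flow: (2, H, W) optical flow field for the same time window."""
    H, W = event_frame.shape[1:]
    x = event_frame.unsqueeze(0)                       # (1, 3, H, W)
    with torch.no_grad():
        # Patch tokens from the last block, dropping the CLS token:
        # (1, N, C) with N = (H / patch) * (W / patch)
        feats = vit.get_intermediate_layers(x, n=1)[0][:, 1:, :]
    h, w = H // patch, W // patch
    feats = F.normalize(feats, dim=-1).reshape(1, h, w, -1)

    # Cosine dissimilarity of each patch to the mean (background-dominated)
    # feature; high values mark salient, object-like patches.
    mean_feat = feats.mean(dim=(1, 2), keepdim=True)
    saliency = 1.0 - (feats * mean_feat).sum(-1)       # (1, h, w)
    saliency = F.interpolate(saliency.unsqueeze(1), size=(H, W),
                             mode='bilinear', align_corners=False)[0, 0]

    # Keep only regions that are also moving according to the flow magnitude,
    # so static salient structure is suppressed.
    moving = flow.norm(dim=0) > flow_thresh            # (H, W) boolean mask
    return saliency * moving                           # motion-saliency map
```

Because both the ViT features and the flow are obtained without labels, no human annotation enters this pipeline; the only free parameter in the sketch is the flow threshold.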
@misc{arja_motionseg_2024,
    title = {Motion Segmentation for Neuromorphic Aerial Surveillance},
    url = {http://arxiv.org/abs/2405.15209},
    publisher = {arXiv},
    author = {Arja, Sami and Marcireau, Alexandre and Afshar, Saeed and Ramesh, Bharath and Cohen, Gregory},
    month = oct,
    year = {2024},
}