Improving Event Detection in Videos with Robust Temporal Normalization and Adaptive Thresholding

Incorporating Domain Knowledge for Efficient Object Detection
Object detection is a fundamental task in computer vision, and real-world applications depend on efficient algorithms. Domain knowledge is increasingly used to improve detection performance. The approach summarized here applies one such piece of knowledge: intervals shorter than 2 seconds account for only 2.8% of the annotations, so they are omitted, which halves the processing time.
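
The filtering rule above can be illustrated with a minimal sketch. It assumes candidate intervals are given as (start, end) pairs in seconds; the function name and the sample intervals are hypothetical, and only the 2-second cutoff comes from the text.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

MIN_DURATION_S = 2.0  # cutoff stated in the text


def drop_short_intervals(
    intervals: List[Interval], min_duration: float = MIN_DURATION_S
) -> List[Interval]:
    """Keep only intervals lasting at least `min_duration` seconds."""
    return [(start, end) for start, end in intervals if end - start >= min_duration]


# Hypothetical candidate intervals, for illustration only.
candidates = [(0.0, 1.5), (3.0, 8.0), (10.0, 11.5), (12.0, 14.0)]
kept = drop_short_intervals(candidates)
print(kept)
```

Applying the filter before any further processing means the later stages never see the short intervals, which is where the time saving would come from in a setup like this.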

The authors propose a three-stage pipeline:

  1. Proposal Generation: The input event representations (histograms or time maps) are resized to 224×224 pixels and passed through a CNN to generate proposals. The authors define a robust minimum and maximum as the p-th and (100-p)-th percentiles and clip r(t) to this range.
  2. Interval Merging: Intervals of high event rate (i.e., activity) are extracted as proposals with the watershed algorithm. Regions above a threshold λ become proposals, which are then merged iteratively according to a merging threshold µ.
  3. Proposal Prediction: The proposals are augmented by adding "start" and "end" stages, and the ATSN (Augmentation-based Temporal Segmentation Network) processes the augmented proposals to produce a prediction for each one.
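
The percentile clipping and the threshold-and-merge step can be sketched as follows. This is a simplified illustration, not the authors' implementation: it replaces the watershed algorithm with plain thresholding, rescales the clipped signal to [0, 1] (an assumption; the text only mentions clipping), and interprets µ as the largest gap, in samples, that two neighbouring regions may have and still be merged (also an assumption). All function names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], p: float) -> float:
    """Linearly interpolated p-th percentile, with 0 <= p <= 100."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence")
    pos = (len(ordered) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def robust_normalize(rate: Sequence[float], p: float = 5.0) -> List[float]:
    """Clip to the [p, 100 - p] percentile range, then rescale to [0, 1]."""
    r_min = percentile(rate, p)
    r_max = percentile(rate, 100.0 - p)
    if r_max == r_min:
        return [0.0] * len(rate)
    return [(min(max(r, r_min), r_max) - r_min) / (r_max - r_min) for r in rate]


def regions_above(signal: Sequence[float], lam: float) -> List[Tuple[int, int]]:
    """Half-open index intervals [start, end) where signal > lam."""
    regions = []
    start = None
    for i, value in enumerate(signal):
        if value > lam and start is None:
            start = i
        elif value <= lam and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(signal)))
    return regions


def merge_close(regions: List[Tuple[int, int]], mu: int) -> List[Tuple[int, int]]:
    """Merge neighbouring regions separated by a gap of at most mu samples."""
    merged: List[Tuple[int, int]] = []
    for start, end in regions:
        if merged and start - merged[-1][1] <= mu:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# A single outlier (1000.0) no longer dominates the scale after clipping.
rate = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0]
normalized = robust_normalize(rate, p=10.0)

# Hypothetical normalized signal with three active regions.
signal = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.1, 0.1, 0.6, 0.9]
proposals = merge_close(regions_above(signal, lam=0.5), mu=1)
print(normalized)
print(proposals)
```

With min-max scaling on the raw values, the outlier would compress every other sample close to zero. Clipping at the percentiles first keeps the bulk of the signal spread over the full range, so a single threshold λ stays meaningful.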
The authors evaluate the proposal stage and the complete pipeline in separate experiments. They report the average recall (AR) at temporal intersection-over-union (tIoU) thresholds of {0.1, 0.3, 0.5, 0.7}, using the best Np proposals per recording and nest, with Np ∈ {20, 30, 50}. They also report the temporal mean average precision (mAP) at the same tIoU thresholds.
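
To make the recall metric concrete, here is a minimal sketch of tIoU and of recall at several tIoU thresholds. It assumes intervals are (start, end) pairs in seconds and that proposals are already sorted by descending score, so the best Np are simply the first Np. How the authors aggregate recall across recordings or thresholds is not stated here, so the sketch reports recall per threshold. Function names and sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def tiou(a: Interval, b: Interval) -> float:
    """Temporal intersection over union of two intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at(
    ground_truth: List[Interval], proposals: List[Interval], threshold: float
) -> float:
    """Fraction of ground-truth intervals matched by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1
        for gt in ground_truth
        if any(tiou(gt, prop) >= threshold for prop in proposals)
    )
    return hits / len(ground_truth)


def recall_per_threshold(
    ground_truth: List[Interval],
    proposals: List[Interval],
    thresholds: Sequence[float] = (0.1, 0.3, 0.5, 0.7),
    n_best: int = 20,
) -> Dict[float, float]:
    """Recall at each tIoU threshold, using only the first n_best proposals."""
    best = proposals[:n_best]  # assumes proposals are sorted by descending score
    return {t: recall_at(ground_truth, best, t) for t in thresholds}


# Hypothetical annotations and proposals, for illustration only.
ground_truth = [(0.0, 10.0), (20.0, 30.0)]
proposals = [(1.0, 10.0), (24.0, 30.0), (50.0, 60.0)]
recalls = recall_per_threshold(ground_truth, proposals)
print(recalls)
```

In this example the second proposal overlaps its annotation with a tIoU of 0.6, so it counts as a hit up to the 0.5 threshold and as a miss at 0.7. Stricter thresholds therefore reward tighter temporal localization.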
The results show that the approach outperforms the state-of-the-art method [47] in both recall and processing time. The authors conclude that incorporating domain knowledge can significantly improve object detection performance while reducing computational cost.
In conclusion, the article presents an efficient object detection approach that uses domain knowledge to omit intervals shorter than 2 seconds. With a three-stage pipeline and proposals augmented by "start" and "end" stages, the authors achieve better performance at a lower processing time. This matters for real-world applications where efficiency is crucial, such as autonomous driving, surveillance, and robotics.