Computer Science, Computer Vision and Pattern Recognition

Improving Event Detection in Videos with Robust Temporal Normalization and Adaptive Thresholding

Posted by LLama 2 7B Chat on December 6, 2023

Incorporating Domain Knowledge for Efficient Event Detection
Detecting when events of interest occur in long recordings is a fundamental task in computer vision, and efficient algorithms are crucial for real-world applications. There has been growing interest in using domain knowledge to improve detection performance. This article presents an approach that uses such knowledge to skip intervals shorter than 2 seconds. These intervals account for only 2.8% of the annotations, and omitting them halves the processing time.
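The duration rule is simple to express in code. The sketch below assumes candidate intervals are `(start, end)` pairs in seconds; the function names and toy annotations are hypothetical and only illustrate how intervals shorter than 2 seconds could be dropped before any further processing, not how the authors implement it.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # hypothetical (start_s, end_s) pair in seconds


def drop_short_intervals(intervals: List[Interval], min_duration: float = 2.0) -> List[Interval]:
    """Keep only intervals lasting at least `min_duration` seconds."""
    return [(s, e) for (s, e) in intervals if e - s >= min_duration]


def fraction_removed(intervals: List[Interval], min_duration: float = 2.0) -> float:
    """Share of intervals that the duration rule discards."""
    if not intervals:
        return 0.0
    kept = drop_short_intervals(intervals, min_duration)
    return 1.0 - len(kept) / len(intervals)


# Hypothetical toy annotations, not data from the paper.
annotations = [(0.0, 1.5), (3.0, 7.0), (10.0, 14.5), (20.0, 21.0)]
print(drop_short_intervals(annotations))  # [(3.0, 7.0), (10.0, 14.5)]
print(fraction_removed(annotations))      # 0.5
```

On real annotations the discarded share would be small (2.8% according to the article), which is what makes the rule cheap in terms of lost recall.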

The authors propose a three-stage pipeline:

Proposal Generation: In this stage, the input event representations (histograms or time maps) are resized to 224×224 pixels and passed through a CNN to generate proposals. To make the event-rate signal r(t) robust to outliers, the authors define a robust minimum and maximum as its p-th and (100−p)-th percentiles and clip r(t) to that range.
Interval Merging: In this stage, intervals of high event rate (i.e., high activity) are extracted as proposals with the watershed algorithm. Regions where the signal exceeds a threshold λ become proposals, which are then merged iteratively according to a merging threshold µ.
Proposal Prediction: In this stage, each proposal is augmented with "start" and "end" stages, and the ATSN (Augmentation-based Temporal Segmentation Network) processes the augmented proposals to produce a prediction for each one.
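The robust clipping described in the Proposal Generation stage can be sketched with NumPy percentiles. This assumes r(t) is sampled into a 1-D array of fixed time bins; the default `p = 1.0` and the final rescaling to [0, 1] are assumptions added for illustration, since the article only states that r(t) is clipped at the p-th and (100−p)-th percentiles.

```python
import numpy as np


def robust_normalize(r: np.ndarray, p: float = 1.0) -> np.ndarray:
    """Clip r(t) to its p-th / (100-p)-th percentiles, then rescale to [0, 1].

    `p = 1.0` is a hypothetical default; the rescaling step is an assumption.
    """
    r = np.asarray(r, dtype=float)
    r_min = np.percentile(r, p)          # robust minimum
    r_max = np.percentile(r, 100.0 - p)  # robust maximum
    clipped = np.clip(r, r_min, r_max)   # values outside the range are clipped
    span = r_max - r_min
    if span == 0:
        # Constant signal: nothing to normalize.
        return np.zeros_like(clipped)
    return (clipped - r_min) / span


rate = np.arange(101.0)  # hypothetical event-rate samples
normalized = robust_normalize(rate, p=10.0)
print(normalized[0], normalized[50], normalized[100])
```

With `p=10.0` the first and last 10% of values are clipped, so the sample at index 50 maps to 0.5 while both extremes saturate at 0 and 1; a single large spike therefore cannot compress the rest of the signal.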
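To illustrate the Interval Merging stage, the sketch below uses a plain threshold-and-merge procedure as a simplified stand-in; it is not the watershed algorithm the authors use. It assumes µ is the largest gap (in samples) between two above-threshold runs that still gets merged, which is a hypothetical reading of the merging threshold. Intervals are half-open `[start, end)` sample indices.

```python
from typing import List, Tuple

import numpy as np


def threshold_intervals(signal, lam: float) -> List[Tuple[int, int]]:
    """Return [start, end) index ranges where signal > lam."""
    above = np.asarray(signal) > lam
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], above, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(int)))
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]


def merge_intervals(intervals: List[Tuple[int, int]], mu: int) -> List[Tuple[int, int]]:
    """Greedily merge neighbours whose gap is at most mu samples (assumed meaning of mu)."""
    merged: List[Tuple[int, int]] = []
    for s, e in sorted(intervals):
        if merged and s - merged[-1][1] <= mu:
            merged[-1] = (merged[-1][0], max(merged[-1][1], e))
        else:
            merged.append((s, e))
    return merged


activity = [0.1, 0.9, 0.8, 0.2, 0.7, 0.1, 0.1, 0.1, 0.95, 0.9]  # hypothetical
raw = threshold_intervals(activity, lam=0.5)
print(raw)                          # [(1, 3), (4, 5), (8, 10)]
print(merge_intervals(raw, mu=1))   # [(1, 5), (8, 10)]
```

A single sorted pass is enough here because merging only ever extends the most recent interval; repeated passes would not change the result under this gap-based criterion.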
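One common way to add "start" and "end" stages is to extend a proposal on both sides by a fraction of its length and keep the original span as the middle stage. The sketch below assumes that scheme, with a hypothetical extension ratio of 0.5 and a hypothetical middle-stage name `"course"`; the article does not specify how ATSN builds these stages.

```python
from typing import Dict, Tuple


def augment_proposal(
    start: float, end: float, duration: float, ratio: float = 0.5
) -> Dict[str, Tuple[float, float]]:
    """Split an augmented proposal into 'start', 'course', and 'end' stages.

    `ratio` and the stage name 'course' are hypothetical choices.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    ext = ratio * (end - start)        # length of each added context stage
    aug_start = max(0.0, start - ext)  # clip to the recording bounds
    aug_end = min(duration, end + ext)
    return {"start": (aug_start, start), "course": (start, end), "end": (end, aug_end)}


print(augment_proposal(10.0, 14.0, duration=60.0))
# {'start': (8.0, 10.0), 'course': (10.0, 14.0), 'end': (14.0, 16.0)}
```

The added context lets a classifier compare activity just before and after a proposal with activity inside it, which helps judge whether the proposal boundaries are well placed.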
The authors evaluate the proposal stage and the full pipeline in separate experiments. They report average recall (AR) at temporal intersection-over-union thresholds tIoU ∈ {0.1, 0.3, 0.5, 0.7}, using the best Np proposals per recording and nest, with Np ∈ {20, 30, 50}. They also report temporal mean average precision (mAP) at the same tIoU thresholds.
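The AR metric can be made concrete with a small sketch. It assumes proposals are already sorted by confidence, counts a ground-truth interval as recalled if any of the top Np proposals reaches the tIoU threshold, and averages recall over the thresholds. This is one common formulation; the paper's exact matching protocol may differ, and the toy intervals are hypothetical.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def tiou(a: Interval, b: Interval) -> float:
    """Temporal intersection over union of two [start, end] intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_recall(
    proposals: List[Interval],
    ground_truth: List[Interval],
    n_p: int,
    thresholds: Sequence[float] = (0.1, 0.3, 0.5, 0.7),
) -> float:
    """Mean recall over tIoU thresholds using the top n_p proposals."""
    if not ground_truth:
        return 0.0
    top = proposals[:n_p]  # assumes proposals are sorted by confidence
    recalls = []
    for thr in thresholds:
        hits = sum(any(tiou(p, g) >= thr for p in top) for g in ground_truth)
        recalls.append(hits / len(ground_truth))
    return sum(recalls) / len(recalls)


gt = [(10.0, 20.0)]  # hypothetical ground-truth interval
print(round(tiou((12.0, 20.0), gt[0]), 3))                   # 0.8
print(average_recall([(12.0, 20.0), (0.0, 5.0)], gt, n_p=20))  # 1.0
print(average_recall([(15.0, 25.0)], gt, n_p=20))              # 0.5
```

In the last call the single proposal has tIoU of about 0.33 with the ground truth, so it counts as a hit only at the 0.1 and 0.3 thresholds, giving an AR of 0.5.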
The results show that the approach outperforms the state-of-the-art method [47] in both recall and processing time. The authors demonstrate that incorporating domain knowledge can improve detection performance while reducing computational cost.
In conclusion, this article presents an approach to efficient event detection that uses domain knowledge to omit intervals shorter than 2 seconds. By combining a three-stage pipeline with proposals augmented by "start" and "end" stages, the authors achieve better performance at a lower processing time. This work has implications for real-world applications where efficiency is crucial, such as autonomous driving, surveillance, and robotics.

ARXIV/2312.03799 authored by Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez, Tom Hart, Alex Kacelnik, Guillermo Gallego.
