In this article, the author explores how our understanding of the world is deeply rooted in our bodily experiences. The author argues that humans don’t just process information at a pixel level but instead use attention and expectations to interpret objects at an object level. This high-level structured information, or "knowledge," is often missing when deep learning models are trained on raw data.
To address this issue, the article proposes a semantic representation of the driving scene that can be exploited by deep learning-based approaches. By providing a clear and concise explanation of complex concepts using everyday language and engaging metaphors or analogies, the author demystifies the topic, making it easier for readers to comprehend.
In summary, the article highlights the significance of understanding how our bodily experiences shape our perception and reasoning abilities. By leveraging this knowledge, deep learning models can be improved to better handle tasks such as trajectory prediction. The author provides a novel approach to addressing this challenge by incorporating semantic representations into the training process.
Computer Science, Computer Vision and Pattern Recognition