ARTEMIS 2012 Workshop - 13 October 2012

in conjunction with European Conference on Computer Vision (ECCV) - 13 October 2012, Firenze, Italy.


Call for Papers

It can be argued that the intelligence behind many recent pattern recognition and computer vision systems rests on two main approaches: (i) extraction of smart features able to efficiently represent the rich visual content, and (ii) adoption of non-linear and adaptable (semi-supervised) learning strategies able to bridge the gap between the extracted low-level features and the high-level concepts that humans use to perceive the content. Feature extraction is a dimensionality reduction strategy that addresses the fact that learning complexity grows exponentially with a linear increase in the dimensionality of the data. It is also clear that extracting representative features is a challenging and application-dependent process. Non-representative features significantly degrade recognition accuracy, especially in complex and dynamic environments, even when they are processed by highly non-linear feature transformation models.

Emulating the efficiency and robustness with which the human brain represents information has been a core challenge in machine learning research. The human brain does not work by explicitly pre-processing sensory signals but rather allows them to propagate through complex hierarchies. Then, as time elapses, we learn to represent these observations using regularities, structured or not. This suggests that human information processing relies on “deep architectures” for learning, i.e., hierarchical, multi-layer models. This observation motivated the emergence of the subfield of deep machine learning, which focuses on computational models for information representation that exhibit characteristics similar to those of humans.

Such contemporary machine learning techniques are important for cognitive video supervision and event analysis in video sequences, which are critical tasks in many multimedia applications. Methods, tools and algorithms that aim to detect and recognize high-level concepts and their respective spatio-temporal and causal relations, in order to identify semantic video activities, actions and procedures, have been a focus of the research community in recent years.

This research area has a strong impact on many real-life multimedia applications based on semantic characterization and annotation of video streams in various domains (e.g., sports, news, documentaries, movies and surveillance), whether broadcast or user-generated. Although a first critical issue is the estimation of quantitative parameters describing where events are detected, recent work addresses the analysis of multimedia footage by applying image and video understanding techniques to the detected or tracked motion. That is, the challenge is shifting toward generating qualitative descriptions of the meaning of motion, describing not only where, but also why an event is being observed.

The goal of this workshop is to seek innovative contributions in the above fields, bringing together researchers from machine learning, image processing and computer vision. New research achievements should be demonstrated on complex, real-world application scenarios that showcase current research. Potential topics include, but are not limited to:

  • Advanced machine learning strategies in computer vision
  • Transfer learning, deep learning, active learning, on-line learning
  • Methods for robust detection of semantic concepts in video streams
  • Object/human detection and tracking using advanced machine learning tools
  • Annotation of events and human motion and activity in large-scale multimedia content
  • Identification of spatio-temporal, causal and contextual relations of events
  • Semantic and event-based summarization, matching and retrieval of monitored video footage
  • Enhancement of event analysis based on attention models or multiscale/multisource data fusion
  • Event- and context-oriented relevance feedback algorithms
  • Strategies for context learning (background scene and its regions, objects and agents)
  • Research projects in the respective fields (international standardization activities, national/international research projects)
  • Real-life applications in industry, traffic analysis, critical infrastructures, athletic events, etc.