Aiming at bridging the gap between human and machine vision, video signal is probably the main cue that multimodal machine learning and pattern recognition tools rely on. Among the difficult tasks dealt with in this domain, human behavior understanding is surely accounted a challenging one. Activities such as action recognition, affective computing, human computer interaction, urban analytics, or applications in health, security, robotics and video games are undoubtedly well-established topics in which computer vision plays a fundamental role. Studying human-human or human-computer interactions has also attracted increasing attention in recent years due to its widespread meaning and applications. Furthermore, the advancements in computer vision for social behavior analysis can bring ubiquitous changes in the society. However, despite significant research progress, the automated understanding of a wide range of human activities from visual as well as multimodal data still remains a source of challenging topics. |