Computational Visual Media  2019, Vol. 05 Issue (04): 375-390    doi: 10.1007/s41095-019-0157-9
Research Article     
InSocialNet: Interactive visual analytics for role-event videos
Yaohua Pan1, Zhibin Niu1,(✉), Jing Wu2, Jiawan Zhang1
1 College of Intelligence and Computing, and School of New Media and Communication, Tianjin University, Tianjin, 300354, China. E-mail: Y. Pan, lypyh522@126.com; J. Zhang, jwzhang@tju.edu.cn.
2 School of Computer Science & Informatics, Cardiff University, CF243AA, UK. E-mail: J.Wu@cs.cardiff.ac.uk.

Abstract  

Role-event videos are rich in information but challenging to understand at the story level. The social roles and behavior patterns of characters depend largely on the interactions among characters and on the background events. Understanding them requires analysis of the video contents over a long duration, which is beyond the ability of current algorithms designed for analyzing short-time dynamics. In this paper, we propose InSocialNet, an interactive video analytics tool for analyzing the contents of role-event videos. It automatically and dynamically constructs social networks from role-event videos using face and expression recognition, and provides a visual interface for interactive analysis of video contents. Together with social network analysis at the back end, InSocialNet helps users investigate characters, their relationships, social roles, factions, and events in the input video. We conduct case studies to demonstrate the effectiveness of InSocialNet in helping users harvest rich information from role-event videos. We believe the current prototype implementation can be extended to applications beyond movie analysis, e.g., social psychology experiments to help understand crowd social behaviors.
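
The dynamic network construction described above can be illustrated with a minimal sketch. It assumes that face recognition has already produced, for each sampled frame, a timestamp and the set of recognized character names; the function names `build_cooccurrence` and `network_at` and the toy frame data are hypothetical and are not taken from the InSocialNet implementation.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(frames):
    """Count how often each pair of characters appears in the same frame.

    frames: iterable of (timestamp, set_of_character_names); assumed to be
    the output of an upstream face recognition step (hypothetical format).
    Returns a Counter mapping a sorted (name_a, name_b) pair to its weight.
    """
    edges = Counter()
    for _, characters in frames:
        # Sorting makes (a, b) and (b, a) map to the same undirected edge.
        for a, b in combinations(sorted(characters), 2):
            edges[(a, b)] += 1
    return edges


def network_at(frames, t):
    """Rebuild the network using only frames up to time t (time-slider view)."""
    return build_cooccurrence((ts, chars) for ts, chars in frames if ts <= t)


# Toy data for illustration only; not results from the paper.
frames = [
    (0.0, {"Cao Cao", "Cao Ren"}),
    (1.0, {"Cao Cao", "Cao Ren", "Dong Zhuo"}),
    (2.0, {"Liu Bei", "Guan Yu", "Zhang Fei"}),
]
full_network = build_cooccurrence(frames)
early_network = network_at(frames, 1.0)
```

Rebuilding the network from a time-filtered frame list is the simplest way to mimic backtracking along a time slider; an incremental update per frame would be an equally plausible design.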



Key words: visual analytics, behavioral psychology, role-event videos, social network, video analysis
Received: 08 December 2019      Published: 13 March 2020
Corresponding Authors: Zhibin Niu   

Cite this article:

Yaohua Pan, Zhibin Niu, Jing Wu, Jiawan Zhang. InSocialNet: Interactive visual analytics for role-event videos. Computational Visual Media, 2019, 05(04): 375-390.

URL:

http://cvm.tsinghuajournals.com/10.1007/s41095-019-0157-9     OR     http://cvm.tsinghuajournals.com/Y2019/V05/I04/375

Fig. 1:  InSocialNet system architecture. The back end detects individual characters and builds and updates a social network as the video progresses. The front-end interactive visual interface enables users to explore hidden information about the characters, their relationships, and events.
Fig. 2:  InSocialNet system. The design follows Shneiderman’s mantra. View A gives an overview of the video and the constructed co-occurrence network. Users can trace how the network was constructed by moving the synced time slider. View B enables users to rank the characters by various measures. View C is the main inspection explorer. It employs three coordinated views to support the inspection and analysis of key roles and relationships, events and factions, and changes of emotions.
Fig. 3:  Faction detection by community detection algorithms and our method for the analysis of "Romance of the Three Kingdoms".
Fig. 4:  Faction detection by community detection algorithms and our method for the analysis of "Harry Potter".
Fig. 5:  The constructed social network of "Harry Potter". It shows that Voldemort and Peter are connected only with each other, and not with other characters.
Fig. 6:  (a) Cao Cao’s co-occurrence social network. Cao Cao is the most authoritative person, meaning that he is associated with the largest number of roles. (b) Cao Ren’s co-occurrence social network is largely a subset of Cao Cao’s, which shows that his position in the faction is lower than Cao Cao’s. (c) Some characters in Cao Cao’s faction are related only to important roles such as Cao Cao and Cao Ren.
Fig. 7:  (a) The Character-Event Inspection view (authority score vs. first appearance time) shows that there are two major events: one at the beginning and one in the middle of the video. Further exploration shows that the first wave is the allied forces of princes planning to attack Dong Zhuo, the ghostly prime minister, while the second wave is about the resistance from Dong Zhuo. (b) The Character-Event Inspection view (authority score vs. length of appearance time) shows that Cao Cao and his brotherhood are the main roles and, unsurprisingly, that their generals and soldiers also appear frequently.
Fig. 8:  The back-end machine learning algorithms automatically cluster the characters into separate groups. The results of faction analysis show the emerging and potential factions in the video: the closer two characters' coordinates are, the more likely the characters belong to the same faction. The manually annotated results validate the clustering.
Fig. 9:  Liu Bei, Guan Yu, and Zhang Fei are closely connected to each other, but not to others. So although they first appear in the same time period as other characters in the faction of Cao Cao, their authority scores are low.
Fig. 10:  The Emotion Inspection view gives an overview of the atmosphere and also supports the visualization of "the shape of the personalities" of different characters. It can be seen that there are more negative emotions in the first half of the video, and more positive emotions in the second half.
Fig. 11:  Comparing the emotion changes of Guan Yu and Zhang Fei (zoom in for a better view). Guan Yu is mostly calm. Even when he is sad, he shows few extreme emotions. In contrast, his sworn brother Zhang Fei has a sharply contrasting emotion shape, which suggests a person with distinct loves and hates.
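
The authority scores referred to in Figs. 6, 7, and 9 can be illustrated with a small sketch. This page does not state how InSocialNet computes them, so the code below assumes a HITS-style power iteration on a weighted, undirected co-occurrence network; the function name `authority_scores` and the toy edge weights are hypothetical and are not results from the paper.

```python
def authority_scores(edges, iterations=50):
    """HITS-style authority scores on an undirected weighted graph.

    edges: dict mapping (name_a, name_b) to a co-occurrence weight.
    Assumption: HITS is used here only as one plausible definition of
    "authority"; the source does not confirm the exact method.
    """
    nodes = sorted({name for pair in edges for name in pair})
    neighbors = {name: {} for name in nodes}
    for (a, b), weight in edges.items():
        neighbors[a][b] = weight
        neighbors[b][a] = weight

    auth = {name: 1.0 for name in nodes}
    for _ in range(iterations):
        # Hub update: sum of the authority of neighbors, weighted by edge.
        hub = {n: sum(w * auth[m] for m, w in neighbors[n].items()) for n in nodes}
        # Authority update: sum of the hub score of neighbors.
        auth = {n: sum(w * hub[m] for m, w in neighbors[n].items()) for n in nodes}
        # Normalize so the scores sum to 1 and stay numerically bounded.
        total = sum(auth.values()) or 1.0
        auth = {n: value / total for n, value in auth.items()}
    return auth


# Toy network for illustration only.
toy_edges = {
    ("Cao Cao", "Cao Ren"): 2,
    ("Cao Cao", "Dong Zhuo"): 1,
    ("Cao Cao", "Liu Bei"): 1,
    ("Cao Ren", "Dong Zhuo"): 1,
    ("Guan Yu", "Liu Bei"): 1,
}
scores = authority_scores(toy_edges)
top_character = max(scores, key=scores.get)
```

In this toy network the most heavily connected character receives the highest score, while a character linked only through a single weak tie scores low, which mirrors the qualitative pattern the captions describe.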