Hideo Saito received the Ph.D. degree in electrical engineering from Keio University, Japan, in 1992. Since 1992, he has been on the Faculty of Science and Technology, Keio University. From 1997 to 1999, he took part in the Virtualized Reality Project at the Robotics Institute, Carnegie Mellon University, as a Visiting Researcher. Since 2006, he has been a Full Professor with the Department of Information and Computer Science, Keio University. His research interests include computer vision and pattern recognition, and their applications to augmented reality, virtual reality, and human-robot interaction. His recent activities in academic conferences include serving as the Program Chair of ACCV 2014, the General Chair of ISMAR 2015, the Program Chair of ISMAR 2016, and the Scientific Program Chair of EuroVR 2020.

Hideo Saito, Ph.D.
Dept. of Information and Computer Science, Keio University
3-14-1 Hiyoshi Kohoku-ku Yokohama, 223-8522, Japan

Tel: +81-45-566-1753, Fax: +81-45-566-1747

TITLE: AI Applications for Image Sensing

Ten years have passed since AI was shown to be capable of dramatically improving image recognition performance. In the past decade, various studies have shown that AI can substantially improve on conventional performance not only in image recognition but also in "image sensing," which estimates various physical quantities as well as the geometric shapes and structures of objects in a real environment captured in images. In this talk, machine learning techniques for image sensing and their performance will be introduced through concrete examples of their application.


Lin Weisi is an active researcher in intelligent image processing, perception-based signal modelling and assessment, video compression, and multimedia communication. He was previously the Lab Head, Visual Processing, at the Institute for Infocomm Research (I2R). He is a Professor in the School of Computer Science and Engineering, Nanyang Technological University, where he also served as the Associate Chair (Research). He is a Fellow of IEEE and IET, and an Honorary Fellow of the Singapore Institute of Engineering Technologists. He was named a Highly Cited Researcher in 2019, 2020, and 2021 by Web of Science. He was elected as a Distinguished Lecturer in both the IEEE Circuits and Systems Society (2016-17) and the Asia-Pacific Signal and Information Processing Association (2012-13), and has given keynote/invited/tutorial/panel talks at 40+ international conferences. He has been an Associate Editor for IEEE Trans. Image Process., IEEE Trans. Circuits Syst. Video Technol., IEEE Trans. Multimedia, IEEE Signal Process. Lett., Quality and User Experience, and J. Visual Commun. Image Represent. He also chaired the IEEE MMTC QoE Interest Group (2012-2014), and was elected to the European Network on Quality of Experience in Multimedia Systems and Services (QUALINET) from a Non-COST Country Institution, based on scientific merits (2011). He has been a TP Chair for IEEE 2013, QoMEX 2014, PV 2015, PCM 2012, and IEEE VCIP 2017. He believes that good theory is practical, and has delivered 10+ major systems and modules for industrial deployment with the technology developed.

Weisi Lin
Nanyang Technological University, Singapore

TITLE: Determine Visual Just-Noticeable-Difference for Human & Machine Tasks

Visual Just-Noticeable-Difference (JND) refers to the minimum change in a visual signal that can be perceived by humans. It is a result of human evolution: our early ancestors optimized their visual sensing organs (mainly eyes and brain) to achieve just sufficient sensitivity and level of information detail for gathering fruit and hunting animals, as a way to handle the large volume of visual data received under the body's resource constraints. JND formulation and computational models are a prerequisite for user-centric designs that turn human perceptual limitations into meaningful system advantages in terms of computing power, bandwidth, storage space, energy/battery usage, device cost/size, and so on. In this talk, systematic views and a classification of visual JND research will be presented first. Then, the related computational models and applications to date will be reviewed, from conventional handcrafted approaches to recently emerging data-driven ones. Furthermore, recent research attempts regarding perception by machines will be introduced, since machines are increasingly becoming the ultimate users of visual signals in the AI era. Extensions to audio, smell, taste, and haptics/temperature signals, as well as cross-modality efforts, will also be explored, toward full multimedia. In particular, digital representation of some of these signals (such as smell and taste) is still a research problem in general. Finally, possible future opportunities will be highlighted and discussed.


Matteo Naccari is the Lead Video Codec Engineer at Audinate in Cambridge – UK. He earned a PhD in Information Science Engineering from the Technical University of Milan (Politecnico di Milano) under the supervision of Professor Stefano Tubaro, with a scholarship funded by STMicroelectronics to research innovative video coding tools for transcoding, error resilience and automatic video quality monitoring. He was a Post-Doctoral researcher at the Telecommunications Institute – Lisbon, researching methods to integrate models of the human visual system into video codec architectures. He then moved to positions in industry, first as Senior Technologist in the R&D division of the British Broadcasting Corporation (BBC), then as Principal Engineer at DisplayLink (currently Synaptics), and finally at Audinate since 2021. In 2017 he was an invited scientist at the Science and Technology Research Laboratory (STRL) of the public Japanese broadcaster (NHK) in Tokyo, to research the transmission of High Dynamic Range (HDR) material in broadcasting applications and to design novel coding tools for the successor of the H.265/HEVC standard. His current research interests are in video coding technologies and optimised implementations of codecs to serve professional Audio and Video transmitted over IP networks. He is a Senior Member of the IEEE and author or co-author of several conference and journal contributions.

Matteo Naccari
Lead Video Codec Engineer
Audinate – Cambridge UK

TITLE: Video delivery over IP for professional applications: current status and challenges

Ethernet technologies have evolved over the past few years to guarantee reliable data transmission and synchronization. This has enabled the migration of content production from bespoke and dedicated networks to IP ones. Such a migration has opened the door to a variety of new opportunities whereby Audio and Video (AV) devices can be easily discovered and connected via simple Ethernet switches. Studio production can be delocalized, and events may happen remotely (e.g. musicians playing at different locations) to accommodate mobility restrictions imposed by pandemics such as the recent COVID-19 one. Moreover, thanks to the use of IP networks, different stages of the content production and delivery pipeline may happen in the cloud, where computational resources are more powerful. In this context, the delivery of precisely timed audio-visual signals can be accomplished by systems which implement a software and hardware platform solution enabling such reliable transport, guaranteeing interoperability and easing the configuration of the network. One of the most popular examples of these network systems for AV data transmission over IP is Audinate’s Dante. Dante originally dealt with audio data and has become the de facto industry standard. Due to recent trends, where video signals are becoming more pervasive over IP networks in professional content creation and delivery, Dante networks are now evolving to carry video material and address all the related challenges. The main goal of this keynote is to introduce the audience to the main application scenarios and concepts behind the delivery of professional AV material over IP networks. Particular emphasis will be devoted to the processing and transmission of visual data, which, due to their nature, will inevitably take the largest share of the total available bandwidth.
Moreover, the requirement for precisely timed video data delivery, coupled with good visual quality, leads end users of professional AV applications to have high expectations, which demand careful and effective design of compression formats and related codecs. The talk will discuss the challenges faced in the AV over IP scenario and present the typical operating points such as video formats, compression standards and bandwidth required. Finally, some remarks will be offered on the analysis of data travelling through networks such as Dante.