IMX ’23: Proceedings of the 2023 ACM International Conference on Interactive Media Experiences
SESSION: Human Behavior and Embodied Experiences
Detecting Human Attitudes through Interactions with Responsive Environments
This paper is based on developments from the research project “Paradigms of Ubiquitous Computing”, funded by the Swiss National Science Foundation (SNSF, 100016_185436 / 1), 2019-23. It investigates the impact of environmentally embedded sensor-actuator systems on humans. Taking a critical stance, we examine human-machine interfaces to make quantitative statements about people's behavior patterns and attitudes based solely on their physical interactions with a responsive environment. By staging different paradigms of Ubiquitous Computing in an experimental setup and evaluating them with test participants, we aim to gain insights into how humans experience and appropriate immersive and sometimes challenging situations. The artistic approach draws on strategies from New Media Art and Speculative Design and is not aligned with processes commonly used in applied research and development. The evaluation design is based on mixed methods, with a strong emphasis on semantic differentials to quantify user interactions with electronically enhanced devices and furnishings. The focus is on interaction design strategies and evaluation design methods.
An Integrated Framework for Understanding Multimodal Embodied Experiences in Interactive Virtual Reality
Virtual Reality (VR) technology enables “embodied interactions” in realistic environments where users can freely move and interact, engaging deep physical and emotional states. However, a comprehensive understanding of the embodied user experience is currently limited by the extent to which relevant observations can be made, and the accuracy with which those observations can be interpreted.
Paul Dourish proposed a way forward through the characterisation of embodied interactions in three senses: ontology, intersubjectivity, and intentionality. In a joint effort between computer scientists and neuroscientists, we built a framework for designing studies that investigate multimodal embodied experiences in VR, and apply it to study the impact of simulated low vision on user navigation. Our methodology involves designing 3D scenarios annotated with an ontology, modelling intersubjective tasks, and correlating multimodal metrics such as gaze and physiology to derive intentions. We show how this framework enables a more fine-grained understanding of embodied interactions in behavioural research.
Virtual Rehearsal Suite: An Environment and Framework for Virtual Performance Practice
Contemporary performance artists use Virtual Reality (VR) tools to create immersive narratives and extend the boundaries of traditional performance mediums. As the medium evolves, performance practice is changing with it. Our work explores ways to leverage VR to support the creative process by introducing the Virtual Rehearsal Suite (VRS), which provides users with the experience of a large-scale rehearsal or performance environment while occupying limited physical space and minimal real-world obstructions. In this paper, we discuss findings from scene study experiments conducted within the VRS. In addition, we contribute our thresholding protocols, a framework designed to support user transitions into and out of VR experiences. Our integrated approach to digital performance practice and creative collaboration combines traditional and contemporary acting techniques with HCI research to harness the innovative capabilities of virtual reality technologies, creating accessible, immersive experiences for actors while facilitating user presence through state-change protocols.
What’s my future: a Multisensory and Multimodal Digital Human Agent Interactive Experience
This paper describes an interactive multimodal and multisensory fortune-telling experience for digital signage applications that combines digital human agents with touchless haptic technology and gesture recognition. For the first time, human-to-digital-human interaction is mediated through hand gesture input and mid-air haptic feedback, motivating further research into multimodal and multisensory location-based experiences using these and related technologies. We take a phenomenological approach, present our design process and the system architecture, and discuss the insights gained, along with some of the challenges and opportunities we encountered during this exercise. Finally, we use our singular implementation as a proxy for discussing complex aspects such as privacy, consent, gender neutrality, and the use of digital non-fungible tokens at the phygital border of the metaverse.
SESSION: Audiovisual Content Production & Referencing
Supporting Video Authoring for Communication of Research Results
Video summaries of scientific publications have grown increasingly popular in recent years, requiring many researchers to familiarize themselves with the tools and techniques of video production, which can be an overwhelming task. This paper introduces a video structuring framework embedded into the authoring tool Pub2Vid. The tool supports users in creating their video outline and script, providing real video examples and recommendations based on the analysis of 40 publication summarization videos rated in a user study with 68 participants. Following a four-tier evaluation methodology, the application's usability was assessed and improved via amateur and expert interviews, two rounds of usability tests, and two case studies. We show that the tool and its recommendations are particularly useful for beginners, thanks to the simple design and intuitive components as well as the suggestions based on real video examples.
Referencing in YouTube Knowledge Communication Videos
In recent years, there has been widespread concern about misinformation and hateful content on social media damaging societies. As one of the most influential social media platforms, practically serving as a new search engine, YouTube has drawn criticism as a major conduit of misinformation. However, it is often overlooked that there are communities on YouTube that aim to produce credible and informative content, usually falling under the educational category. One way to characterize this valuable content is to examine the references attached to each video. While such citation practices function as a voluntary gatekeeping culture within the community, how they are actually carried out varies and remains unexamined. Our study investigates common citation practices in major knowledge communication channels on YouTube. After manually examining 44 videos sampled from YouTube, we characterized two common referencing methods, namely bibliographies and in-video citations. We then selected 129 referenced resources and assessed and categorized their availability as immediate, conditional, or absent. After relating the observed referencing methods to the characteristics of the knowledge communication community, we show that the usability of references can vary depending on viewers' user profiles. Furthermore, we observed the use of rich-text technologies that can enrich the usability of online video resources. Finally, we discuss design implications for a standardized referencing convention on the platform that can promote information credibility and improve user experience, which is especially valuable for the young audiences who tend to watch this content.
Producing Personalised Object-Based Audio-Visual Experiences: an Ethnographic Study
Developments in object-based media and IP-based delivery offer an opportunity to create superior audience experiences through personalisation. Towards the aim of making personalised experiences regularly available across the breadth of audio-visual media, we conducted a study to understand how personalised experiences are being created. This consisted of interviews with producers of six representative case studies, followed by a thematic analysis. We describe the workflows and report on the producers’ experiences and obstacles faced. We found that the metadata models, enabling personalisation, were developed independently for each experience, restricting interoperability of personalisation affordances provided to users. Furthermore, the available tools were not effectively integrated into preferred workflows, substantially increasing role responsibilities and production time. To ameliorate these issues, we propose the development of a unifying metadata framework and novel production tools. These tools should be integrated into existing workflows; improve efficiency using AI; and enable producers to serve more diverse audiences.
LIFT – A System to Create Mixed 360° Video and 3D Content for Live Immersive Virtual Field Trip
Our paper presents LIFT, a system that enables educators to create immersive virtual field trip experiences for their students. LIFT overcomes the challenges of enabling non-technical educators to create their own content and allows educators to act as guides during the immersive experience. The system combines live-streamed 360° video, 3D models, and live instruction to create collaborative virtual field trips. To evaluate LIFT, we developed a field trip with biology educators from the University of Central Florida (UCF) and showcased it at a science festival. Our results suggest that LIFT can help educators create immersive educational content while out in the field. However, our pilot observational study at the museum highlighted the need for further research to explore the instructional design of mixed immersive content created with LIFT. Overall, our work provides an application development framework for educators to create immersive, hands-on field trip experiences.
SESSION: Audiovisual Content Quality & Accessibility
Accessibility Research in Digital Audiovisual Media: What Has Been Achieved and What Should Be Done Next?
The consumption of digital audiovisual media is a mainstay of many people's lives. However, people with accessibility needs often have issues accessing this content. With a view to addressing this inequality, researchers have explored a wide range of interventions to bridge this accessibility gap. Despite this work, our understanding of the capability of these interventions is poor. In this paper, we address this through a systematic review of the literature, creating and analysing a dataset of N = 181 scientific papers. We find that certain areas have accrued a disproportionate amount of attention from the research community: for example, blind and visually impaired and d/Deaf and hard of hearing people account for the overwhelming majority of papers (N = 170). We describe the challenges researchers have addressed, the end-user communities of focus, and the interventions examined. We conclude by evaluating gaps in the literature and areas that warrant more focus in the future.
Influence of Multi-Modal Interactive Formats on Subjective Audio Quality and Exploration Behavior
This study uses a mixed between- and within-subjects test design to evaluate the influence of interactive formats on the quality of binaurally rendered 360° spatial audio content. Focusing on ecological validity using real-world recordings of 60 s duration, three independent groups of subjects were exposed to three formats: audio only (A), audio with 2D visuals (A2DV), and audio with head-mounted display (AHMD) visuals. Within each interactive format, two sessions were conducted to evaluate degraded audio conditions: bit-rate and Ambisonics order. Our results show a statistically significant effect (p < .05) of format on spatial audio quality ratings only for Ambisonics order. Exploration data analysis shows that format A yields little variability in exploration, while formats A2DV and AHMD yield a broader viewing distribution of the 360° content. The results imply that audio quality factors can be optimized depending on the interactive format.
Immersion or Disruption?: Readers’ Evaluation of and Requirements for (3D-)audio as a Tool to Support Immersion in Digital Reading Practices.
In this paper, we aim to contribute to the understanding of how readers experience immersion in digital reading, specifically reading supported by (3D-)audio tracks. We formulate user and content requirements for implementing (3D-)audio soundtracks for readers in a digital reading application. The main research question addressed in this paper is: (how) can audio aid the immersion of readers in digital fiction stories? To answer this question, three online focus group discussions were organised in Belgium and Germany. As part of the set-up of the Horizon Europe project Möbius, 18 participants tested different 3D-audio tracks while reading via the Thorium Reader application. The results first address how participants define immersion, and how the role of audio in immersion can become paradoxical. The paper then presents a detailed evaluation of the factors enabling or disabling immersion for the specific 3D-audio tracks, and how these insights can be implemented in reading apps via user and content requirements.
SESSION: Affective & Immersive User Experiences Assessment
More Immersed but Less Present: Unpacking Factors of Presence Across Devices
The production of immersive media often involves 360-degree viewing on mobile or immersive VR devices, particularly in the field of immersive journalism. However, it is unclear how the different technologies used to present such media affect the experience of presence. To investigate this, a laboratory experiment was conducted with 87 participants who were assigned to one of three conditions: HMD-360, Monitor-360, or Monitor-article, representing three distinct levels of technological immersion. All three conditions presented the same base content, with the high- and mid-immersion conditions featuring a panoramic 360° video and the low-immersion condition presenting an article composed of a transcript and video stills.
The study found that presence could be considered a composite of Involvement, Naturalness, Location, and Distraction. Mid- and high-immersion conditions elicited both higher Involvement and higher Distraction compared to low immersion. Furthermore, the participants’ propensity for psychological immersion maximized the effects of technological immersion, but only through the aspect of Involvement. In conclusion, the study sheds light on how different technologies used to present immersive media affect the experience of presence and suggests that higher technological immersiveness does not necessarily result in a higher reported presence.
Enhancing Engagement through Digital Cultural Heritage: A Case Study about Senior Citizens using a Virtual Reality Museum
As the use of Virtual Reality (VR) increases, museums have been using it to create simulations of their artefact collections. However, how accessible, inclusive, and engaging these simulations are for senior citizens has been understudied. To address this, this case study presents the design of the “Pop-up VR Museum”, a VR experience based on cultural heritage artefacts from the Design Museum in Helsinki that aims to engage audiences across a wide range of ages. Users can interact with virtual artefacts and listen to stories contributed by different communities. The Pop-up VR Museum has been tested with 254 users at the museum and taken to several elderly care homes. The evaluation is based on users' gameplay data and their responses to post-experience questionnaires. Results indicate some variation in types of engagement across users' age groups. Despite potential limitations, this study provides valuable insights for other museums creating inclusive VR experiences.
A Dataset of Gaze and Mouse Patterns in the Context of Facial Expression Recognition
Facial expression recognition is an important and challenging task for both the computer vision and affective computing communities, and even more so in the context of multimedia applications, where audience understanding is of particular interest. Recent data-oriented approaches have created the need for large-scale annotated datasets. However, most existing datasets present weaknesses because of the collection methods used. To further highlight these issues, we investigate in this work how human visual attention is deployed when performing a facial expression recognition task. To do so, we carried out several complementary experiments using eye-tracking technology as well as the BubbleView metaphor, under both laboratory and crowdsourcing settings. We show significant variations in gaze patterns depending on the emotion represented, but also on the difficulty of the task, i.e., whether the emotion is correctly recognised or not. Moreover, we use these results to propose recommendations on how to collect labelled data for facial expression recognition datasets.
SESSION: Work-in-Progress – New Forms of Interactive Media Experiences: Immersion, Tangibility, Multisensoriality and Accessibility
Zenctuary VR: Simulating Nature in an Interactive Virtual Reality Application: Description of the design process of creating a garden in Virtual Reality with the aim of testing its restorative effects.
In this paper we present the design process of a virtual reality experience whose aim is to have a restorative effect on users. In the simulated natural site, the user can interact with some elements of the environment and can also explore the view. We describe how we tried to create a more realistic sense of nature by relying on high-quality graphics, free-roaming space, and naturalistic interactions. During the design process we avoided gameful interactions and instead created playful ones, while also relying on the multimodal nature of virtual reality technology.
A Quality of Experience Evaluation of an Interactive Multisensory 2.5D Virtual Reality Art Exhibit
In recent years, museums have become more interactive and immersive through the adoption of technology within large-scale art exhibitions. Thanks to these changes, new types of cultural experiences are more appealing to a younger audience. Despite these positive changes, some museum experiences are still primarily focused on visual art, which remains out of reach for those with visual impairments. Such unimodal, visually dominated experiences restrict users who depend on other forms of sensory feedback to experience the world around them.
In this paper, the authors propose a novel VR experience which incorporates multisensory technologies. It allows individuals to engage and interact with a visual artwork museum experience presented as a fully immersive VR environment. Users can interact with virtual paintings and trigger sensory zones which deliver multisensory feedback to the user. These sensory zones are unique to each painting, presenting thematic audio and smells, custom haptic feedback to feel the artwork, and lastly air, light and thermal changes in an effort to engage those with visual impairments.
ICAMUS: Evaluation Criteria of an Interactive Multisensory Authoring Tool
This paper presents the progress of the PRIM project, which aims to enable non-computer-experts to create digital and interactive scenarios. For this purpose, we explored the strengths and limitations of visual programming languages and mulsemedia editors chosen for their ease of use. The results led to a list of criteria that our final solution should meet, named the ICAMUS criteria: Interface, Combinatorics, Affordance, Modularity, Ubiquity, and Synoptic. This paper proposes a scale based on the ICAMUS criteria to assess Interactive and Multisensory Authoring Tools (the IMAT scale). Lastly, the paper discusses how to compute a score based on three metrics (presence/absence of elements, number of clicks needed to perform an action, and the time needed to do so) and a visual representation of this score intended to give a complete profile of the tool. We hypothesize that this scale will be able to highlight the complementarities of visual programming languages and mulsemedia editors, as well as the challenges that remain.
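To make the scoring idea concrete, the following Python sketch shows one plausible way a composite score could be derived from the three metrics named above. The function name, equal weighting, and normalisation against a best-case baseline are our own illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of an ICAMUS-style composite score built from the
# three metrics named in the abstract: presence/absence of elements,
# number of clicks to perform an action, and time needed to perform it.
# Weights and normalisation are illustrative assumptions only.

def icamus_score(elements_present: int, elements_total: int,
                 clicks: int, clicks_best: int,
                 time_s: float, time_best_s: float) -> float:
    """Return a 0..1 score; higher suggests a more usable authoring tool."""
    coverage = elements_present / elements_total          # presence/absence
    click_eff = min(1.0, clicks_best / max(clicks, 1))    # fewer clicks is better
    time_eff = min(1.0, time_best_s / max(time_s, 1e-6))  # faster is better
    return (coverage + click_eff + time_eff) / 3.0        # equal weights (assumed)

# Example: a tool exposing 5 of 6 elements, needing 8 clicks (best case 5)
# and 40 s (best case 25 s) for the reference action.
print(round(icamus_score(5, 6, 8, 5, 40.0, 25.0), 3))
```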
Designing an Assistance Tool for Analyzing and Modeling Trainer Activity in Professional Training Through Simulation
Human audience analysis is crucial in numerous domains for understanding people's behavior based on their knowledge and environment. In this paper we focus on simulation-based training, which has become a popular teaching approach that requires trainers to manage a large amount of data at the same time. We present tools currently being developed to help trainers in two different fields: training in practical teaching gestures and training in civil defense. To this end, three technological blocks are being built to collect and analyze data about trainers' and learners' gestures, gaze, speech, and movements. The paper also discusses the future work planned for this project, including the integration of the framework into the Noldus system and its use in civil security training. Overall, the article highlights the potential of technology to improve simulation-based training and provides a roadmap for future development.
A Social Awareness Interface for Helping Immigrants Maintain Connections to Their Families and Cultural Roots: The Case of Venezuelan Immigrants
International migration forces people into an unfamiliar reality in which their customs and values lose relevance. Moreover, former relationships are left behind, which makes immigrants more likely to experience loneliness. This study focuses particularly on Venezuelan immigrants by incorporating cultural aspects into a solution aimed at reducing loneliness and increasing social connectedness. Among Venezuelans, coffee is a staple of their daily routine and their favorite social beverage. We propose KEPEIN, a coffee maker-shaped interface to transfer a sense of presence and share coffee over distance. Through an experimental study, we evaluated the user’s perception and reaction when communicating through the interface. The results show potential added value to communication by including KEPEIN in a traditional remote interaction scenario. We discuss the benefits and limitations of this type of tangible communication interface and the importance of incorporating culture into the design of solutions for immigrants.
Enhancing VR Gaming Experience using Computational Attention Models and Eye-Tracking
This study explores the potential of enhancing interactive experiences, such as virtual reality (VR) games, through the use of computational attention models. Our proposed approach utilizes a saliency map generated by attention models to dynamically adjust game difficulty levels and to assist in game level design, resulting in a more immersive and engaging experience for users. To inform the development of this approach, we present an experimental setup that is able to collect data in a VR environment and is intended to validate the adaptation of attention models to this domain. Through this work, we aim to create a framework for VR game design that leverages attention models to offer a new level of immersion and engagement for users. We believe our contributions have significant potential to enhance VR experiences and advance the field of game design.
Audio Augmented Reality Outdoors
Audio Augmented Reality (AAR) is a novel and largely unexplored area of AR in which reality is augmented with auditory as well as visual content. AAR's interaction affordances and the accurate real-world registration of both visual and sound elements are challenging issues, especially in noisy, bright, and busy outdoor environments. This paper presents a novel mobile AAR experience deployed in a city environment, experienced while walking past six archaeological excavation sites in the city of Chania, Crete, Greece. The proposed AAR experience utilizes cutting-edge gamification techniques, non-linear storytelling, precise AR visualization, and spatial audio, offering innovative AAR interaction while exploring the city's archaeological sites and history outdoors.
A VR Intervention Based on Social Story™ to Develop Social Skills in Children with ASD
Social interactions and communication play a crucial role in people's lives. Those with autism spectrum disorder (ASD), especially children, may have difficulties participating in social interactions. Such challenges can be characterised by atypical behaviours and limited sharing intention in social settings. Sharing is an important part of social interaction, and a lack of awareness or limited willingness to share undermines the development of social skills. These characteristics may be related to an impaired theory of mind (ToM), which makes it difficult to understand other people's wishes and feelings. A range of interventions have been created to help develop social communication skills. The Social Story™ intervention is one such example: it provides clear visual narratives to explain social situations and concepts to children with ASD. The narratives provide a mechanism to visually communicate typical communication behaviours. The Social Story™ intervention approach is book-based; as such, it depends on a reader to communicate the concepts well and demands a certain level of imagination from the listener. With the limitations of the paper-based medium in mind, this work-in-progress paper outlines the steps, approach, and end application to translate the Social Story™ into a virtual reality (VR) experience. The Social Story™ experience in VR potentially offers a more interactive, immersive, and flexible intervention.
Proof-of-Concept Study to Evaluate the Impact of Spatial Audio on Social Presence and User Behavior in Multi-Modal VR Communication
This paper presents a proof-of-concept study conducted to analyze the effect of simple diotic vs. spatial, position-dynamic binaural synthesis on social presence in VR, in comparison with face-to-face communication in the real world, for a sample two-party scenario. A conversational task with shared visual reference was realized. The collected data includes questionnaires for direct assessment, tracking data, and audio and video recordings of the individual participants’ sessions for indirect evaluation. While tendencies for improvements with binaural over diotic presentation can be observed, no significant difference in social presence was found for the considered scenario. The gestural analysis revealed that participants used the same amount and type of gestures in face-to-face as in VR, highlighting the importance of non-verbal behavior in communication. As part of the research, an end-to-end framework for conducting communication studies and analysis has been developed.
A Preliminary Study of the Eye Tracker in the Meta Quest Pro
This paper presents preliminary results from accuracy testing of the Meta Quest Pro's eye tracker. We conducted user testing to evaluate spatial accuracy, spatial precision, and subjective performance under head-free and head-restrained conditions. Our measurements indicate an average accuracy of 1.652° with a precision of 0.699° (standard deviation) and 0.849° (root mean square) for a visual field spanning 15° in the head-free condition. The signal quality of the Quest Pro's eye tracker is comparable to that of existing AR/VR eye-tracking headsets. Notably, careful consideration is required when designing the size of scene objects, mapping areas of interest, and determining the interaction flow. Researchers should also be cautious when interpreting fixation results where multiple targets are in close proximity. Further investigation and better specification transparency are needed to establish the device's capabilities and limitations.
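For readers unfamiliar with how such numbers are typically obtained: accuracy is conventionally the mean angular offset between gaze and target direction, while precision is reported as the standard deviation of that error and the root mean square of sample-to-sample angular differences. A minimal NumPy sketch of these standard definitions follows; it is not the authors' analysis code.

```python
import numpy as np

# gaze and target are unit direction vectors, shape (n_samples, 3).
def angular_error_deg(gaze: np.ndarray, target: np.ndarray) -> np.ndarray:
    cos = np.clip(np.sum(gaze * target, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

def accuracy_deg(gaze, target):
    # Accuracy: mean angular offset from the true target direction.
    return angular_error_deg(gaze, target).mean()

def precision_sd_deg(gaze, target):
    # Precision (SD): dispersion of the angular error around its mean.
    return angular_error_deg(gaze, target).std(ddof=1)

def precision_rms_s2s_deg(gaze):
    # Precision (RMS): root mean square of successive
    # sample-to-sample angular differences of the gaze signal.
    cos = np.clip(np.sum(gaze[1:] * gaze[:-1], axis=1), -1.0, 1.0)
    theta = np.degrees(np.arccos(cos))
    return np.sqrt(np.mean(theta ** 2))
```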
Identifying the Developmental Challenges of Creating Virtual Reality Theatre: This paper reports on the identified challenges of creating virtual reality theatre from the practitioner’s perspective.
Virtual Reality Theatre continues to grow as a form of digital creative output for theatre practitioners. However, the common challenges practitioners face during the development of productions, and the barriers to entry produced by the complexity of platforms, are under-researched. This paper provides an in-depth analysis of several challenges identified through semi-structured interviews with practitioners and a thematic review.
SESSION: Work-in-Progress – Artificial Intelligence and Interactive Media Experiences
On Legal and Ethical Challenges of Automatic Facial Expression Recognition: An Exploratory Study
Automatic facial expression recognition (FER) has many potential applications. However, even if it can be beneficial in some areas, e.g., security and healthcare, several legal and ethical challenges arise. In this article, we first present such challenges related to the deployment of FER. We then describe a focus group we conducted, which highlighted interesting points regarding the use of FER in a medical context. In particular, transparency, data management, diagnoses, liability, the best-endeavours obligation, and the non-discrimination principle are debated. Finally, we discuss our study's limitations and directions for future work.
Feedback Driven Multi Stereo Vision System for Real-Time Event Analysis
2D cameras are often used in interactive systems, and other systems such as gaming consoles provide more powerful 3D cameras for short-range depth sensing. Overall, however, these cameras are not reliable in large, complex environments. In this work, we propose a 3D stereo-vision-based pipeline for interactive systems that is able to handle both ordinary and sensitive applications through robust scene understanding. We explore the fusion of multiple 3D cameras to perform full scene reconstruction, which allows a wide range of tasks such as event recognition, subject tracking, and notification. Using possible feedback approaches, the system can receive data from the subjects present in the environment to learn to make better decisions or to adapt to completely new environments. Throughout the paper, we introduce the pipeline and explain our preliminary experimentation and results. Finally, we draw the roadmap for the next steps needed to bring this pipeline into production.
LLM-Based Interaction for Content Generation: A Case Study on the Perception of Employees in an IT Department
In the past years, AI has seen many advances in the field of NLP. This has led to the emergence of LLMs, such as the now-famous GPT-3.5, which revolutionise the way humans can access or generate content. Current studies on LLM-based generative tools are mainly interested in their performance in generating relevant content (code, text, or images). However, ethical concerns related to the design and use of generative tools seem to be growing, impacting their public acceptability for specific tasks. This paper presents a questionnaire survey to identify the intention to use generative tools among employees of an IT company in the context of their work. The survey is based on empirical models measuring intention to use (TAM by Davis, 1989, and UTAUT2 by Venkatesh et al., 2012). Our results indicate a rather moderate acceptability of generative tools, although the more useful a tool is perceived to be, the higher the intention to use it seems to be. Furthermore, our analyses suggest that the frequency of use of generative tools is likely to be a key factor in understanding how employees perceive these tools in the context of their work. Following on from this work, we plan to investigate the nature of the requests that specific audiences may make to these tools.
Validating Objective Evaluation Metric: Is Fréchet Motion Distance able to Capture Foot Skating Artifacts?
Automatically generating character motion is one of the technologies required for virtual reality, graphics, and robotics, and motion synthesis with deep learning is an emerging research topic. A key component in developing such algorithms is the design of a proper objective metric to evaluate the quality and diversity of the synthesized motion dataset, two key factors in the performance of generative models. The Fréchet distance is nowadays a common way to assess this performance. In the motion generation field, the validation of such evaluation methods relies on computing the Fréchet distance between embeddings of the ground-truth dataset and motion samples polluted by synthetic noise that mimics the artifacts produced by generative algorithms. However, synthetic noise degradation does not fully represent commonly perceived motion perturbations. One such artifact is foot skating: the unnatural sliding of the feet on the ground during locomotion. In this work-in-progress paper, we tested how well the Fréchet Motion Distance (FMD), proposed in previous works, can measure foot skating artifacts, and we found that FMD cannot reliably measure the intensity of the skating degradation.
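For context, metrics of this family follow the construction of the Fréchet Inception Distance: the ground-truth and generated motion embeddings are each summarised by a Gaussian, and the metric is the Fréchet distance between the two Gaussians. Assuming FMD is defined in this standard FID style (the notation below is ours):

```latex
% Fréchet distance between Gaussians fitted to the embeddings of the
% ground-truth set (\mu_r, \Sigma_r) and the generated set (\mu_g, \Sigma_g):
\mathrm{FMD}^2
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g
  - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

A perturbation like foot skating can leave the embedding distribution nearly unchanged, which is consistent with the paper's finding that FMD fails to track its intensity.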
Developing an Interactive Agent for Blind and Visually Impaired People
The aim of this project is to create an interactive assistant that incorporates different assistive features for blind and visually impaired people. The assistant might incorporate screen readers, magnifiers, voice synthesis, OCR, GPS, face recognition, and object recognition, among other tools. Recently, the work done by OpenAI and Be My Eyes with the implementation of GPT-4 is comparable to the aim of this project; it shows that developing an interactive assistant has become simpler thanks to recent advances in large language models. However, older methods such as named entity recognition and intent classification are still valuable for building lightweight assistants. A hybrid solution combining both methods seems possible; it would help reduce the computational cost of the assistant and facilitate the data collection process. Despite being more complex to implement in a multilingual and multimodal context, a hybrid solution has the potential to be used offline and to consume fewer resources.
Generating Utterances for Companion Robots using Television Program Subtitles
This study presents a method for generating utterances for companion robots that watch TV with people, using TV program subtitles. To enable the robot to automatically generate relevant utterances while watching TV, we created a dataset of approximately 12,000 utterances manually added to the collected TV subtitles. Using this dataset, we fine-tuned a large-scale language model to construct an utterance generation model. The proposed model generates utterances based on multiple keywords extracted from the subtitles as topics, while also taking the context of the subtitles into account by providing them as input. The evaluation of the generated utterances revealed that approximately 88% of the sentences were natural Japanese, and approximately 75% were relevant and natural in the context of the TV program. Moreover, approximately 99% of the sentences contained the extracted keywords, indicating that our proposed method can generate diverse and contextually appropriate utterances containing the targeted topics. These findings provide evidence of the effectiveness of our approach in generating natural utterances for companion robots that watch TV with people.
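As a rough illustration of the conditioning scheme described (keywords plus subtitle context as model input, target utterance as output), a fine-tuning example might be serialised as below. The field names, separators, and example content are our assumptions, not the paper's actual data format.

```python
# Hypothetical serialisation of one fine-tuning example for a
# keyword- and context-conditioned utterance generator.
def build_example(keywords, subtitle_context, utterance):
    source = f"keywords: {', '.join(keywords)} | context: {subtitle_context}"
    return {"input": source, "target": utterance}

ex = build_example(
    keywords=["ramen", "Sapporo"],
    subtitle_context="The chef has been making miso ramen in Sapporo for 40 years.",
    utterance="Forty years of miso ramen! Sapporo really is the place for it.",
)
print(ex["input"])   # what the fine-tuned model sees
print(ex["target"])  # what it learns to produce
```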
The Green Notebook – A Co-Creativity Partner for Facilitating Sustainability Reflection
AI is becoming increasingly popular in artistic work. Yet tools for calculating the environmental impact of AI are better adapted to contexts other than creative practice, which can make them hard for non-experts to comprehend. In this study, based on interviews with AI artists, a design artifact called The Green Notebook was developed: a physical notebook with which the AI artist can discuss ideas and receive feedback on their expected environmental impact. The conversational experience between the artist and the interface was informed by an online content analysis of artistic work processes. The Notebook was explored and assessed with five artists in Wizard-of-Oz and focus group studies. Overall, the participants found a co-creation process with an enhanced ability to reflect on sustainability to be an accessible way to engage with the sustainability considerations of their AI artistic practices. We provide insights into the Notebook's perceived role and the conversational strategies used by the artists. Furthermore, we discuss trade-offs between politeness and efficiency, and between focus and integration, to inform future research.
Towards the Creation of Scalable Tools for Automatic Quality of Experience Evaluation and a Multi-Purpose Dataset for Affective Computing
Traditional tools used to evaluate the Quality of Experience (QoE) of users after browsing an ad, using a product, or performing any kind of task typically involve surveys, user testing, and analytics. However, these methods provide limited insights and have limitations due to the need for users' active cooperation and sincerity, long testing times, high costs, and limited scalability. In this work we present the tools we are developing to automatically evaluate QoE in different use cases, such as dashboards that show, in real time, reactions to different events in the form of emotions and affects predicted by models based on physiological data. To develop these tools, we require affective computing datasets. We highlight some limitations of the available ones, the difficulties in creating such data, and our current work on building a new dataset with automatic annotation of ground truth.
Enhancing Arabic Content Generation with Prompt Augmentation Using Integrated GPT and Text-to-Image Models
With the current and continuous advancements in the field of text-to-image modeling, it has become critical to design prompts that make the best of these models' capabilities and guide them to generate the most desirable images; thus the field of prompt engineering has emerged. Here, we study a method that uses prompt engineering to enhance text-to-image models' representation of Arabic culture. This work proposes a simple, novel approach for prompt engineering that uses the domain knowledge of a state-of-the-art language model, GPT, to perform prompt augmentation: a simple initial prompt is used to generate multiple, more detailed prompts related to Arabic culture across multiple categories, through a process known as in-context learning. The augmented prompts are then used to generate images enhanced for Arabic culture. We performed multiple experiments with a number of participants to evaluate the performance of the proposed method, which shows promising results, especially for generating prompts that are more inclusive of the different Arab countries and offer a wider variety of image subjects: compared to the direct approach, we find that our proposed method generates images with more variety 85% of the time and images that are more inclusive of the Arab countries 72.66% of the time.
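A minimal sketch of such an augmentation loop, using the OpenAI chat API as a stand-in GPT backend. The model name, instruction wording, and category handling are our assumptions; only the overall pattern (initial prompt in, several culture-specific detailed prompts out, each fed to a text-to-image model) follows the abstract.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def augment_prompt(initial_prompt: str, category: str, n: int = 3) -> list[str]:
    """Ask a GPT model to expand a simple prompt into detailed,
    Arabic-culture-specific image prompts via in-context instructions."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat-capable model would do
        messages=[
            {"role": "system",
             "content": "You expand short image prompts into detailed prompts "
                        "grounded in Arab culture, one prompt per line."},
            {"role": "user",
             "content": f"Category: {category}. Expand the following into {n} "
                        f"detailed prompts: {initial_prompt}"},
        ],
    )
    return resp.choices[0].message.content.strip().splitlines()

# Each augmented prompt would then be passed to a text-to-image model.
for p in augment_prompt("a traditional market", "architecture"):
    print(p)
```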
Zero-shot virtual product placement in videos
Virtual Product Placement (VPP) is an advertising technique that digitally places branded objects into movie or TV show scenes. Despite being a billion-dollar industry, current ad rendering techniques are time-consuming, costly, and executed manually with the help of visual effects (VFX) artists. In this paper, we present a fully automated and generalized framework for placing 2D ads in any linear TV cooking show captured using a single-view camera with minimal camera movement. The framework detects empty spaces, understands the kitchen scene, handles occlusion, renders ambient lighting, and tracks ads. Without requiring access to the full video or the production camera configuration, our framework reduces the time and cost associated with manual post-production ad rendering, enabling brands to reach consumers seamlessly while preserving the continuity of their viewing experience.
SESSION: Work-in-Progress – Future of TV and Video Content Experiences
Subjective Test Environments: A Multifaceted Examination of Their Impact on Test Results
Quality of Experience (QoE) in video streaming scenarios is significantly affected by the viewing environment and display device. Understanding and measuring the impact of these settings on QoE can help develop viewing-environment-aware metrics and improve the efficiency of video streaming services. In this ongoing work, we conducted a subjective study in both laboratory and home settings, using the same content and design, to measure QoE with Degradation Category Rating (DCR). We first analyzed subject inconsistency and the confidence intervals of the Mean Opinion Scores (MOS) between the two settings. We then used statistical tests such as ANOVA and the t-test to analyze the differences in subjective video quality ratings between the two viewing environments. Additionally, we employed the Elimination-By-Aspects (EBA) model to quantify the influence of the different settings on the measured QoE. We conclude with several research questions that could be further explored to better understand the impact of the viewing environment on QoE.
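To make the analysis pipeline concrete, per-condition MOS comparison between the two environments can be run with standard SciPy tests. The sketch below assumes one rating per subject and condition and invented example data; it is not the authors' code.

```python
import numpy as np
from scipy import stats

# Hypothetical DCR ratings (1-5) for one video condition,
# collected in the lab and at home.
lab = np.array([4, 5, 4, 3, 4, 5, 4, 4])
home = np.array([3, 4, 4, 3, 3, 4, 5, 3])

# Mean Opinion Score and its 95% confidence interval per environment.
for name, x in [("lab", lab), ("home", home)]:
    mos = x.mean()
    ci = stats.t.ppf(0.975, len(x) - 1) * x.std(ddof=1) / np.sqrt(len(x))
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")

# Two-sample Welch t-test and one-way ANOVA across environments.
print(stats.ttest_ind(lab, home, equal_var=False))
print(stats.f_oneway(lab, home))
```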
Tap or Swipe? Effects of Interaction Gestures for Retrieval of Match Statistics via Second Screen on Watching Soccer on TV
Accessing match statistics through a second screen while watching soccer matches on TV has grown into a popular practice. Although early works have shown how gestures on touch screens perform under distracting conditions, little is known about how the specific gestures (swiping and tapping) used to retrieve information on the second screen affect the viewing experience of soccer games on TV. To investigate this, a mixed-methods user study, which included prototype tests involving short clips of a soccer match, questionnaires, and short interviews, was conducted with 28 participants. The results revealed that more participants preferred tapping than swiping under both second-screen activity time scenarios, i.e., On-Play and Off-Play. However, neither swiping nor tapping yielded better recall of verbatim match stats and exact comparisons in either On-Play or Off-Play. Participant evaluations in On-Play and the interviews give us clues regarding this difference.
Interconnecting Personal assistants and TVs: a friendly approach to connect generations
Technology has been a link between people, and the use of cell phones, tablets, and other devices contributes to reducing distances. However, while these devices can unite, they can also result in exclusion. This is what happens to many seniors, making social isolation in this age group one of the world's biggest problems. For this reason, thinking of solutions aimed at the social integration of this public is extremely important to reduce indicators of loneliness. In this study, we used a system that combines an Intelligent Personal Assistant (IPA) and interactive television (iTV) to verify whether this type of approach can facilitate and promote audio calls to friends, family, and caregivers. The prototype was used for seven days in a real context by people between 64 and 90 years old. In general, acceptance of the system was quite positive, and it changed the participants' routine in some way during the testing period.
Proactivity in the TV Context: Understanding the Relevance and Characteristics of Proactive Behaviours in Voice Assistants
In the context of Television (TV), intelligent voice assistants are still mainly based on reactive behaviours, only responding to users' requests. However, adding proactivity to TV assistants could reduce the user-assistant interaction effort and bring an additional “human” layer, leading to a more organic adoption of these systems. Within this framework, this paper aims to contribute to the understanding of the relevance of a set of proactive scenarios for TV and to get clues on how to design these behaviours so that they are efficient and avoid feelings of intrusion. To this end, focus groups were conducted to discuss the conceptualised scenarios. The results showed that most of the idealised scenarios were relevant to the participants, and that proactive behaviours would be perceived positively if characteristics such as the type of communication and the intrusiveness of proactive suggestions could be pre-configured.
Video Consumption in Context: Influence of Data Plan Consumption on QoE
User expectations are one of the main factors in providing satisfactory QoE for streaming service providers. Measuring the acceptability and annoyance of video content therefore provides valuable insight when assessed under a given context. In this ongoing work, we measure video QoE in terms of acceptability and annoyance in the context of the remaining data in a mobile data plan. We show that simple logos can be used during the experiment to prompt the context to subjects, and that different context levels may impact user expectations and consequently their satisfaction. Finally, we show that objective metrics can be used to determine the acceptability and annoyance thresholds for a given context.
Navigating Full-Motion Video: Emerging Design Patterns for Parameterized Replay Stories
The 1980s saw full-motion video (FMV) titles inaugurate design patterns for a new genre of interactive digital narrative (IDN). Recently, this genre has made a resurgence in popularity. Despite the intervening years, FMV's design conventions remain tightly coupled to the affordances of laserdisc technology. This paper employs IDN affordances and aesthetics as a lens to examine modern FMV games, namely the recent works of Wales Interactive. It also leverages research on the emerging conventions of Timeline, an authoring platform for tightly parallel, parameterized stories, to address the challenges of FMV design.
Enabling and Understanding Interactive Social VR360 Video Viewing
This paper reports on research being done towards enabling and understanding interactive social VR360 video viewing scenarios, relying exclusively on web-based technologies and using different types of consumption devices. After motivating the relevance of the research topic and its associated impact, the paper elaborates on the key requirements, features, and system components needed to effectively enable such scenarios: adaptive and low-latency streaming, media synchronization, social presence, interaction channels, and assistive methods. For each of these features and components, different alternatives are assessed and proof-of-concept implementations are provided. With an effective combination and integration of all these contributions, an end-to-end platform can be built and used as a research framework to explore the applicability and potential benefits of social VR360 viewing in a variety of use cases, such as education, culture, or surveillance, by tailoring the technological components based on lessons learned from experimental studies. These use case studies can also provide relevant insights into activity patterns, behaviors, and preferences in social viewing scenarios.
Enhancing Emotional Awareness and Regulation in Movies and Music Based on Personality
Music and movies are powerful art forms that elicit deep feelings and emotions and help us reflect on our own and other people's lives, on subjects such as dreams, mental states, routines, society, and culture. The evolution of technology has been easing access to these forms of entertainment and education for everyone, everywhere in the world. Given our easy and frequent daily interaction with a huge amount of movies and music, and the impact these have on our emotions, it becomes increasingly relevant to think of ways to augment people's emotional perception and awareness of multimedia content in and through movies and music. In this paper, we present the motivation and background for these challenges, and propose an approach for the design, development, and expansion of interactive features that allow users to visualize and access the emotions they felt while engaging with movies and music. A special focus is put on content that has in some way meant something to users or can be associated with a significant memory, providing insights and helping to manage and regulate emotions, allowing users to revisit content with increased awareness or even to receive recommendations for new content that takes their personality into account.
Survey on the Impact of Listening to Audio for Adaptive Japanese Subtitles and Captions Ruby
Subtitles and closed captions, which are prepared for hearing-impaired users, are now widely used by users without hearing concerns. In this paper, we focus on the adaptation of subtitles and captions for non-hearing-impaired users, particularly the adaptation of kanji ruby. In experiments with non-hearing-impaired adults, Welch's t-test was used to determine whether listening to audio with the same content affects the necessity of kanji ruby. In addition, we proposed and evaluated an adaptive model that predicts whether ruby should be added to kanji in captions, based on the experimental results. The results suggest that not only the difficulty of the kanji and the user's kanji ability, but also the content of the audio, is important for the optimization of kanji ruby.
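For reference, Welch's t-test, the unequal-variance test named above, compares the two group means (here, ratings with and without accompanying audio) as:

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}
         {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left( \dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2} \right)^{2}}
                 {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

where the sample means, variances, and sizes are denoted \(\bar{x}_i\), \(s_i^2\), \(n_i\), and \(\nu\) is the Welch-Satterthwaite approximation to the degrees of freedom.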
SESSION: IMX Demos
Sensorial Immersive Experiences using MPEG Haptic and Scene Description Standards
This paper presents a demonstration showcasing the capabilities of the upcoming MPEG-I standards on Haptics and Scene Description. The demonstration is a sensorial immersive experience using a virtual reality headset, haptic-enabled controllers and a haptic vest. The proposed experience is an interactive go-kart race-like game where interactions with the environment will trigger spatialized haptic feedback to provide an enhanced sense of immersion. By implementing the MPEG-I standards on Haptics and Scene Description, and by designing this demonstration using exclusively these standards, this work showcases part of the research and standardization efforts produced through the MPEG ecosystem.
ScenaConnect: an original device to enhance experiences with multisensoriality
This demonstration presents ScenaConnect, a multisensory device that allows people to experience a variety of multisensory experiences. ScenaConnect is inexpensive, compact, and easy to install, and improves experiences by adding new interactions. The demonstration will present two use cases: the first is an interactive math exercise, and the second is a multisensory experience that takes the visitor on a journey through history. Moreover, ScenaConnect could be used in museums for immersive and interactive experiences, or by teachers who can use it to make their students' learning more interactive and adapted. The perspective is to allow non-experts in computer science to quickly integrate ScenaConnect into a variety of experiences thanks to the ScenaProd software, which, like ScenaConnect, is a goal of the PRIM project presented in more detail in this paper.
An AR Game for Primary Learners to Safeguard Intangible Cultural Heritage of the Ovahimba Tribe
Augmented Reality (AR) is a new technology that enhances the actual world by superimposing computer-generated or extracted real-world sensory data, such as images or sound, onto it. Incorporating Augmented Reality into Cultural Heritage has a slew of benefits. Safeguarding Cultural Heritage is vital because it serves as a link between the past and the present. Museums often fail to engage their audiences, particularly primary-school-aged children; as a result, youngsters show little interest in their cultural history, and museums cannot compete with more technologically advanced and modern kinds of entertainment. The purpose of this study was to spark children's interest in Cultural Heritage as they engage with it. Under the project, a Cultural Heritage game using eight Ovahimba items was developed and tested at a local primary school. Research-by-Design was the primary methodology of this research project.
Use of immersive and interactive systems to objectively and subjectively characterize user experience in work places
The objective of this demo is to show how simulated environments can be used in the evaluation of work-related indoor environments, compared to physical environments. In indoor environment evaluations, Virtual Reality (VR) offers new possibilities for experimental design as well as a functional rapprochement between the laboratory and real life. With VR, environmental parameters (e.g., light, color, furniture) can be easily manipulated at reasonable cost, making it possible to control and guide the user's sensory experience. One main challenge is to establish to which extent simulated environments are ecologically valid and which functions are most solicited in different environmental-simulation display formats. User-centric evaluation and sensory analysis in the building sector is in its infancy; this new tool could benefit the sector, on the one hand for methodological facilitation and on the other for cost reduction. To achieve the objectives of this project, a first step is to develop and validate the indoor simulations. In environmental simulations, one of the most used formats, thanks to its visual realism and ease of use, is 360° panoramic photos and videos, which permit capturing physical-world images. To validate the format, 360° photos of workplaces were taken in the Halle 6 Ouest building of Nantes University, and an immersive and interactive test based on physiological indicators was developed to subjectively and objectively assess comfort and performance in work offices. The demo will comprise a head-mounted display with integrated eye-tracking and measurements of electrodermal activity, heart rate, and galvanic skin response.
GoNature AR: Air Quality & Noise Visualization Through a Multimodal and Interactive Augmented Reality Experience
As Extended Reality (XR) media experiences become a commodity, there is a unique opportunity to deploy XR for environmental awareness. Interaction challenges in Augmented Reality (AR) still exist, centred on limited gesture and head-tracking affordances. AR technologies should also be seamlessly integrated with sensor data, analytics, and ultimately status prediction, visualized within the AR experience rather than merely superimposed onto the real world. This paper presents an innovative, work-in-progress, multimodal AR experience integrating interactive narration, gestures, hand recognition, and voice commands while a citizen is wearing a head-worn AR display, promoting environmental awareness, health, and wellness. By combining AR technologies and a sensor network, GoNature AR provides citizens with awareness of real-time, multimodal air and noise pollution data.
HoloBrand: A Co-located XR Multiplayer Serious Game on an Economic Model
Extended reality (XR) and serious games are considered promising approaches for modern teaching and learning concepts. However, they often do not exploit the advantages that co-located experiences can provide (e.g., more immediate exchange of ideas and a sense of community). One reason is the high technical, design, and didactic requirements of such solutions. In this work, we introduce HoloBrand, a co-located XR multiplayer serious game for HoloLens 2. The game can be played at the exhibit by one to three players in a time frame of three minutes and up. With the game, we enable students to experience the dynamics of an economic model of mass-market products (Urban's perceptor model); prior knowledge is not required. We describe the fundamental conceptual and implementation aspects of the HoloBrand system as well as its game and interaction design, including the implementation of the triadic game design concept and the four-factor fun model.
DatAR: Comparing Relationships between Brain Regions and Diseases: An Immersive AR Tool for Neuroscientists
Different brain diseases are associated with distinct patterns of affected brain regions. Neuroscientists aim to understand the variability of these patterns across different brain diseases. We used a user-centered design approach to develop support for neuroscientists in exploring and comparing which regions are affected by which diseases based on neuroscience literature. Our study involved six neuroscientists and nine visualization experts who evaluated the usability and explainability of our visualization tool for comparing disease patterns in brain regions. Our results show that our tool can help neuroscientists explore and visualize relationships between brain regions and diseases, and compare patterns of affected regions in an immersive AR environment.
InnovART2: À l’écoute des chantiers: An industrial heritage sound-and-augmented-reality walking tour
À l’écoute des chantiers is an immersive self-guided walking tour, a sound and visual exploration of the industrial heritage of Nantes’ Parc des Chantiers, with ten sound capsules (storytelling, interviews with former shipyard workers, and sound design) and two site-specific, large-scale augmented reality projections: the transporter bridge (dismantled in 1958) and the construction of the Brissac car ferry (1955). Accessible along Nantes’ popular green-line tourist path through signage on the ground, the itinerary describes, explains, and puts into perspective a landscape familiar to the people of Nantes, newcomers, and tourists alike, offering keys to understanding a former industrial site undergoing rapid transformation and in the process of disappearing. The itinerary is available on the Nantes Patrimonia website, downloadable as a smartphone application, or activated via QR codes in the park. Developed within Campus France’s InnovART2 research project, À l’écoute des chantiers featured in the Voyage à Nantes 2022, remains an ongoing part of the Parcours des Écoles, and questions the limits of smart-tourism applications in an era when growing digital privacy concerns reduce geolocation capabilities.
MiroAR: Ubiquitous AR Teleconferencing Through The Mirror
Video call systems rely on capturing and transmitting a self-view while simultaneously rendering the view of the other party. Due to the lack of inward-facing cameras in XR devices (AR glasses, HMDs, etc.), this is not a straightforward process. As a workaround, recent XR teleconferencing platforms try to create a more “immersive” experience by replacing the self-view with avatars, placing 3D models in space, creating shared spaces, and offering other engaging features; these approaches are quite demanding and even then do not recreate a “traditional” teleconferencing experience. In this work, we use an XR device (AR glasses or a smartphone) to create a seamless and natural video-calling experience. By using the AR glasses to capture the existing self-view from a reflective surface such as a mirror, the user can easily conduct a video call with a remote party, even one using a different setup. To demonstrate this concept, we present the MiroAR application. We conclude by discussing the roadmap, shortcomings, and possible extensions of our work.
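One optical detail of the mirror trick is easy to illustrate: a reflection is left-right reversed, so the captured self-view must be flipped back before transmission. The sketch below assumes OpenCV, a generic webcam standing in for the glasses’ world-facing camera, and a mirror region supplied by an upstream detector (stubbed here); it is not MiroAR’s implementation.

```python
# Minimal sketch of un-mirroring a self-view captured from a mirror.
import cv2

def self_view_from_mirror(frame, mirror_bbox):
    """Crop the detected mirror region and undo the left-right reversal.

    mirror_bbox: (x, y, w, h) of the mirror in the frame (assumed given).
    """
    x, y, w, h = mirror_bbox
    reflection = frame[y:y + h, x:x + w]
    return cv2.flip(reflection, 1)  # flipCode=1: horizontal flip

cap = cv2.VideoCapture(0)
ok, frame = cap.read()
if ok:
    self_view = self_view_from_mirror(frame, (100, 50, 320, 240))
    cv2.imwrite("self_view.png", self_view)
cap.release()
```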
Kinetic Particles: From Human Pose Estimation to an Immersive and Interactive Piece of Art Questioning Thought-Movement Relationships
Digital tools offer extensive possibilities for exploring novel interactive-art paradigms, relying on various sensors to create installations and performances in which human activity can be captured, analysed, and used to generate visual and sonic universes in real time. Deep learning approaches, including human detection and human pose estimation, constitute ideal media for human-art interaction, as they allow automatic analysis of human gesture that can directly drive the interactive artwork. In this context, this paper presents an interactive work of art that explores the relationship between thought and movement by combining dance, philosophy, digital arts, and deep learning. We present a novel system that combines a multi-camera setup to capture human movement, state-of-the-art human pose estimation models to analyse this movement automatically, and an immersive 180° projection system that displays dynamic textual content responding intuitively to the users’ behaviour. The proposed demonstration consists of two parts. First, a professional dancer will use the setup to deliver a conference-show. Second, the audience will be given the opportunity to experiment with and discover the potential of the setup, transformed into an interactive installation that allows multiple spectators to engage simultaneously with clusters of words and letters extracted from the conference text.
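The pose-estimation stage of such a pipeline can be sketched in a few lines of Python, here using MediaPipe as a stand-in for the state-of-the-art models the paper mentions; the mapping from keypoints to projected text is a placeholder for the installation logic.

```python
# Illustrative pose-estimation loop: extract body keypoints per frame and
# expose one landmark (the right wrist) as a driver for the projection.
import cv2
import mediapipe as mp

pose = mp.solutions.pose.Pose(static_image_mode=False)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        wrist = results.pose_landmarks.landmark[
            mp.solutions.pose.PoseLandmark.RIGHT_WRIST]
        # Normalised image coordinates; the installation would map these
        # to word-cluster behaviour in the 180° projection.
        print(f"wrist at ({wrist.x:.2f}, {wrist.y:.2f})")
cap.release()
```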
SESSION: IMX Doctoral Consortium
Analysing and Understanding Embodied Interactions in Virtual Reality Systems
Virtual reality (VR) offers opportunities in human-computer interaction research to embody users in immersive environments and observe how they interact with 3D scenarios under well-controlled conditions. VR content influences users’ physical and emotional states more strongly than traditional 2D media. However, a fuller understanding of this kind of embodied interaction is currently limited by the extent to which attention and behaviour can be observed in a VR environment, and by the accuracy at which these observations can be interpreted as, and mapped to, real-world interactions and intentions. This thesis aims to create a system that helps designers understand the embodied user experience in VR environments: how users feel, what their intentions are when interacting with a given object, and how to guide them based on their needs and attention. Controlled, guided scenarios will help reduce the perceptual gap between the designer building an experience and the user living it, leading to more efficient behaviour analysis in VR systems.
Artificial Intelligence Techniques for Quality Assessments of Immersive Multimedia
Artificial Intelligence techniques are being applied to the quality assessment of immersive multimedia content, such as virtual and augmented reality scenarios. The immersive nature of these applications poses a unique challenge to traditional quality assessment methods: estimating user acceptance of immersive technologies is complex due to multiple aspects, such as usability, enjoyment, and cybersickness. Artificial Intelligence-based approaches offer a promising solution, enabling objective evaluations of immersive multimedia such as spatial audio, point clouds, and light field images. This work presents an overview of the artificial intelligence techniques that have been used for quality assessment of immersive multimedia content, including machine learning algorithms, deep learning, and computer vision. The advantages of these techniques and some examples of practical application are provided. Future work is outlined, underlining the possible outcomes of a Ph.D. study in this field.
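A minimal sketch of the machine-learning flavour of such objective assessment is shown below: a regressor predicting mean opinion scores (MOS) from content features with scikit-learn. The feature names and values are invented for illustration, not drawn from any real dataset.

```python
# Hypothetical sketch: learning to predict subjective quality (MOS) from
# simple content features; a toy stand-in for AI-based quality models.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Each row: [bitrate_mbps, resolution_megapixels, motion_intensity]
features = np.array([[2, 2.1, 0.3], [8, 8.3, 0.5], [1, 0.9, 0.8],
                     [15, 8.3, 0.2], [4, 2.1, 0.6], [10, 8.3, 0.9]])
mos = np.array([2.5, 4.1, 1.8, 4.6, 3.2, 3.7])  # opinion scores on a 1-5 scale

model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, features, mos, cv=3,
                         scoring="neg_mean_absolute_error")
print(f"MAE across folds: {-scores.mean():.2f}")
```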
Construction of an immersive and interactive methodology based on physiological indicators to subjectively and objectively assess comfort and performance in work offices
The building sector and indoor environment design are undergoing major changes. There is a need to reconsider the way offices are built from a user-centric point of view. Research has shown the influence of perceived comfort and satisfaction on performance in the workplace. By understanding how multi-sensory information is integrated in the nervous system and which environmental parameters most influence perception, it may be possible to improve work environments. With the emergence of virtual reality (VR) and augmented reality (AR) technologies, the collection and processing of sensory information is rapidly advancing, opening up more dynamic aspects of sensory perception. In simulated environments, environmental parameters can be manipulated easily and at reasonable cost, allowing the user’s sensory experience to be controlled and guided. Moreover, the effects of contextual and surrounding stimuli on users can be collected throughout a test in the form of physiological and behavioral data. Using indoor simulations, the goal of this doctoral research is to develop a multi-criteria comfort scale based on physiological indicators under performance constraints. This would make it possible to define new quality indicators combining the different physical factors, adapted to uses and spaces. To achieve the objectives of this project, the first step is to develop and validate an immersive and interactive methodology for assessing the effect of multisensory information on comfort and performance in work environments.
Human-Centered and AI-driven Generation of 6-DoF Extended Reality
To unlock the full potential of Extended Reality (XR) and its application to societal sectors such as health (e.g., training) or Industry 5.0 (e.g., remote control of infrastructure), very realistic environments are needed to enhance the user’s presence. However, current photo-realistic content generation methods (such as Light Fields) require massive data transmission (i.e., ultra-high bandwidths) and extreme computational power for display, so they are not suited to interactive, immersive, and realistic applications. In this research, we hypothesize that it is possible to generate realistic, dynamic 3D environments by means of Deep Generative Networks. The work consists of two parts: (1) a computer vision system that generates the 3D environment from 2D images, and (2) a Human-Computer Interaction (HCI) system that predicts Regions of Interest (RoIs) for efficient 3D rendering and assesses user perception subjectively and objectively (by means of presence) to enhance 3D scene quality. This work aims to gain insights into how well deep generative methods can create realistic and immersive environments, which will significantly help future developments in realistic and immersive XR content creation.
Object-Based Access: Enhancing Accessibility with Data-Driven Media
Audiovisual media is an integral part of many people’s everyday lives. People with accessibility needs, especially complex accessibility needs, may nevertheless face challenges accessing this content. This doctoral work addresses the problem by investigating how complex accessibility needs can be met through content personalisation built on data-driven methods. To this end, I will collaborate with people with aphasia, a complex language impairment, as an exemplar community of people with complex accessibility needs. To better understand their needs, I will use collaborative design techniques, involving end users in the design, development, and evaluation of systems that demonstrate the benefits of content personalisation as an accessibility intervention. This paper outlines the background and motivation for this PhD, the work already completed, and currently planned future work.
Towards Distributed and Interactive Multi-cam and Multi-device VR360 Video Experiences
The production and consumption of multimedia content is continuously increasing, and this particularly affects immersive formats such as VR360 video. Even though significant advances have been made in the processing, delivery, and consumption of interactive VR360 video, key challenges and research questions remain before interactive multi-camera and multi-user VR360 video services can be provided efficiently over distributed and heterogeneous environments. This research aims to provide novel and efficient contributions to overcome existing limitations in this area. First, it will develop an end-to-end, modular, web-based VR360 video platform, including measurement of Quality of Service (QoS) and activity metrics, to be used as a research testbed. Second, it will provide lightweight yet efficient viewport-aware video processing and delivery strategies that dynamically concentrate video resolution on the user’s viewport, with a single stream and decoding process in the web browser. Third, it will propose innovative encoding, signaling, and synchronization solutions to support multi-camera and multi-device VR360 services in a synchronized manner and with the lowest possible latency. Fourth, it will explore how to effectively provide social viewing scenarios between remote users watching the same or related VR360 videos, assisted by interactive and guidance techniques.
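The viewport-aware idea can be sketched concretely: given the user’s gaze direction, select which tiles of the equirectangular frame to deliver at full resolution. The grid size, field of view, and sampling step below are illustrative assumptions, not the platform’s parameters.

```python
# Hypothetical sketch of viewport-aware tile selection for VR360 delivery.

def visible_tiles(yaw_deg, pitch_deg, cols=8, rows=4, fov_deg=100):
    """Return the (col, row) tiles intersecting an assumed viewport."""
    tiles = set()
    half = int(fov_deg / 2)
    for dyaw in range(-half, half + 1, 5):          # sample across the FoV
        for dpitch in range(-half // 2, half // 2 + 1, 5):
            yaw = (yaw_deg + dyaw) % 360
            pitch = max(-90, min(90, pitch_deg + dpitch))
            col = int(yaw / 360 * cols)
            row = min(int((pitch + 90) / 180 * rows), rows - 1)
            tiles.add((col, row))
    return tiles

# Stream these tiles at full resolution; the rest at a low-quality base layer.
print(sorted(visible_tiles(yaw_deg=90, pitch_deg=0)))
```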
Towards the Creation of Tools for Automatic Quality of Experience Evaluation with Focus on Interactive Virtual Environments
This paper contains the research proposal of Juan Antonio De Rus, presented at the IMX ’23 Doctoral Symposium. Virtual Reality (VR) applications already support diverse tasks such as online meetings, education, and training, and their use grows every year. To enrich the experience, VR scenarios typically include multimodal content (video, audio, text, synthetic content) and multi-sensory stimuli. Tools to evaluate the Quality of Experience (QoE) of such scenarios are needed. Traditional QoE evaluation tools typically involve surveys, user testing, or analytics. However, these methods provide limited insight for VR tasks and suffer from shortcomings and limited scalability. In this doctoral study we have formulated a set of open research questions and objectives around which we plan to contribute to the fields of Affective Computing (AC) and multimodal interactive virtual environments. In this paper we therefore present a set of tools we are developing to automatically evaluate QoE in different use cases. They include dashboards for monitoring, in real time, reactions to different events in the form of emotions and affective states predicted by models based on physiological data, as well as a dataset for AC and its associated methodology.
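The real-time side of such a dashboard can be illustrated with a sliding window over incoming physiological samples that emits features an affect model could consume. The window length and feature choices are assumptions for illustration, not the tools described in the proposal.

```python
# Hypothetical sketch of a streaming feature extractor feeding a QoE/affect
# dashboard: windowed statistics over incoming heart-rate samples.
from collections import deque
import statistics

class AffectFeatureStream:
    def __init__(self, window_size=30):
        self.window = deque(maxlen=window_size)

    def push(self, heart_rate_sample):
        self.window.append(heart_rate_sample)
        if len(self.window) == self.window.maxlen:
            return {
                "hr_mean": statistics.mean(self.window),
                "hr_sd": statistics.stdev(self.window),  # crude arousal proxy
            }
        return None  # not enough data yet

stream = AffectFeatureStream(window_size=5)
for sample in [72, 74, 71, 80, 85, 90]:
    features = stream.push(sample)
    if features:
        print(features)
```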
Behavior as a Function of Video Quality in an Ecologically Valid Experiment
Most user studies in the multimedia QoE domain ask users directly about quality. This approach has advantages: it obtains many answers and reduces variance through repeated measurements. However, the results obtained in this context may differ from those in a real application, since users are rarely asked about quality in everyday life. It is more natural to focus on user behavior. The proposed PhD develops a method for performing experiments based on observations of a participant’s behavior. We address two main challenges that arise in any new experiment design: how to establish the internal validity of the proposed method, and how to analyze the obtained data. The data analysis we propose is based on psychometric functions. We propose two different experiments, one of which is already ongoing.
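The psychometric-function analysis can be sketched directly: fit a logistic curve relating a behavioral outcome (e.g., continuing to watch) to video quality. The data points below are synthetic illustrations, not results from the proposed experiments.

```python
# Sketch of fitting a logistic psychometric function to behavioral data.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(q, q50, slope):
    """Probability of continuing to watch at quality level q."""
    return 1.0 / (1.0 + np.exp(-slope * (q - q50)))

quality = np.array([1, 2, 3, 4, 5], dtype=float)  # presented quality levels
p_continue = np.array([0.1, 0.3, 0.7, 0.9, 0.97])  # observed continuation rates

(q50, slope), _ = curve_fit(psychometric, quality, p_continue, p0=[3.0, 1.0])
print(f"half of participants continue at quality {q50:.2f} (slope {slope:.2f})")
```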
Evaluation of Media-Based Social Interactions in Virtual Environments
The evaluation of users’ experiences in virtual environments is an important task for researchers in human-computer interaction and extended reality. It can be used to understand and enhance the quality of users’ mediated interactions and communications. In a constantly evolving world, where people grow up with technology, it is important to understand, evaluate, and enhance the use of immersive media. The research agenda of this Ph.D. thesis considers the challenges of developing multi-user experiences in virtual environments and of defining evaluation metrics for researchers. The thesis focuses in particular on how to enhance trust formation in media-based social environments. Its findings are expected to help create new open-source tools that facilitate the understanding of individuals and groups in extended reality applications.
Physically-based Lighting of 3D Point Clouds for Quality Assessment
Point clouds are acknowledged as an essential data structure for representing 3D objects in many use cases, notably immersive settings such as Virtual, Augmented, or Mixed Reality. This work is the first part of a research project on immersive quality assessment of point clouds under different lighting conditions. In this report, I focus mainly on the physically-based rendering of such data in Unity 3D, and on the impact of point cloud compression under various lighting conditions. These first observations and results will inform the implementation of a 6DoF immersive experimental setting for subjective quality assessment.
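One prerequisite of physically-based shading is worth illustrating: a raw point cloud carries no surface normals, and without them no lighting model can be evaluated. The sketch below uses Open3D in Python as an illustrative stand-in for the Unity 3D pipeline; the file name and neighbourhood parameters are placeholders.

```python
# Minimal sketch: estimate per-point normals so that lighting (diffuse,
# specular) can be computed against them at render time.
import open3d as o3d

pcd = o3d.io.read_point_cloud("object.ply")  # hypothetical input file
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.05, max_nn=30))
# Flip normals to a consistent orientation so shading does not flicker.
pcd.orient_normals_consistent_tangent_plane(15)
o3d.visualization.draw_geometries([pcd])  # default viewer applies shading
```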
AI-Human Collaboration for In Situ Interactive Exploration of Behaviours from Immersive Environments
Experiments in immersive environments allow the collection of large amounts of data closely related to individual behaviour. Recording such experiments enables the complex study of under-constrained tasks, that is, tasks that allow a high degree of contingency in their resolution. This contingency allows for better discrimination of individual behaviour, but the high complexity of the tasks makes them difficult to analyse.
My thesis discusses the advantages of Immersive Analytics for analysing the hybrid sequential data (trajectories and events) generated in immersive environments. The analysis needs to be performed at a very high level of abstraction because of the high contingency of behaviours extracted from immersive environments, and the massive amount of data generated highlights the need for a model that supports feature extraction at this level of abstraction.
Since the exploration scheme is unknown in advance, the visualisations provided to the analyst should be highly interactive and adaptable, following the analyst’s queries as they search for new insights in the data.
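The kind of high-level abstraction the thesis targets can be sketched as feature extraction over trajectory data. The features below (path length, mean speed, pause fraction) and the pause threshold are illustrative assumptions, not the thesis’s model.

```python
# Hypothetical sketch: abstracting a raw head-position trajectory from an
# immersive experiment into a handful of high-level behavioural features.
import numpy as np

def trajectory_features(positions, timestamps, pause_speed=0.05):
    """positions: (N, 3) head positions in metres; timestamps: (N,) seconds."""
    deltas = np.diff(positions, axis=0)
    dists = np.linalg.norm(deltas, axis=1)
    dts = np.maximum(np.diff(timestamps), 1e-6)
    speeds = dists / dts
    return {
        "path_length_m": float(dists.sum()),
        "mean_speed_mps": float(speeds.mean()),
        "pause_fraction": float((speeds < pause_speed).mean()),
    }

t = np.linspace(0, 10, 100)
pos = np.column_stack([np.sin(t), np.zeros_like(t), t * 0.1])
print(trajectory_features(pos, t))
```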
Quality Assessment of Video Services in the Long Term
In traditional subjective video quality experiments, the presented sequences are short and quality ratings are based on a single interaction with a service (i.e., one session). In real-life scenarios, however, users interact with a video service over a longer period. Decisions, such as abandoning a service, are formulated based on longitudinal, multi-episodic interaction. It is therefore important to better understand how quality is perceived over longer interactions and how quality perception is linked to behavioral consequences. My PhD work encompasses a longitudinal study of users’ interactions with a video service on a mobile device. In our six-phase study, we use different study designs to investigate how users perceive quality in a more ecologically valid setting. The study is carried out using a previously validated setup consisting of compression software and a mobile application.
Optimization and Evaluation of Emerging Codecs
Video streaming is growing exponentially, and high-resolution video requires high bandwidth for transport over the network. There is therefore great demand for compression technologies that shrink video streams while maintaining quality. Video codecs, developed by bodies and companies such as MPEG, Google, Microsoft, and Apple, are used to encode and decode these streams. The goal of this research is to develop a technology for contribution transmission that connects the latest generation of single- and multi-way video encoding methods with new protocols providing transmission reliability at low latency. A literature review will be carried out, covering different video codecs and transmission techniques and the methods used to evaluate their quality. Based on the review, a theoretical framework will be formed and a prototype video encoding method will be developed: one-way and multi-way software that automatically optimizes the video codec’s settings for its operating conditions. The new transmission software will use newer codecs such as H.265/HEVC, VP9, AV1, and MPEG-5, allowing further bit-stream reduction and delivering secure, reliable, high-quality video at low latency.
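Automated codec benchmarking of the kind this research implies can be sketched by encoding the same clip with several codecs via FFmpeg and comparing outputs. The paths and rate-control settings below are placeholders; quality metrics (e.g., VMAF) would be layered on top.

```python
# Illustrative sketch: encode one clip with several codecs and compare sizes.
import subprocess
from pathlib import Path

CODECS = {
    "hevc": ["-c:v", "libx265", "-crf", "28"],
    "vp9": ["-c:v", "libvpx-vp9", "-crf", "32", "-b:v", "0"],
    "av1": ["-c:v", "libaom-av1", "-crf", "30"],
}

source = "input.mp4"  # placeholder clip
for name, args in CODECS.items():
    out = f"out_{name}.mp4"
    subprocess.run(["ffmpeg", "-y", "-i", source, *args, out], check=True)
    print(f"{name}: {Path(out).stat().st_size / 1e6:.1f} MB")
```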
Objective Metrics Definition for QoE Assessment for Extended Reality Applications
The rapid advancement of Virtual and Augmented Reality technologies opens new research perspectives in various fields. While traditional multimedia content quality assessment has been extensively investigated, different issues need to be addressed for these novel technologies. In particular, evaluating the quality of experience must consider several aspects, involving, on the one hand, the quality of the displayed multimedia content and, on the other hand, human factors. Quality of Experience is inherently subjective, which makes defining objective metrics for it complex. This paper aims to frame the problem of objective Quality of Experience assessment and to propose research directions to be pursued in this field.
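The content-quality side of the problem has well-established full-reference metrics that any objective QoE framework would likely build on. The sketch below computes PSNR and SSIM between a reference and a distorted frame; the images are synthetic placeholders, and human factors, as the paper notes, require separate modelling.

```python
# Sketch of full-reference content-quality metrics (PSNR, SSIM).
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

rng = np.random.default_rng(1)
reference = rng.random((128, 128))
distorted = np.clip(reference + 0.05 * rng.standard_normal((128, 128)), 0, 1)

psnr = peak_signal_noise_ratio(reference, distorted, data_range=1.0)
ssim = structural_similarity(reference, distorted, data_range=1.0)
print(f"PSNR: {psnr:.1f} dB, SSIM: {ssim:.3f}")
```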
Nothing Beside Remains: How might we use digital technologies and speculative design to explore the contested and hidden histories of heritage and museum artefacts?
A substantial body of Human-Computer Interaction (HCI) work has focused on cultural heritage settings such as historic sites and museums, where digital technologies augment the user experience through interactive displays, museum guides, and mixed/virtual reality experiences. However, very little of this work has explored some of the more contemporary and pressing issues surrounding museums and heritage today. This project, Nothing Beside Remains, explores the nature of narratives and stories in the context of museums and heritage through the creation of tangible interactive digital artefacts. Specifically, it looks at ‘Contested Histories’: a broad spectrum of issues relating to heritage sites and museum artefacts that have become more urgent to address in recent years. These include heritage loss (e.g., through decay, destruction, or theft, whether by nature or human conflict), unsustainable preservation and conservation practice (the continuing accumulation of historic artefacts, and the lack of financial and specialist resources for maintaining sites and artefacts), and repatriation (disputed claims by nations seeking to reclaim artefacts looted, often, through colonial activity). Design, and Speculative Design in particular, allows individuals or groups to imagine radically different futures or to pose questions about such phenomena through the convergence of creativity and critical theory. This work therefore aims to extend how this kind of critical enquiry can pose questions to the museum and heritage sectors, and to other stakeholders, around contentious narratives, and to propose possible futures. It involves creating physical digital artefacts and testing their effectiveness in generating debate and discussion around these issues. In addition, it will examine the agency of museum and heritage organisations to address these problems and explore experimental design work as a vehicle for actionable change.