A GVAIN member organized a workshop at Web3D 2024 on XR, LLM-based XR, and data compilation and evaluation strategies.
Title: “Towards Next Generation eXtended Reality: Data Compilation and Evaluation Strategies”
The workshop brought together leading researchers, industry experts, and XR practitioners to explore and discuss state-of-the-art approaches across the following pivotal areas:
“LLM-Augmented Situational Awareness”, Muhammad Zeshan Afzal (DFKI)
This session focused on integrating Large Language Models (LLMs) to enhance situational awareness within XR environments. Discussions covered methodologies for compiling comprehensive datasets that enable LLMs to understand and respond to complex, dynamic scenarios, and highlighted evaluation strategies for measuring the effectiveness and reliability of LLMs in real-time situational analysis.
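As a purely illustrative sketch of this theme, and not the speaker's system, the fragment below shows one way tracked XR scene state might be compiled into a textual context block for a general-purpose LLM. The `SceneObject` fields, prompt wording, and function names are all assumptions.

```python
# Hypothetical sketch: serializing XR scene state into an LLM prompt
# for situational-awareness queries. All names are illustrative.
from dataclasses import dataclass


@dataclass
class SceneObject:
    label: str          # e.g. "forklift"
    distance_m: float   # distance from the user, in metres
    bearing_deg: float  # direction relative to the user's gaze


def build_situation_prompt(objects: list[SceneObject], question: str) -> str:
    """Compile tracked scene objects into a context block an LLM can reason over."""
    context = "\n".join(
        f"- {o.label}: {o.distance_m:.1f} m away, bearing {o.bearing_deg:.0f} deg"
        for o in objects
    )
    return (
        "You are assisting a user wearing an XR headset.\n"
        f"Tracked objects:\n{context}\n"
        f"Question: {question}\n"
        "Answer briefly, flagging any safety-relevant object first."
    )


if __name__ == "__main__":
    scene = [
        SceneObject("forklift", 4.2, 310.0),
        SceneObject("exit door", 12.5, 90.0),
    ]
    print(build_situation_prompt(scene, "What should I watch out for?"))
```

Compiling datasets of such (scene state, question, expected answer) triples is one way the evaluation strategies discussed in the session could be operationalized.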
“Multimodal Natural User-XR Interaction”, Oier Lopez de Lacalle (EHU), Arantza del Pozo (VICOM)
This session delved into multimodal interaction techniques, emphasizing natural user interfaces that seamlessly blend input modalities such as voice, gesture, and VR/AR sensor data. Innovative data-collection approaches that capture the intricacies of human behavior and interaction patterns were presented, and evaluation methods for assessing the intuitiveness, responsiveness, and overall user experience of these multimodal systems were examined.
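One common technique in this space, offered here only as a minimal sketch rather than the speakers' method, is late fusion: each modality's recognizer emits an intent with a confidence, and a weighted vote selects the final intent. The modality weights and intent labels below are assumptions.

```python
# Hypothetical sketch of late fusion for multimodal XR input:
# each recognizer emits (intent, confidence); a weighted vote picks
# the final intent. Weights and intent names are illustrative.
from collections import defaultdict

MODALITY_WEIGHTS = {"voice": 0.5, "gesture": 0.3, "gaze": 0.2}


def fuse_intents(observations: dict[str, tuple[str, float]]) -> str:
    """observations maps modality -> (predicted intent, confidence in [0, 1])."""
    scores: dict[str, float] = defaultdict(float)
    for modality, (intent, confidence) in observations.items():
        scores[intent] += MODALITY_WEIGHTS.get(modality, 0.0) * confidence
    return max(scores, key=scores.get)


if __name__ == "__main__":
    obs = {
        "voice": ("open_menu", 0.9),
        "gesture": ("select_object", 0.6),
        "gaze": ("open_menu", 0.4),
    }
    print(fuse_intents(obs))  # -> "open_menu" (0.53 vs. 0.18)
```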
“Responsive Realistic Avatars”, David Moreno (HHI)
The creation of realistic and responsive avatars is vital for immersive XR experiences. This session covered cutting-edge approaches to compiling high-fidelity data on human motion, expressions, and speech, along with evaluation techniques designed to ensure avatars are not only visually accurate but also capable of interacting in a lifelike and contextually appropriate manner within XR environments.
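As one hypothetical example of an objective fidelity metric, a minimal sketch under the assumption that captured and rendered facial or body landmarks are already aligned: the mean per-landmark error between the capture data and the avatar's rendered pose.

```python
# Hypothetical sketch of one objective avatar-fidelity metric:
# mean Euclidean distance between matched 3D landmarks from motion
# capture and from the rendered avatar. Purely illustrative.
import math


def mean_landmark_error(captured, rendered) -> float:
    """Average distance between matched 3D landmark tuples (same units)."""
    assert len(captured) == len(rendered), "landmark sets must be aligned"
    total = sum(math.dist(p, q) for p, q in zip(captured, rendered))
    return total / len(captured)


if __name__ == "__main__":
    cap = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
    ren = [(0.1, 0.0, 0.0), (1.0, 0.9, 1.0)]
    print(mean_landmark_error(cap, ren))  # -> 0.1
```

Lifelikeness and contextual appropriateness, as the session noted, need complementary subjective evaluation beyond any single geometric score.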
“IVLinG – Virtual Portuguese Sign Language Interpreter”, Telmo Adao (CCG)
This talk presented an end-to-end system for Portuguese Sign Language (LGP) interpretation that combines computer graphics, computer vision, and large language models (LLMs) to enable bidirectional communication between Deaf and hearing individuals, with a focus on tokenization, avatar-based sign display, and deep learning for gesture recognition.
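To make the recognition step concrete, here is a deliberately simplified sketch of mapping a tracked keypoint trajectory to a gloss token. IVLinG uses deep learning for this step; the nearest-neighbour matching, template values, and gloss names below are stand-in assumptions that only illustrate the sequence-to-token mapping.

```python
# Hypothetical sketch of the recognition step in a sign-language pipeline:
# a keypoint trajectory from a vision tracker is matched to gloss
# templates by nearest-neighbour distance. Real systems use deep models.
import math

# Illustrative gloss templates: gloss token -> flattened keypoint trajectory.
TEMPLATES = {
    "HELLO": [0.1, 0.9, 0.2, 0.8],
    "THANKS": [0.7, 0.2, 0.6, 0.3],
}


def recognize_gloss(trajectory: list[float]) -> str:
    """Return the gloss token whose template is closest to the trajectory."""
    return min(TEMPLATES, key=lambda g: math.dist(TEMPLATES[g], trajectory))


if __name__ == "__main__":
    print(recognize_gloss([0.12, 0.88, 0.22, 0.79]))  # -> "HELLO"
```

The recognized gloss tokens can then feed the LLM-based translation stage, while the reverse direction renders tokens as signs on an avatar.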
“Volumetric Rendering”, Mikel Zorrilla (VICOMTECH)
XR media consumption creates new challenges for evaluating the user experience: QoE/QoS paradigms from traditional media are useful, but insufficient in this context. This session focused on the orchestration mechanisms needed to provide context-aware adaptive experiences, and on how to evaluate interactive XR media experiences. The capacity to measure the experience at different levels (human, interaction, service, and network) is crucial, first to understand the experience and then to create the right orchestration mechanisms (also at different levels) that ensure user satisfaction under realistic conditions. The session discussed how this cross-layer approach can be leveraged to provide next-generation human-centred immersive, interactive, and intelligent experiences.
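A minimal sketch of the cross-layer idea, not the speaker's implementation: per-layer quality indicators, each normalized to [0, 1], are aggregated into one experience score that an orchestrator could act on. The layer names follow the session text; the weights and threshold are illustrative assumptions.

```python
# Hypothetical sketch of cross-layer QoE aggregation: indicators from the
# human, interaction, service, and network layers (each in [0, 1]) are
# combined into one score; orchestration triggers when it drops too low.
LAYER_WEIGHTS = {"human": 0.4, "interaction": 0.3, "service": 0.2, "network": 0.1}


def experience_score(indicators: dict[str, float]) -> float:
    """Weighted aggregate of normalized per-layer quality indicators."""
    return sum(LAYER_WEIGHTS[layer] * indicators.get(layer, 0.0)
               for layer in LAYER_WEIGHTS)


def needs_adaptation(indicators: dict[str, float], threshold: float = 0.7) -> bool:
    """Trigger orchestration when the aggregate score drops below threshold."""
    return experience_score(indicators) < threshold


if __name__ == "__main__":
    sample = {"human": 0.8, "interaction": 0.75, "service": 0.9, "network": 0.5}
    print(experience_score(sample), needs_adaptation(sample))  # -> 0.775 False
```

In practice, the mapping from raw measurements (e.g., motion-to-photon latency, packet loss, task completion) to normalized indicators is where most of the evaluation effort discussed in the session would sit.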