The prevalence of head gaze tracking in contemporary AR/VR devices offers intriguing potential for user interface interaction and virtual object engagement. However, this potential remains largely untapped, as standalone head gaze tracking falls short due to its inherent limitations. This article aims to showcase the transformative capacity of the SoundxVision ring in overcoming these constraints and unlocking the true power of head gaze tracking. To begin, we will explore the essence of an effective input system and delve into the symbiotic relationship between head tracking and gesture recognition.
Our extensive research and observations underscore the core elements a proficient input system must address:
- Object of Intent Tracking: The system should adeptly track the user's desired object of interaction—be it a UI element, virtual entity, or even a tangible object in the real world that they wish to engage with.
- Trigger Mechanism: Upon identifying the object of intent, a trigger mechanism becomes imperative to facilitate interaction. This could involve actions like tapping, pinching, pressing a button,…
- Navigation: Navigational gestures, such as scrolling and swiping on touch-sensitive surfaces, using thumbsticks, or pressing arrow buttons (left, right, up, down), are paramount. These gestures, commonplace in our interactions with graphical interfaces, retain their significance within XR devices.
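To make these three requirements concrete, here is a minimal TypeScript sketch of what an input source covering all of them could look like. The type and method names are illustrative assumptions, not an existing SDK:

```typescript
// Illustrative interface only: the three responsibilities any XR input system covers.

type NavigationDirection = "left" | "right" | "up" | "down";

interface GazeTarget {
  id: string;
  activate(): void; // e.g. press a button, open an item
}

interface XRInputSource {
  // 1. Object of intent: which element is the user currently targeting?
  currentTarget(): GazeTarget | null;

  // 2. Trigger: fires when the user confirms interaction (tap, pinch, button press, ...).
  onTrigger(handler: (target: GazeTarget) => void): void;

  // 3. Navigation: fires for scroll/swipe/arrow-style movement.
  onNavigate(handler: (direction: NavigationDirection) => void): void;
}
```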
A notable exemplar of a robust XR input system that addresses all three of these points is the Apple Vision Pro, which harmoniously integrates eye gaze and hand tracking. There, eye tracking (with machine learning to filter out noise) identifies points of interest, a finger pinch acts as the trigger, and a flick of the hand in the air handles navigation. Standalone head gaze tracking, on the other hand, can only identify objects of intent; it lacks a mechanism to initiate interaction. To circumvent this limitation, some applications that rely on head gaze tracking require sustained gaze on an object for a predefined duration before triggering further engagement, as shown in the GIF below from Microsoft's gaze-and-commit guidelines for HoloLens. Regrettably, this dwell-based method hurts responsiveness and restricts the range of possible interactions; this is where the SoundxVision ring can complete the experience.
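For reference, gaze-and-dwell selection boils down to a small timer like the sketch below (the class name and the one-second dwell time are assumptions for illustration). Every activation costs at least the full dwell duration, which is exactly the responsiveness problem described above:

```typescript
// Dwell-based selection: the user must hold gaze on the same target for `dwellMs`
// before it activates. Illustrative sketch, not code from any shipping product.
class DwellSelector {
  private gazedTargetId: string | null = null;
  private gazeStart = 0;

  constructor(private dwellMs = 1000) {}

  // Call once per frame with the currently gazed target (or null) and a timestamp in ms.
  // Returns the id of a target that has just been dwell-activated, otherwise null.
  update(targetId: string | null, now: number): string | null {
    if (targetId !== this.gazedTargetId) {
      // Gaze moved to a new target (or away): restart the dwell timer.
      this.gazedTargetId = targetId;
      this.gazeStart = now;
      return null;
    }
    if (targetId !== null && now - this.gazeStart >= this.dwellMs) {
      this.gazeStart = now; // avoid re-triggering on every subsequent frame
      return targetId;
    }
    return null;
  }
}
```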
Gaze, tap, swipe
Head gaze tracking, when combined with the SoundxVision gesture recognition ring, lets users seamlessly interact with XR content: gaze at a desired object and use a gesture to interact with it. For example, gaze at a picture and swipe (left to move to the next one, right to go back to the previous), or simply gaze at a button and double tap to activate or deactivate it, just like you already do on computers. Very easy, isn't it?
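In code, the combination is as simple as it sounds: the head-gaze target decides what receives the gesture, and the ring gesture decides what happens to it. The sketch below uses hypothetical names (RingGesture, GestureTarget), not the actual SoundxVision API:

```typescript
// Gaze picks the target, the ring gesture acts on it immediately: no dwell timer needed.
type RingGesture = "tap" | "doubleTap" | "swipeLeft" | "swipeRight";

interface GestureTarget {
  next(): void;     // e.g. show the next picture
  previous(): void; // e.g. show the previous picture
  toggle(): void;   // e.g. activate/deactivate a button
}

function onRingGesture(gazedTarget: GestureTarget | null, gesture: RingGesture): void {
  if (gazedTarget === null) return; // gestures only affect what the user is looking at
  switch (gesture) {
    case "swipeLeft":
      gazedTarget.next();
      break;
    case "swipeRight":
      gazedTarget.previous();
      break;
    case "tap":
    case "doubleTap":
      gazedTarget.toggle();
      break;
  }
}
```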
The benefits
1. It's widely available: head tracking relies mostly on an Inertial Measurement Unit (IMU), a sensor chip found on even the cheapest headsets and on your mobile phone too; the chip is tiny and very power efficient. This means our "Gaze, tap, swipe" approach can be implemented on a wide range of XR devices, even lightweight smart glasses (39 g and less) where fitting eye and hand tracking sensors is a big challenge or sometimes not even possible. (A sketch of deriving a gaze ray from the IMU's orientation follows this list.)
2. A pointer can be implemented to interact with content built for computers and mobiles (such as the web and applications). A head-driven pointer is practical because the human head is very stable; with eye tracking, a pointer is often not viable due to the fast and unexpected movements of our eyes, which cause the "Midas touch" effect where every object annoyingly changes its appearance as the user merely glances at it. This is mentioned by SkarredGhost in his attempt to bring the Apple Vision Pro UI to the Oculus Quest Pro with eye tracking. (A sketch of a smoothed head pointer also follows this list.)
3. Power efficiency: as mentioned above, the IMU sensor used for head tracking consumes very little energy (as low as 2.26 mW on some specific IMUs), and the computing power needed to process head tracking data is also minimal compared to computer vision based approaches.
4. Put the user's hand at rest, or anywhere: our approach to XR input uses micro gestures that require the least amount of hand movement, so that even with a hand in a pocket (a jacket pocket, please, not tight jeans), the user can still operate the XR device.
5. Privacy: an eye tracking heat map can reveal a lot about a user, even their unconscious mind. Head tracking data, on the other hand, is more challenging to exploit, as it does not pinpoint the user's points of interest.
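Benefit 1 rests on the fact that a head-gaze ray can be derived from the IMU's orientation estimate alone. The sketch below (an assumption of how such a step might look, not SoundxVision's or any vendor's actual code) rotates the headset's forward axis by the head orientation quaternion to get the gaze direction:

```typescript
// Derive a head-gaze direction from an orientation quaternion supplied by IMU fusion.
// Assumes a right-handed coordinate system where -Z is "forward", as in WebXR/OpenGL.
type Vec3 = { x: number; y: number; z: number };
type Quat = { w: number; x: number; y: number; z: number };

// Rotate vector v by unit quaternion q: v' = q * v * conj(q), written out term by term.
function rotateByQuaternion(v: Vec3, q: Quat): Vec3 {
  const { w, x, y, z } = q;
  const ix = w * v.x + y * v.z - z * v.y;
  const iy = w * v.y + z * v.x - x * v.z;
  const iz = w * v.z + x * v.y - y * v.x;
  const iw = -x * v.x - y * v.y - z * v.z;
  return {
    x: ix * w + iw * -x + iy * -z - iz * -y,
    y: iy * w + iw * -y + iz * -x - ix * -z,
    z: iz * w + iw * -z + ix * -y - iy * -x,
  };
}

// The gaze ray starts at the head position and points along the rotated forward axis.
function headGazeDirection(headOrientation: Quat): Vec3 {
  return rotateByQuaternion({ x: 0, y: 0, z: -1 }, headOrientation);
}
```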
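For benefit 2, because head motion is slow and steady, a light exponential smoothing filter on the gaze ray's hit point is usually enough to produce a stable 2D pointer for web- and app-style content. The class and smoothing factor below are illustrative assumptions:

```typescript
// A smoothed head-driven 2D pointer: raw hit coordinates in, filtered cursor position out.
class HeadPointer {
  private x = 0;
  private y = 0;

  // alpha: smoothing factor in (0, 1]; lower values give a steadier but laggier cursor.
  constructor(private alpha = 0.3) {}

  // rawX/rawY: where the head-gaze ray currently intersects the 2D content plane.
  update(rawX: number, rawY: number): { x: number; y: number } {
    this.x += this.alpha * (rawX - this.x);
    this.y += this.alpha * (rawY - this.y);
    return { x: this.x, y: this.y };
  }
}
```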
Conclusion
The integration of head gaze tracking and gesture recognition through the SoundxVision ring establishes a robust and dependable method for interactions within the XR environment. By incorporating a trigger mechanism (tapping) and gesture-based navigation (swiping), the ring significantly enhances the responsiveness of the user experience in comparison to relying solely on head gaze. This approach holds the potential to be applied across a diverse array of XR devices, ensuring a consistently seamless and immersive experience.
Just like other interaction modalities for XR, human physical constraints need to be considered when designing an application with head gaze tracking: for example, content should be targetable without putting the user's head in an awkward position, and UI elements such as buttons, sliders, etc. should be aligned in groups to minimize head movement. Microsoft provides good guidelines for head gaze based interaction here.
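As a simple illustration of that guideline, a layout pass could check that targets stay within a comfortable head-rotation cone in front of the user. The angle limits below are assumptions chosen for illustration, not values taken from Microsoft's guidelines:

```typescript
// Check whether a target can be gazed at without an awkward head pose.
const MAX_YAW_DEG = 30;   // assumed comfortable rotation to either side
const MAX_PITCH_DEG = 20; // assumed comfortable rotation up or down

// yawDeg/pitchDeg: angular offset of the target from the user's neutral head pose.
function isComfortablyTargetable(yawDeg: number, pitchDeg: number): boolean {
  return Math.abs(yawDeg) <= MAX_YAW_DEG && Math.abs(pitchDeg) <= MAX_PITCH_DEG;
}
```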