The Current State and Evolution of the Mixed Reality Ecosystem for Industrial Simulation
Understanding the current adoption characteristics of mixed reality-driven immersive learning, dissecting the gaps, and identifying areas of growth.
Reiterating the effectiveness of Mixed Reality in Industrial Simulation
Following up on my previous article about spatial computing in enterprises, this piece focuses on immersive learning as a key application of Mixed Reality and how the existing developer ecosystem solves for this problem. Immersive learning stands out for its ability to transform traditional training methods due to several factors:
Enhanced Operational Efficiency: Immersive learning significantly increases information retention by introducing spatial awareness in a 3D setting, combining visual and auditory elements. It's particularly effective for roles requiring spatial understanding and physical interaction with objects. Research and industrial case studies show that immersive learning participants retain operational details better than those in 2D or conventional training.
Reduced Training Time and Costs: This modality is ideal for high-turnover jobs where traditional onboarding is costly and complex. Immersive learning provides quick, comprehensive yet compressed training scenarios, beneficial for roles like last-mile delivery, which often experience high attrition rates.
Cost-Effective Asynchronous Training: Like digital training, immersive learning supports asynchronous formats, cutting the need for facilitator-led sessions. This is valuable in decentralized settings, reducing costs related to facilitation and the downtime of employees who would otherwise be working.
Let’s continue to explore key applications of immersive learning in various sectors:
Frontline Productivity: Common in logistics, manufacturing, retail, and healthcare, immersive learning is used for tasks like warehouse box stacking, store shelf arranging, and last-mile delivery. These simulations enhance productivity in these crucial areas thus boosting throughput.
Health & Safety: Building on frontline productivity, immersive learning is vital for training in health and safety procedures. It's particularly useful for scenarios that are rare, dangerous, or costly to replicate in real life, like bank robberies, workplace active shooter situations, and fire safety. Traditional methods like 2D trainings or staged skits are far less effective than immersive simulations, which can prevent hazards that carry significant financial and human consequences.
Soft Skills & Customer Service: Traditional employee onboarding for customer service often involves 2D trainings or role-plays with trainers, which can be ineffective or expensive. Immersive learning offers a more effective solution by realistically simulating customer or employee interactions and enabling practice of soft skills in a spatially aware environment.
Understanding the gaps between existing customer demand and immersive content supply
While the ecosystem of immersive applications focused on training keeps growing thanks to its tangible ROI, there are fundamental characteristics in how learning & development organizations adopt immersive learning, and in the nature of the experiences the ecosystem builds, that reveal gaps worth examining.
General Acclimation & Navigation
Existing Problem: Employees undergoing immersive learning range from 18 to 60 years old and show varied familiarity with mixed reality. Younger employees often have gaming experience with similar technology, while older groups struggle with the novelty of wearing a mixed reality headset. This disparity can lead to discomfort or disorientation, especially on first use. Navigating a 3D UI is not intuitive for many enterprise users, who need guidance such as quick tips or ghost animations to interact with 3D elements effectively.
Developer Painpoint: Developers often create immersive learning experiences without standardizing the acclimation or navigation processes. This inconsistency means users must learn a different system for each new application, requiring additional ramp-up on every individual application's workflow and creating friction when onboarding employees to a diverse set of training experiences.
Learning & Instructional Design
Existing Problem: Traditional enterprise learning content adheres to an industry-wide framework, emphasizing aspects like creative design, content flow, narrative, instructional format (visual and audio), behavioral modeling, and real-time feedback. While current immersive learning materials may boast high-quality assets and animations, they often miss the mark on instructional design and structured flow, which are key to transforming a good experience into an effective training module.
Developer Painpoint: Many game developers, primarily involved in creating immersive learning content, lack expertise in educational frameworks necessary for effective learning experiences. The gap in their knowledge, coupled with the absence of suitable tools or templates and lack of learning and development (L&D) resources, hinders their ability to create properly structured learning experiences. While SDKs equip them to create interactive experiences, they don't provide the means to author educationally focused content.
Controller Interactions
Existing Problem: The complexity of interactions in immersive learning varies from simple, guided trainings with linear playback and point-and-click interactions, to advanced six-degrees-of-freedom (6DoF) trainings that demand sophisticated controller use, such as trigger and grab buttons for translation and object manipulation. Knowing when to hold and when to release such button combinations is especially cumbersome for a less tech-savvy employee demographic. This is accentuated in more complicated trainings that require such interactions frequently, such as frontline productivity and safety trainings. The friction leaves learners lost within the experience, unsure which button combination achieves the task at hand, creating churn and frustration.
Developer Painpoint: Developers often rely on headset SDKs for controller functionality, but there's no universal standard for button mapping across different head-mounted display (HMD) manufacturers. This inconsistency results in varied controller mappings for similar actions, sometimes even inverting button uses. Consequently, each new immersive learning experience may require users to relearn controller mappings, adding a layer of complexity and hindering smooth adoption.
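One way a developer toolkit could paper over inconsistent button mappings is a thin abstraction layer that translates device-specific buttons into semantic actions. Here is a minimal sketch in TypeScript; the device names and button labels are illustrative assumptions, not actual SDK identifiers (a real implementation would source profiles from the vendor SDK or the WebXR input-profiles registry):

```typescript
// Semantic actions an immersive-learning app cares about,
// decoupled from any specific HMD's physical buttons.
enum XRAction {
  Select = "select", // point-and-click / confirm
  Grab = "grab",     // pick up an object
  Menu = "menu",     // open the in-app menu
}

// Hypothetical per-device profiles: physical button -> semantic action.
// These mappings are illustrative only.
const DEVICE_PROFILES: Record<string, Record<string, XRAction>> = {
  "quest-touch": { trigger: XRAction.Select, squeeze: XRAction.Grab, b: XRAction.Menu },
  "pico-neo3":   { trigger: XRAction.Select, grip: XRAction.Grab, menu: XRAction.Menu },
};

// Resolve a raw button press to a semantic action, or undefined if unmapped.
function resolveAction(device: string, button: string): XRAction | undefined {
  return DEVICE_PROFILES[device]?.[button];
}
```

With this indirection, a training experience listens for `XRAction.Grab` and never needs to know whether the learner squeezed a grip button or a squeeze button on their particular headset.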
Here’s a demo from Oculus on controller walkthroughs, one of the best demos I’ve seen; even so, it is still complicated to navigate:
Spatial Data
Existing Problem: Traditional 2D trainings provide limited depth in performance data, typically through methods like MCQs or free-form questions. In contrast, spatial training can track head, hand, and click movements, offering a more detailed understanding of user engagement. For example, it can assess how accurately a user turns a knob in a frontline training scenario, including their gaze direction. However, many existing immersive learning experiences lack the capability to capture such detailed data consistently.
Developer Painpoint: While headset SDKs provide access to detailed tracking data, developers often lack the tools to effectively analyze and utilize it. This includes tracking head, hand, and click movements, as well as object manipulation events. The absence of a standardized data abstraction and analytics framework leads to inconsistent or underutilized data analytics in immersive learning experiences. Many developers either don't leverage this data due to its complexity or do so inconsistently, failing to fully exploit the potential of spatial training analytics.
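The standardized data abstraction described above could start with something as simple as a common event shape plus a small collector. The following sketch is a hypothetical schema, not any SDK's actual telemetry format; field names and the query method are assumptions for illustration:

```typescript
// A standardized spatial-telemetry event. Field names are illustrative;
// a real framework would align them with the analytics backend's schema.
interface SpatialEvent {
  t: number;                          // milliseconds since session start
  kind: "head" | "hand" | "click" | "object";
  position: [number, number, number]; // world-space coordinates (meters)
  target?: string;                    // optional object id, e.g. "knob_01"
}

// Minimal in-memory collector that buffers events and answers simple
// questions, e.g. "how often did the learner interact with this object?"
class SpatialEventLog {
  private events: SpatialEvent[] = [];

  record(e: SpatialEvent): void {
    this.events.push(e);
  }

  // Count interactions (clicks / object events) with a specific target.
  interactionCount(target: string): number {
    return this.events.filter(
      (e) => e.target === target && (e.kind === "click" || e.kind === "object")
    ).length;
  }
}
```

Once every experience emits events in one shape, downstream analytics (gaze accuracy, manipulation precision) can be built once and reused across trainings.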
Here’s an example of how head gaze data can be helpful in an immersive training:
Skills Abstraction & Customization of Experiences
Existing Problem: Immersive learning spans a range of institutions, from technical schools to large corporations. Developers typically create foundational, concept-driven modules for common skills like welding, electrician training, lockout-tagout, pallet stacking, forklift training, and fire safety. While these "Off-the-Shelf" modules are useful, particularly for trade schools and larger enterprises, the true effectiveness of immersive learning at scale requires customization. Tailored experiences need to reflect the specific assets, environments, and characters of the customer's environment, as well as adhere to their standard operating procedures (SOPs). Custom content creation is a detailed and time-consuming process, involving asset mapping and editing workflows to meet customer goals.
Developer Painpoint: Developers use a variety of tools, such as SolidWorks and Maya, for asset creation. While emerging tools like Polycam and Luma employ AI and LiDAR for asset generation, the overall development lifecycle remains lengthy. The application landscape is fragmented, leading to substantial time spent on asset customization. Smaller development teams, with 5 to 50 developers, may lack the comprehensive skill set required for such detailed customizations.
Application Packaging & Distribution
Existing Problem: Unlike wellness or gaming applications, which are often a single, unified experience, immersive learning applications typically follow a course → module → experience structure. A single course might include multiple sequential experiences, and packaging them all into one application significantly increases its size, making distribution cumbersome.
Developer Painpoint: On Android, developers either bundle multiple experiences in one app or divide the course library into several smaller apps. Both approaches can frustrate end-users due to the inconvenience of managing multiple apps or one big app. Although WebXR is emerging as a potential solution with its on-demand web app approach, it faces challenges in interoperability across browsers, limited functionality support, and latency issues. These factors contribute to slower adoption and effectiveness in delivering immersive learning experiences.
Types & Mixture of Content Types & Modalities
Existing Problem: As alluded to in my original article, there are multiple modes of 3D visual rendering, including 3DoF live-action video and CG-generated 6DoF experiences. Certain trainings, such as frontline productivity tasks like pallet stacking, require a high level of interactivity, because learning to manipulate objects in 3D space is crucial to the job. Other simulations call for consistency and linearity, and a pre-recorded, realistic 3D video is enough to do the job, such as showing a store manager how to embody the company's values while serving customers in a grocery store. In those scenarios, the realism of environments, characters, and animations is key to information retention. Most trainings that exist today are largely CG 6DoF simulations (following the release of 6DoF headsets such as the Quest 2 and Pico Neo 3), which are either overkill for the use case at hand or underrepresent the realism of the environment, reducing their overall effectiveness. Moreover, beyond the 3D visuals, spatial-audio-driven narration, guidance, and reinforcement is a dimension of "immersive simulation" that is largely missing from many of these trainings. Failing to understand business goals and work backwards from them to choose the right content type (live action vs. CG 6DoF), together with the lack of appropriate narration audio, tends to make experiences less effective.
Developer Painpoint: While developers might be capable of embedding high-fidelity environments, characters, experiences, and even audio narrations using existing toolkits and asset stores, they often lack an understanding of which content type and narration modality best suits a specific experience. Additionally, they might not be adept with methods for 360° video production and editing, which is why most experiences that exist today are largely CG. We also generally observe missing or slow feedback loops between developers and customers for fine-tuning simulations toward business goals.
Identity Management
Existing Problem: Most enterprise applications use employee organizational accounts, typically managed in Active Directory, for identification. Conventional training applications like Cornerstone and Workday employ Single Sign-On (SSO) methods facilitated by tools like Okta, PingFed, and Azure AD. However, many enterprise VR applications either lack SSO functionality or lack a standardized implementation method. Manual entry of email IDs and passwords, especially using VR controllers on QWERTY keypads, leads to user frustration and increased login failures.
Developer Painpoint: Developers have varying approaches to authentication in VR applications. Some use login codes generated through companion mobile or web apps, while others combine email and login codes. The absence of a unified SSO layer in VR applications creates user identification and security challenges. Collecting Personally Identifiable Information (PII) like emails and passwords necessitates compliance with various regulations, which developers may not always fulfill. Additionally, integrating SSO systems often requires a mid-tier authentication service to establish a connection between the VR device and the enterprise's active directory, complicating the process.
Bridging the demand and supply gap and developer pain points
As stipulated above, while there is plenty of "off-the-shelf" immersive learning content out there, which may or may not have the level of quality needed to meet demand from the field, there still seem to be gaps between what the market needs and what developers are building. This can be characterized from the following angles:
Demand - Supply Mapping: There's a mismatch between market needs and available immersive learning content. The focus should be on creating simple, consistent, and linear training experiences that address the actual pain points and requirements of the immersive learning market. Aligning demand with supply involves effective demand generation strategies and incentives for developers to create experiences that meet these specific needs.
Monetization Model: The common 30% revenue share in marketplaces makes it hard for developers to realize value from their immersive learning content, especially when they lack the toolkits to customize and continually enhance it to meet evolving customer needs. A more favorable monetization model could encourage the development of higher-quality content.
Developer - Customer Engagement: Allowing customers to engage directly with developers, delivering feedback on off-the-shelf experiences and articulating what needs to be customized to reflect their specific organizational standards and procedures, can create long-term value and customized content libraries for those organizations.
Developer Tools: Providing developers with comprehensive toolkits can simplify the creation, customization, distribution and analytics of immersive experiences. These tools should address both software and hardware aspects, enabling developers to easily adapt content to specific customer requirements.
How can developer tools help bridge the gap?
Default Passthrough: Enabling developers to use passthrough with spatial UI anchors as the default acclimation view upon entering an app lets users see the environment and their own body and hands when they put on the headset. This creates a higher level of comfort and makes it easy to pick up the controllers, adjust headset fit, set a safe boundary, and so on. The better the mixed reality passthrough, the more welcoming the experience. The Meta Quest 3, for example, now defaults to passthrough mode rather than immersive mode, which improves comfort when first wearing the headset.
Standard Acclimations: Providing developers with standard UI elements and toolkits that they can embed within experiences, letting users acclimate to the headset and controllers and set headset parameters, creates consistency across experiences beyond immersive learning. A few of these walkthroughs include tweaking headset fit, inter-pupillary distance (IPD), and controller calibration. Leveraging ghost animations, spatial anchors for annotating controllers, quick tips, and examples creates a simple, guided acclimation experience.
Simplifying Controller Interactions: Considering the varied ways developers map controller buttons to operations in VR environments, there's a need to simplify these interactions. The goal is to maintain the essence of the learning concept while making the experience more user-friendly. In specific applications like a pallet stacking simulation, it's more important for learners to understand the placement of objects (primary task) rather than the mechanics of grabbing (secondary task). Simplifying this process can enhance the learning experience. A practical solution is to abstract complex point-and-grab actions into a more intuitive drag-and-drop mechanism, utilizing a single trigger button. This approach mirrors familiar real-world interactions, like using a computer mouse, making it more accessible for learners. Creating such simpler abstractions for developers to use out-of-the-box can largely reduce friction.
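The single-trigger drag-and-drop abstraction described above boils down to a tiny state machine: one press picks up, the next press places. A minimal sketch, with hypothetical type and function names chosen for illustration:

```typescript
// Single-trigger drag-and-drop: one button toggles between "carrying"
// and "placed", instead of requiring held grip + trigger combinations.
type DragState = {
  carrying: string | null;              // id of the carried object, if any
  placedAt?: [number, number, number];  // where the last object was dropped
};

// Handle a trigger press given what the pointer is aimed at and where.
function onTriggerPress(
  state: DragState,
  pointedObject: string | null,
  pointerPos: [number, number, number]
): DragState {
  if (state.carrying === null) {
    // Not carrying: pick up whatever the pointer is on (or nothing).
    return { carrying: pointedObject };
  }
  // Already carrying: drop the object at the current pointer position.
  return { carrying: null, placedAt: pointerPos };
}
```

Because the learner only ever presses one button, the secondary task (grab mechanics) disappears and attention stays on the primary task (where the object belongs).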
Hand Tracking & Advanced Gestures: The Vision Pro's gaze-and-pinch gesture has opened up a completely new paradigm of human-machine interaction: one doesn't need a controller and a beam to point and click, but can interact with objects by simply looking at them and pinching. In the context of enterprise learning, this makes navigation far easier and more accurate, and such advanced hand tracking is especially helpful for object interactions and manipulations in experiences that require them. Moreover, the capability to seamlessly switch between controllers and hand tracking, now also available on the Meta Quest 3, creates ease of use and flexibility between simple and more complex experiences.
Automated Walkthroughs / Quick Tips: Similar to the guided tours of 2D web applications, walking a learner through complex interactions within the 3D experience via ghost animations or playback video, such as picking up a box and placing it elsewhere, turning a knob, or moving to a specific location in the 3D environment, creates clarity. While this may sound trivial, superimposing it on an experience requires custom development for every single action. Exposing tools that auto-create and standardize such walkthroughs based on the specific interaction lets developers quickly embed them without designing each one from scratch.
Embedded Authoring & Script Creation Tools: Providing developers with creative tools that let them design immersive learning experiences guided by script designs following industry-wide instructional design frameworks largely eliminates the need to hire dedicated instructional design resources. Given the latest advancements in generative AI, authoring co-pilots powered by LLMs, either specifically trained on such frameworks or fed with context specific to them, can assist developers with defining learning-friendly scripts.
Standardized Analytics Pipelines: Equipping developers with standard data analytics pipelines and models that harness head, hand, and click data to build user engagement heat maps at stipulated time intervals, along with standard voice analytics models for sentiment, tonality, speech rate, and summarization out of the box, reduces the friction, effort, and resources developers would otherwise spend building such analytics dashboards. Providing custom events that developers can embed within their interactions to track granular interaction data, plus configurable dashboarding services they can embed within their own apps or expose to other applications, adds depth to how the real effectiveness of such trainings is presented and consumed by the end user.
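The engagement heat maps mentioned above can be as simple as bucketing gaze samples into grid cells and counting hits per cell. A minimal sketch; the 0.5 m cell size and the flat x/z projection are illustrative assumptions, not a standard:

```typescript
// Aggregate gaze samples into a 2D engagement heat map by bucketing
// world-space x/z gaze-intersection points into grid cells.
function gazeHeatmap(
  samples: Array<[number, number]>, // [x, z] points where gaze hit the scene
  cellSize = 0.5                    // cell edge in meters (tunable assumption)
): Map<string, number> {
  const grid = new Map<string, number>();
  for (const [x, z] of samples) {
    // Key each sample by its integer grid cell, e.g. "0,0" or "2,-1".
    const key = `${Math.floor(x / cellSize)},${Math.floor(z / cellSize)}`;
    grid.set(key, (grid.get(key) ?? 0) + 1);
  }
  return grid;
}
```

A dashboarding layer would then render the cell counts as a color overlay on the training environment's floor plan, exposing where learners actually look.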
Automated Asset Generation and Editing: Given the latest advancements in 3D asset generation with Neural Radiance Fields (NeRFs), providing developers with toolkits that can map real customer environments and objects into high-fidelity, editable assets has the potential to accelerate custom content development that requires faithful replication of 3D assets within mixed reality experiences.
Enabling Faster Mixed Reality Development: Mixed reality development may involve building immersive learning experiences that leverage real-life objects as part of the augmented training experience. In such cases, giving developers out-of-the-box computer vision models that automatically annotate a wide array of objects creates enormous time savings, since it can eliminate the need to train custom models for such scenarios. Segment Anything by Meta AI is a great advancement in this area and could accelerate adoption of mixed reality training experiences.
Web App Distribution: Equipping developers with toolkits to author and publish their experiences using protocols such as WebGL for responsive rendering and WebXR for publishing as web apps, rather than bundled APKs (Android Application Packages), can make consumption of such trainings much easier, provided the headsets have adequate browser support for rendering these experiences with high fidelity and low latency.
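A WebXR-distributed course would typically feature-detect the headset browser and fall back to a flat 2D rendering elsewhere. The sketch below injects an object shaped like the real `navigator.xr` (whose `isSessionSupported` method and `"immersive-vr"` mode are part of the WebXR Device API) so the decision logic can run outside a browser; in a real page you would pass `navigator.xr` itself:

```typescript
// Minimal shape of navigator.xr needed for the decision.
interface XRSystemLike {
  isSessionSupported(mode: string): Promise<boolean>;
}

// Decide between an immersive WebXR session and a 2D web fallback.
async function chooseDeliveryMode(
  xr: XRSystemLike | undefined
): Promise<"immersive-vr" | "flat-2d"> {
  if (xr && (await xr.isSessionSupported("immersive-vr"))) {
    return "immersive-vr"; // headset browser: request an immersive session
  }
  return "flat-2d"; // desktop/mobile: render the same course as a 2D web app
}
```

This is the on-demand appeal of WebXR: one URL serves both the headset and a desktop reviewer, with no APK sideloading or store submission in between.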
Types & Mixture of Content Types & Modalities: Enabling and empowering developers with instructional design frameworks that help them identify the right modality for a customer's specific business goal is key to facilitating more effective experiences. This can be accomplished through standard approaches such as developer discussion forums and webinars. Building developer toolkits that facilitate piecing together both live-action and CG experiences, or hybrid experiences (realistic environments with virtual characters), within a unified toolkit creates diversity of modality. Moreover, providing toolkits that merge visual text and audio-based narration (using off-the-shelf text-to-speech models and APIs) lets developers embed such multi-modal interactions within their experiences with minimal effort or custom development, making them more effective.
Standardized Identification Templates: Providing developers with standard toolkits that let them leverage the existing user account on the headset (such as a Meta account) as the source of user identity eliminates the need to plug an explicit login flow into app experiences. Extending such functionality to generate unique codes within the headset's companion app as a means of multi-factor authentication (MFA) makes this method more secure. Lastly, the capability to link the headset account to a customer's Active Directory or in-house IdP enables enterprise single sign-on.
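The companion-app code generation mentioned above is loosely analogous to the OAuth 2.0 device-authorization pattern: the headset displays a short-lived code, the user confirms it in the companion app, and the backend links the two identities. A minimal sketch of issuing and validating such a code; the 6-character format, the 5-minute TTL, and all names are illustrative assumptions:

```typescript
// A short-lived pairing code shown on the headset display.
interface PairingCode {
  code: string;     // short code the user reads off the headset
  issuedAt: number; // epoch milliseconds
  ttlMs: number;    // validity window
}

// Issue a 6-character code. The alphabet omits look-alike characters
// (0/O, 1/I/L) so the code is easy to read inside a headset.
function issuePairingCode(now: number, rng: () => number = Math.random): PairingCode {
  const alphabet = "ABCDEFGHJKMNPQRSTUVWXYZ23456789";
  let code = "";
  for (let i = 0; i < 6; i++) {
    code += alphabet[Math.floor(rng() * alphabet.length)];
  }
  return { code, issuedAt: now, ttlMs: 5 * 60 * 1000 }; // 5-minute TTL (assumed)
}

// Check whether a code is still within its validity window.
function isCodeValid(pc: PairingCode, now: number): boolean {
  return now - pc.issuedAt <= pc.ttlMs;
}
```

The actual identity linking (exchanging the confirmed code for a session tied to the enterprise IdP) would happen server-side; this sketch only covers the user-facing code lifecycle.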
Wrapping Up…
Mixed reality has come a long way from tethered headsets powered by desktop compute to fully untethered headsets with better form factors that provide incredible passthrough, advanced hand tracking, and smoother software experiences that facilitate easier user adoption. Finding high-ROI use cases such as immersive learning, from both a consumer and an enterprise lens, can greatly spearhead mass adoption of head-mounted devices. Leveraging developer ecosystems to go to market faster with high-quality content is key to the growth of the industry. This makes it essential to empower developers who are adept at building gamified experiences to contextualize their content for the end user persona as well as the use case. While immersive learning is the example dissected here, marketplaces powered by helpful developer toolkits in adjacent vertical markets have historically created inflection points in adoption of new hardware, and they will likely play a major role for mixed reality devices.