The Craft of Spatial Computing & Mixed Reality for the Enterprise

The what, why, and how of the enterprise spatial computing & mixed reality universe

Jul 04, 2023

Overview

Mixed reality is a blend of physical and digital worlds, unlocking natural and intuitive 3D human, computer, and environmental interactions. It’s often signified by two key concepts: Virtual Reality & Augmented Reality.

In this article, I dive deep into the craft of mixed reality, a subset of spatial computing, and its application to the enterprise via key use cases that apply broadly across industries. I cover the layers of the value chain, from infrastructure & hardware to software & tools, and explore the fundamentals of the content ecosystem, highlighting what it takes to implement spatial computing in the enterprise.

Differentiating pillars of spatial computing

Spatial computing integrates digital information and virtual objects into the real-world physical environment, creating a mixed or blended reality experience. Its differentiating pillars are:

Perception via Spatial Awareness: While enterprises have successfully augmented digital information via existing compute methods, spatial computing brings tremendous value to jobs that require spatial awareness & cognition in a 3D environment. This is especially important for blue collar jobs, where an employee uses a wider field of view to interact with real objects & move in a three-dimensional space to accomplish a task. This is why immersive & guided learning is credited with higher levels of information retention than 2D learning.
Hearing via Spatial Audio: This is one of the more underrated or unobserved aspects of mixed reality. Spatial audio allows for accurate sound placement and localization within the virtual environment. It enables users to perceive sounds coming from specific directions, just as they would in the real world. This helps create a more immersive and convincing mixed reality experience.

Interacting via Hands-Free, Multi-modal Interactions: Spatial computing enables users to interact with digital content and systems in a hands-free manner using gestures, voice commands, or gaze-based input. This hands-free capability is particularly valuable in enterprise environments where users need to access information or perform tasks while keeping their hands free for physical work (e.g., engine assembly). Traditional computing devices require manual input methods like keyboard, mouse, or touch, which can be limiting and less efficient in certain enterprise scenarios for transmitting information from your brain to the machine.
Contextual Information Augmentation: Spatial computing allows users to overlay contextual information onto their real-world view, enhancing situational awareness and decision-making. Users can access relevant data, instructions, or annotations directly in their field of view, without the need to switch between different applications or screens. Traditional computing devices typically require users to refer to separate displays or switch between applications, which can disrupt workflow and reduce efficiency.
Real Time Collaboration & Remote Assistance: Users can share a virtual space and interact with digital content simultaneously, regardless of their physical location. This capability is beneficial for remote teams, field technicians, or experts providing guidance from a distance. Traditional desktops, PCs, and tablets, although capable of video conferencing and remote collaboration, lack the level of immersive presence and shared spatial context offered by mixed reality methods.

Drawing parallels to previous waves of computing

Looking back in history, the first iterations of computers were adopted by the government, research institutions and universities in the 1950s-60s. The mainframe era of the 1960s-70s brought groundbreaking support for centralized computing for key functions such as accounting, billing & transaction processing. The minicomputer era of the 70s-80s brought computing resources for local data processing & analysis to engineering, sales & inventory management functions. The PC revolution of the 80s-90s, led by IBM and Apple, brought mass adoption via affordable hardware and user-friendly graphical user interfaces (GUIs). The key value drivers for PCs were improved productivity via advanced compute capabilities and GUI-based applications; enhanced data management via the advent of memory management & eventually relational databases; increased collaboration via email & instant messaging; and cost savings from sharing information digitally rather than physically.

Mixed reality (MR) is rapidly gaining traction in enterprise environments, following a trajectory similar to previous advancements in computing technologies. Initially embraced by pioneering organizations and research institutions, MR is now poised for broader adoption in the enterprise via the use cases mentioned below.

Enterprise Use Cases for Mixed Reality

Virtual reality is a broad concept that has existed in the enterprise for several years in the form of “flight simulators”. However, the wave of head-mounted displays (HMDs) has played a pivotal role in accelerating the adoption of MR in the enterprise. The release of the Oculus Rift DK1 in 2013 marked a major inflection point in awareness of mixed reality and its application to the enterprise. Here are some of the high-ROI use cases:

Immersive & Guided Learning: MR is used to create realistic training and simulation scenarios where employees practice complex procedures, operate machinery, or perform tasks in a safe and controlled virtual setting without the need for physical equipment or the risk of real-world consequences. This creates enormous value for general operational effectiveness, health & safety, soft skills and customer service use cases across several industries such as manufacturing, construction, healthcare, retail & logistics.
General Remote Collaboration: MR enables geographically dispersed teams to collaborate in a virtual space. Participants can interact with 3D models, share data, and communicate as if they were in the same location. This enhances remote collaboration, accelerates decision-making processes, and improves productivity, especially in industries with blue collar workers, where multiple personas need to collaborate around a three-dimensional space or object.
Design and Prototyping: MR provides a powerful tool for designing and prototyping products. Designers can create virtual prototypes and examine them in a 3D space, enabling them to assess scale and functionality. This speeds up the design iteration process and reduces costs associated with physical prototypes. This is especially helpful for design & development within the manufacturing, construction and transportation industries.
Sales and Marketing: Mixed reality can be used to showcase products or services in an immersive and interactive manner via virtual showrooms or experiences that allow customers to visualize products, customize features, and make informed purchasing decisions.
Walmart was one of the early adopters of immersive learning via virtual reality and created several experiences for employee training within stores

The Enterprise Mixed Reality Universe

The enterprise mixed reality universe can be largely split into two key buckets:

Infrastructure Layer

The infrastructure universe is the plumbing, or picks & shovels, layer. It consists of players ranging from the chipset, OS, hardware & wearables layer, to the development engine & enterprise mobility management layer upon which solutions are built and managed, and finally to the systems integration services & network providers layer, which delivers the connected hardware & software experiences to end customers.

Chipset Layer

The compute layer as of today is dominated by Qualcomm, via its XR2 chip, which most headset manufacturers build upon. Other players include AMD, with its LiquidVR technology, and the recently launched Apple R1 chip, which is currently exclusive to the Vision Pro.

Operating System

It’s interesting to see that most headsets have their own operating systems, which are ultimately forked from the Android Open Source Project (AOSP). The versions range from Android 8 to 12, mostly wrapped in a device-manufacturer-specific UX layer such as LuminOS by Magic Leap, Pico OS & even the Meta Quest’s OS to an extent. The non-Android operating systems include the Windows Holographic operating system, dedicated to the HoloLens, and visionOS, dedicated to the Apple Vision Pro.

Enterprise Headset Manufacturers

While most headset sales in and outside of the United States skew towards the Quest 2 & the Quest Pro, bought by gamers, fitness fanatics and early adopters focussing on apps such as Roblox & Beat Saber, a small but growing proportion of headsets is being crafted for the enterprise.

Device manufacturers such as HTC and Lenovo are investing capital in building enterprise-grade MR headsets such as the Vive Focus 3, Vive XR Elite & ThinkReality VRX.
Bespoke manufacturers such as Varjo build expensive high-end headsets for use cases with low latency & high compute requirements.
Consumer-focussed companies such as Meta & Pico have a considerable share of their business invested in selling enterprise-ready headsets. The Pico Neo 3, Pico 4E and Pico G3 have a footprint at various Fortune 500 enterprises.
Finally, there are category creators for dedicated augmented reality headsets, including Microsoft’s HoloLens, Magic Leap 2, and very recently the fantastic Apple Vision Pro.

Multi-Modal Wearables

Specific companies focus on elevating multi-modal interactions with mixed reality applications via wearables such as haptic gloves and dedicated hand tracking equipment. Ultraleap offers an external device for hand tracking, while HaptX provides haptic gloves that can track hand interactions, as well as provide haptic feedback based on the nature of the interaction. These are especially helpful for training scenarios which involve hand interactions and decision making based on the corresponding haptic feedback. For example, a gearbox assembly process involves assembly of minute components and heavy gears, which cannot be realistically simulated using controllers or native hand tracking.

Game Development Engines

While Unity and Unreal are the two most common game development engines used across enterprise and consumer use cases, Unity currently has the largest share of enterprise applications, owing to the large number of developers who build enterprise content & applications on it. Almost all headset manufacturers support Unity, and even the Vision Pro, despite its existing ARKit capability, is looking to provide an integration for applications built in Unity.

Enterprise Mobility Management

The core infrastructure software providers primarily include companies that provide Enterprise Mobility Management (EMM) or Unified Endpoint Management (UEM) solutions. In the context of VR headsets, this segment typically comprises:

Incumbent EMM / UEM providers such as VMware Workspace ONE UEM, Microsoft Intune, SOTI MobiControl & Ivanti MobileIron, which are bundling functionality to support mixed reality device fleets in a framework similar to how they support laptops, mobile phones, tablets & rugged devices.
Bespoke MDM providers such as ManageXR & ArborXR that focus solely on managing enterprise XR devices, thus providing feature sets that are unique to XR devices.

Network Infrastructure Providers

While this may sound trivial and broad, network infrastructure providers have a crucial role to play in giving MR applications the low latency and high bandwidth they need to run as seamlessly as possible. MR devices can essentially be considered IoT-style devices with high levels of sensory inputs & outputs, thus requiring strong internet connectivity, especially at remote business locations. Incumbents like Verizon & AT&T have rolled out 5G-powered IoT networks & devices (such as MiFis) to deploy mixed reality applications at scale.

Systems Integrators & Service Providers

The key partners that bring all of these solutions to life and integrate them with a customer’s existing business systems to deliver an end-to-end enterprise mixed reality solution are the big systems integration and management consulting companies. The most prominent examples include Accenture and Deloitte, along with several more big & small service providers that touch several facets of enterprise adoption for mixed reality. Last but not least are the core systems integrators such as Insight, SHI, CDW etc. that provide the “last mile” integration services with respect to hardware procurement, provisioning and deployment of devices for their existing and new enterprise customers.

User Experience Layer

The user experience layer sits on top of the infrastructure stack and brings most of the direct value to enterprises in the form of applications, platforms and content that solve specific problems or use cases. Here is a representative infographic of this space, all the way from game studios & vertical content creators to software platforms that bring end-to-end value from a core orchestration, management, content & integrations perspective:

Content Creators

Content creation is the most crucial and fastest-evolving, yet perennially under-supplied, piece of the mixed reality industry. Content is king for creating engagement and adoption. Content is typically delivered to the enterprise via a services-based model, and the players can be split into the following categories:

General Content Shops that create content across a wide variety of enterprise use cases. A few examples include StartBeyond, Sweetrush and 3lbxr.
Vertical Content Shops, such as ELB Learning and Roundtable Learning, that typically focus on learning & development for various enterprise use cases.
Big SIs & Consulting companies such as HCL, Accenture, Deloitte etc. that engage in large-scale content creation services for their existing customer base. These practices, however, are quite small and niche at this point, but they help such companies increase their account footprint via bespoke services.

Specialized Software Platforms & Tools

While the total MAU for VR headsets is still too small for true network effects to kick in, especially in the enterprise, we’re already seeing some early enterprise platforms, often focussed on specific use cases:

Enterprise Immersive Learning Platforms enable the creation, deployment and management of immersive learning at scale. The key players are STRIVR, Transfr, PixoVR, OssoVR and SkillsVR.

AR-based productivity or work management tools such as Taqtile, ScopeAR, Gamma-AR (for construction) and Immersed (for productivity), which are targeted at specific use cases and leverage existing augmented reality features within headsets to achieve guided learning use cases for the construction & manufacturing industries.

MR Soft Skills platforms focus specifically on employee soft skills development across various use cases such as customer interactions, de-escalation, sales & support training. Key players include Ovation, VirtualSpeech, BodySwaps and Talespin, which leverage the power of immersive corporate settings and high-fidelity characters to simulate scenarios that may otherwise be hard to simulate in a real-life environment.
Prototyping & Design platforms such as ShapesXR, Masterpiece Studio, Gravity Sketch are natively designed to support 3D design and prototyping in a collaborative setting which is especially helpful for physical design & development use cases in hardware, automotive, energy & construction industries.
Healthcare & Digital Therapeutics applications that cater to specific surgical procedures, as well as general health & wellness use cases. Companies such as OssoVR & Oxford VR engage in various healthcare training scenarios, while companies such as Tripp & Reulay build meditative experiences for both enterprises & consumers.
Virtual Collaboration tools such as Horizon Workrooms, MeetinVR and Glue VR either create their own suite of virtual collaboration tools or integrate with existing tools such as Zoom. Such tools bring virtual meetings into 3D via digital personas (avatars), environments & object plug-ins such as whiteboards & presentations. That said, virtual collaboration in 3D is still largely early-stage, untested territory from a value creation perspective and may have a long way to go to create true differentiation.

The above list is representative and is only the tip of the iceberg: hundreds of companies are trying to build out platforms with true network effects and ecosystems that can span multiple use cases, personas and eventually multiple enterprise industries.

Understanding the Content Ecosystem

Degrees of Freedom

Degrees of Freedom (DoF) refers to the number of independent ways in which a user can move and interact within the virtual environment. It describes the range and complexity of user motion that can be tracked and translated into corresponding actions or interactions in the MR experience. Most mixed reality experiences are typically categorized into 3 DoF & 6 DoF experiences:

Image credit: Dilmer Valecillos on Twitter, "A simple image explaining 3 DoF versus 6 DoF since I get asked this question a lot... #VR" (https://t.co/KgKgQIJZHs)

3 DoF experiences facilitate rotation around the X-axis (pitch), Y-axis (yaw) & Z-axis (roll), enabling the user to tilt their head up or down (similar to nodding), turn their head left or right (similar to looking from side to side) & tilt their head toward either shoulder.
6 DoF experiences add translation along the X, Y & Z axes on top of these three rotations, enabling the user to move horizontally from side to side, vertically up or down, and forward or backward within the virtual environment.
Interaction & Object Manipulation refers to the ability to interact with and manipulate virtual objects within the mixed reality environment. It involves performing actions such as grabbing, moving, rotating, resizing, and otherwise manipulating virtual objects as if they were physical objects in the real world. Users can interact with these objects using various input devices, such as handheld controllers, gesture recognition, or even hand tracking technology.
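
To make the 3 DoF versus 6 DoF distinction concrete, here is a minimal Python sketch. It is an illustration only: the `Pose` class, its field names and the `apply_tracking` helper are hypothetical and not taken from any headset SDK, and it assumes angles in degrees and positions in metres. The idea is that a 3 DoF device updates only the three rotations, while a 6 DoF device also updates position.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pose:
    """Head pose: three rotations plus three translations (hypothetical model)."""
    pitch: float = 0.0  # rotation about X (nodding), degrees
    yaw: float = 0.0    # rotation about Y (looking left/right), degrees
    roll: float = 0.0   # rotation about Z (tilting toward a shoulder), degrees
    x: float = 0.0      # side-to-side position, metres
    y: float = 0.0      # up/down position, metres
    z: float = 0.0      # forward/backward position, metres


ROTATION_AXES = ("pitch", "yaw", "roll")
TRANSLATION_AXES = ("x", "y", "z")


def apply_tracking(pose: Pose, sample: dict, dof: int) -> Pose:
    """Return a new pose using only the axes a 3 DoF or 6 DoF device tracks."""
    if dof not in (3, 6):
        raise ValueError("dof must be 3 or 6")
    tracked = ROTATION_AXES if dof == 3 else ROTATION_AXES + TRANSLATION_AXES
    # Drop any axis the device cannot track (translation on a 3 DoF device).
    updates = {axis: value for axis, value in sample.items() if axis in tracked}
    return replace(pose, **updates)


# The user turns their head 30 degrees and steps half a metre forward.
sample = {"yaw": 30.0, "z": 0.5}
pose_3dof = apply_tracking(Pose(), sample, dof=3)
pose_6dof = apply_tracking(Pose(), sample, dof=6)
```

Given the same sensor sample, the 3 DoF pose registers the head turn but ignores the step forward, while the 6 DoF pose captures both.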

Types of Content

Here is a quick summary of the various types of content that exist today:

Fully Immersive - Live Action / 360 / 180 Video: This type of immersive content involves pre-recording real experiences using 360 cameras or several high-quality 2D cameras, where the footage is eventually stitched together and rendered as a 360 video. For enterprise use cases, this content type is especially helpful for facility walkthroughs, operational effectiveness & safety use cases which require immersion in a wider field of view. Here’s a video of a 360 tour of Tesla’s factory in Fremont, which can be viewed on a virtual reality headset using YouTube VR. 360 experiences are typically 3 DoF in nature; however, there have been examples of 6 DoF 360 videos.

Fully Immersive Computer Generated (CG) Experiences: Computer-generated experiences are immersive and interactive environments built from computer-generated graphics, audio, and other sensory inputs. They are created through a combination of 3D modeling, rendering, animation, audio design, and interactive programming, and are designed to provide users with immersive, interactive, and often realistic experiences. Below is an example of a recorded in-headset CG immersive training targeting a specific healthcare use case. Such experiences can be rendered in either 3 or 6 degrees of freedom.

Augmented Experiences: Augmented experiences involve rendering computer-generated 3D objects either within an actively rendered real environment, which is typically done in a passthrough mode via MR device cameras, or by superimposing such objects on a pre-recorded 360 experience.

Understanding the MR Infrastructure Plumbing

Adoption of mixed reality infrastructure in the enterprise today seems to follow a pattern similar to how the computer was adopted in its first few iterations. Here are the key characteristics of how MR is adopted within the enterprise and the considerations to keep in mind to make it acceptable & conducive to a typical enterprise:

Network Infrastructure: As mentioned above, high-bandwidth, low-latency and reliable network infrastructure is crucial for mixed reality applications. While corporate office settings may have cracked the code for strong internet connectivity, blue collar locations such as warehouses, distribution centers & remote offices often struggle with poor connectivity due to physical obstacles & their distance from network hubs.
Running Compute on the Edge: MR applications can work around connectivity constraints at remote business locations by creating locally cached / offline experiences with asynchronous data downloads & uploads, ensuring that the products are user-ready. However, mixed reality applications that require real-time information augmentation with very low latency face many challenges. Being able to run as much compute as possible on the edge / device, be it high-fidelity experiences / object rendering or information retrieval, is crucial for the success of MR applications, especially for blue collar enterprises.
Shared Check-in / Check-out model: Given that most enterprise MR implementations are limited to one or more of the use cases listed above, most enterprises maintain a few headsets per business location for employees to check in and check out sequentially. This currently follows a model similar to how rugged devices are used and thus requires sanitation, operational & storage considerations. For applications such as training, enterprises may set up dedicated areas where such devices can be used.
Limited use per employee: Given the check-in & check-out model, the maximum use for a headset per employee is often limited to 30 mins to an hour at best per experience since that’s how long experiences / remote collaboration sessions are typically designed for. Some mixed reality applications that may be mission critical for manufacturing operations may be worn for longer periods of time.
Plug & Play via Enterprise Mobility Management: Similar to how organizations kiosk rugged devices such as tablets and mobile phones to run a specific application, such as a product catalogue search tool or customer check-in software, MR devices are typically deployed in a plug-and-play fashion: they’re kiosked to the application(s) in scope, auto-connect to the corporate network and facilitate check-in & check-out via employee log-ins. Any user restrictions, security policies, software & firmware updates are securely managed via the Mobile Device Management system.
Data Privacy & Security: Most HMDs granularly track spatial data such as head & hand / controller positioning and expose this data to applications via their SDKs. Applications use this spatial data to create interactive experiences, and some may choose to gather it to create immersive analytics that add further value to their product offerings. Given that most enterprises have to abide by data protection laws such as CCPA & GDPR, it is essential to create a strong governance & privacy framework that gives enterprises the capability to know and control how and where data is fetched & sent. Understanding how applications store (if at all), retain, encrypt and delete any PII, and how they allow users to opt out, can make or break adoption in heavily regulated industries.
User Management: The shared check-in / check-out model and the need to tie a user’s data to their usage session require users to identify themselves via their enterprise identity, typically stored in an Identity and Access Management (IAM) system such as Okta, PingFederate, ADFS etc. Thus, virtual reality applications adopted in the enterprise are typically required to integrate via a Single Sign-On workflow to authenticate employees. This is often tricky when a user is prompted to enter a long email address & password in a 2D browser via a QWERTY keyboard, using a point & click controller. Some applications, however, have found creative ways to smooth this experience by offering QR code based authentication.
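
The locally cached / offline pattern described under "Running Compute on the Edge" can be sketched as a store-and-forward queue: session events are written locally first and uploaded once connectivity returns. This is a minimal Python illustration, not how any particular MR platform works. `OfflineEventQueue`, `record`, `flush` and the `send` callback are hypothetical names, the sample events are made up, and a real deployment would persist the queue to device storage rather than memory.

```python
from collections import deque
from typing import Callable, Deque, Dict


class OfflineEventQueue:
    """Buffers session events locally and uploads them when a link is available."""

    def __init__(self, send: Callable[[Dict], bool]) -> None:
        # `send` is a hypothetical uploader: returns True on success, False otherwise.
        self._send = send
        self._pending: Deque[Dict] = deque()

    def record(self, event: Dict) -> None:
        """Store the event locally; never blocks on the network."""
        self._pending.append(event)

    def flush(self) -> int:
        """Try to upload pending events in order; stop at the first failure."""
        uploaded = 0
        while self._pending:
            if not self._send(self._pending[0]):
                break  # still offline: keep the event for the next attempt
            self._pending.popleft()
            uploaded += 1
        return uploaded

    def __len__(self) -> int:
        return len(self._pending)


# Simulated uplink that is down at first and restored later.
online = {"up": False}
received = []


def fake_send(event: Dict) -> bool:
    if online["up"]:
        received.append(event)
        return True
    return False


queue = OfflineEventQueue(fake_send)
queue.record({"user": "emp-001", "module": "forklift-safety", "score": 0.9})
queue.record({"user": "emp-002", "module": "forklift-safety", "score": 0.7})

first_attempt = queue.flush()   # link is down: nothing is uploaded
online["up"] = True
second_attempt = queue.flush()  # link restored: both events are uploaded
```

The design choice here is that the training session never waits on the network: recording always succeeds locally, and uploads happen opportunistically and in order.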

Summary

Having explained this detailed context, I’d like to summarize and bring your attention to the following takeaways:

The ability to achieve spatial awareness via vision & audio, multi-modal interactions via hands or haptic devices, and the capability to augment contextualized information are the pillars of value creation within the enterprise.
Enterprise Spatial Computing ≠ the social metaverse, which largely focusses on social interactions and presence. Some core components of the metaverse apply to enterprise spatial computing, but the current enterprise spatial computing universe aims to solve specific problems that can otherwise be solved only in a limited way via existing 2D rendering or manual / physical methods.
As it stands today, the content creation ecosystem is sparse and highly fragmented, to say the least. This is the biggest challenge that's endemic to the MR industry as a whole, both in the entertainment and industrial spaces. The industry is having trouble staying afloat due to a dearth of content, partly because not many players have figured out how to build it in a profitable & scalable manner.
Network effects and interest graphs are possibly two key drivers for helping the industry cross the chasm for specific use cases verticalized by industry. I will talk about this in depth in my future articles.
It takes some amount of baseline plumbing / infrastructure, systems, network & privacy policies to implement mixed reality for the enterprise. However, the general concepts are largely similar to implementing any other hardware & software solution.
