Assessing the real business value of AI agents: Product, UX, Data & Systems

A framework to assess and evaluate the real business value add for AI co-pilots from a product-market fit, user experience, data and systems integrations lens.

Sep 24, 2023

Context

This post continues my last one, Business Process Automation and the Generative AI Turbocharge, where I break down the concept of corporate micro-processes spanning the levels of an enterprise (front, middle and back offices) and the types of systems involved (engagement, action, reference and insight). This article is an amalgamation of several evaluation frameworks that I’ve used in the past to assess the efficacy of co-pilots focussed on enterprise productivity, and the extent to which they can add business value. While the technology is rapidly evolving from rules-based automation tools to AI-powered tools that can harness unstructured data, leverage conversational interfaces and support multi-modal interactions, a lot of the business value ultimately comes down to how the core problem is being solved. Adoption of enterprise AI co-pilots boils down to three things:

Does it make a business user more productive via the end-user experience over a long period of time?
Does it help IT users march incrementally towards the goal of an integrated and clean enterprise?
Does it make sense from a business case perspective, through a hard and soft dollars lens, for an executive sponsor, including the buyer (COO, CRO, CHRO) and approver (CFO), to pilot and expand?

This is the first of a three-part series and focusses specifically on product market fit, synergies, risks, user experience, data and systems integration. A re-usable version of the framework explained below can be found on this Coda page. Here is part 2 of this 3-part series.

Key Highlights

Here are the key highlights, a shorter version of this part of the framework; each point is discussed in more depth below:

Product Market Fit: AI co-pilots verticalized to specific personas, functions or industries may be more effective due to their increased capability to handle nuances and attain depth of use case. If co-pilots are broader in nature, there have to be some system, process or user level synergies / commonalities that create the capability to increase breadth while also going deep into the use case. Co-pilots deliver high ROI when they address either simple, high-frequency use cases or long, complicated use cases split across several steps and conducted by several FTEs in collaboration.
Modality: Co-pilots can be more effective when they accommodate multiple modalities of input & output, specific to the use case at hand. Text / Chat may not always be the best modality by default.
Risks: The efficacy of a co-pilot is also dictated by the financial and regulatory risks associated with the process in scope, which may largely dictate the need for accuracy as well as human-in-the-loop design.
User Experience: Co-pilots are most effective when their workflow is guided / step-by-step and consistent across multiple users. Their capability to augment information, recall past context and make suggestions makes them more compelling for enterprise use cases. Finally, editability plays a major role in ensuring that change management is seamless.
Data Types: The ability to handle diverse data types (structured vs unstructured), as well as missing / lower quality data, is what truly governs how accurately a co-pilot can solve a use case.
Systems: A co-pilot’s effectiveness depends on its capability to integrate with a broad range of systems segregated by type (record, insight, action etc.), function (HR, F&A) and market segment (mid-market, enterprise), as well as on the depth of integration via traditional API-based approaches and non-traditional data scraping approaches.

Product-Market-Fit, Synergies & Risks

Vertical-ization & Synergies: For AI co-pilots to perform a task effectively, they can either target a very specific type of use case vertical-ized by industry (eg. FSI, healthcare) or function (eg. HR, F&A), or they can target a specific type of persona (eg. developer, contact center associate) or system (eg. system of engagement). While an argument can be made that, with general purpose LLMs, co-pilots can maintain depth and also cover a breadth of use cases, building for one specific use case provides long term value, since it instills focus and allows the product (or the model) to capture variations and complexities of the industry, persona, function or systems involved. A co-pilot that spans multiple use cases is typically effective only when there are synergies across those use cases that create depth and breadth at the same time.

Re-Engineering the Process: Oftentimes, co-pilots may be looking to solve for processes that are fundamentally broken due to the prevalence of redundancies. For example, a review step may be conducted twice in the current state to account for human errors, whereas the future state may need only one review step, since it could have fewer errors than the manual process. It’s important to assess whether the co-pilot is also re-defining the target state process, optimizing for the benefit that the underlying models bring, rather than simply mimicking the process as-is.

Depth of Process: While typical enterprise operations involve corporate micro-processing, the true value is only realized when a process is automated end to end. The solution can be architected so that there is an individual co-pilot for each owner of a micro-process, with the co-pilots collaborating and communicating with each other. For example, a securities trade settlement co-pilot at an investment bank should be able to operate across the front office (traders), middle office (responsible for processing trade settlements) and back office (responsible for other regulatory reporting), integrating with relevant systems and relaying information between the desks. The true value accrues when each of the desks has a sub-co-pilot assisting it to streamline the end-to-end process; to an operating leader, however, what matters more is the cycle time & throughput of processing trades.

Modes of User Inputs: While a lot of co-pilots have largely indexed on chat based conversations as sources of inputs, depending on the type of use cases, the efficacy of the co-pilot depends on the breadth in channels of engagement. A classic example is a support co-pilot wherein customers can engage via phone, text or email. Prioritizing based on the channel of engagement with the highest volume can define how quickly a co-pilot can add business value.

Frequency & Volume: The efficacy of a co-pilot to solve for high frequency and volume use cases with the potential to translate to a high ROI can be bucketed into two categories:

A small, specific but repeatable business process conducted by several FTEs at a high cadence. Here the value lies in the co-pilot’s capability to handle the same linear workflow consistently across multiple FTEs, multiple times a day, creating a highly accurate output with little variance.
A complex process conducted across a high number of FTEs in chunks / smaller steps, with multiple hand-offs, reviews and a high frequency of records. Effective co-pilots can seamlessly assign and triage these workloads across multiple FTEs in a consistent and organized way.

Regulatory Risks: Specific use cases may have regulatory requirements that may warrant forced manual steps such as a human signature on a document prior to submission. In such cases, it is important to assess how co-pilots address and work around such regulatory risks, and establish controls.

Financial Risks: Some co-pilots may be working in functions such as FP&A, wherein an inaccuracy in the processed transaction output may create a potential financial loss, adversely affecting the enterprise’s financial situation. In such cases, it is important to understand whether the co-pilot has mechanisms that minimize such financial risks. Moreover, understanding whether the co-pilot consistently produces outputs with minimal variance and a very low error rate is crucial for minimizing financial losses.

User Experience

Guided / Step-by-Step Setup: Effective co-pilots are able to break down a specific process into multiple human-in-the-loop steps, which allow for continuous and refined context from the user to drive the co-pilot towards a more accurate answer that’s attuned to the exact ask from the user. This may also allow for seamless step-by-step augmentation of information from source systems about the question in scope, to minimize redundancy in gathering user input and induce effective reasoning prior to coming to an answer.

Workflow Consistency and Collaboration: A co-pilot designed for a single process that is repeatedly done by multiple team members needs to be designed to ensure linearity of user experience. On the other hand, a co-pilot designed for a process split between multiple members of a team needs to seamlessly intake, route and triage work items as per the stipulated process flow. This requires a seamless back and forth of work and the corresponding output between processors and reviewers, which is crucial for efficient collaboration and accountability of work items. Most service lines ideally set standards for consistency, governed by SOPs for how a task is to be done. Thus, having high levels of consistency and controls on the probabilistic output of a co-pilot is crucial to ensuring that the team is operating within a finite margin of error.

Memory, Suggestion & Augmentation: Language models are stateless, which implies that they may not remember context from past conversations. Thus, for enterprise applications of co-pilots, it’s important to augment a workflow with historical memory, which can include past transactions and the corresponding inputs / outputs from those transactions. Moreover, augmenting information from source systems is crucial to make the system as stateful as possible. Finally, making suggestions based on past information can add tremendous value by proactively identifying outliers throughout the operation. For example, a co-pilot that identifies a higher than usual payable amount on an invoice for a specific vendor, and triages it to the user instead of auto-processing it, adds a certain level of control to the AP process.
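To make the invoice example concrete, here is a minimal sketch of how such a control could work, assuming a simple z-score check against a vendor’s past payable amounts. The function name, the threshold of 3 standard deviations and the minimum history size are illustrative assumptions, not a description of any particular product.

```python
from statistics import mean, stdev

def triage_invoice(amount, vendor_history, z_threshold=3.0, min_history=5):
    """Return 'auto_process' or 'human_review' for a new invoice amount.

    vendor_history is the "memory": past payable amounts for the same vendor.
    z_threshold and min_history are hypothetical tuning knobs.
    """
    # With too little history there is no reliable baseline, so defer to a human.
    if len(vendor_history) < min_history:
        return "human_review"

    baseline = mean(vendor_history)
    spread = stdev(vendor_history)

    # If every past invoice was identical, any different amount is unusual.
    if spread == 0:
        return "auto_process" if amount == baseline else "human_review"

    # Flag only amounts that are unusually HIGH relative to the vendor's history.
    z_score = (amount - baseline) / spread
    return "human_review" if z_score > z_threshold else "auto_process"

history = [1000.0, 1050.0, 980.0, 1020.0, 995.0, 1010.0]
print(triage_invoice(1030.0, history))  # in line with history
print(triage_invoice(5000.0, history))  # far above history
```

A real deployment would likely use richer signals than a single amount, but the design point is the same: the decision to auto-process is conditioned on stored history rather than on the current request alone.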

Edit-ability: Business processes are often subject to change, which requires the business to edit the co-pilot workflow. Providing the right product capability to make such edits is therefore crucial for ensuring that change management is cost efficient and easy to execute. The business vertical leveraging the co-pilot may have to edit the input / output data formats / transformations, integrated source systems, order of steps, positioning of the human review step, handling of regulatory changes, configuration / thresholds etc. Edit-ability creates ease of implementation and change management, which I will cover in part two of this series.

Data Type & Quality

Diversity of Data Types: Business processes may have varied complexity of data inputs ranging from fixed values, key-value pairs or tables, to more complicated unstructured data types embedded within small to huge documents as free form texts, complex tables or highly variable key value pairs. While language models are powerful at extracting unstructured data, it’s important to understand the percentage of data points extracted and the corresponding accuracy to dictate the ROI.
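One way to quantify the two measures mentioned above is sketched below, assuming a small hand-labelled set of expected field values to compare against. The function and field names are hypothetical; coverage is the share of expected data points that were extracted at all, and accuracy is the share of extracted data points that match the expected value.

```python
def extraction_metrics(expected, extracted):
    """Compute (coverage, accuracy) for one document.

    expected:  dict of field name -> correct value (hand-labelled, an assumption)
    extracted: dict of field name -> value returned by the co-pilot, or None
    """
    total = len(expected)
    # Fields the co-pilot returned some value for.
    found = [field for field in expected if extracted.get(field) is not None]
    # Of those, the fields whose value matches the label exactly.
    correct = [field for field in found if extracted[field] == expected[field]]

    coverage = len(found) / total if total else 0.0
    accuracy = len(correct) / len(found) if found else 0.0
    return coverage, accuracy

expected = {"vendor": "Acme", "amount": "1200.00", "due_date": "2023-10-01", "po": "PO-77"}
extracted = {"vendor": "Acme", "amount": "1200.00", "due_date": "2023-01-10", "po": None}

coverage, accuracy = extraction_metrics(expected, extracted)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

Tracking the two numbers separately matters because a co-pilot that extracts few fields very accurately and one that extracts many fields poorly imply very different amounts of residual manual work.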

Not so clean or missing data: Regardless of the co-pilot’s capability to extract structured or unstructured data, specific verticals / use cases have low quality data, which may involve inaccurate / incorrect data or completely missing data. For example, a co-pilot working on automating online orders may have customer contact information that is outdated or has missing values, which can then create downstream implications on the capability to fulfill customer orders. In such cases, it is important to understand a co-pilot’s capability to handle such data quality issues and to build controls that either extrapolate missing data, cross-reference source systems or initiate triaging workflows that route the record to a human-in-the-loop interface to complete the missing data.
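The controls described above can be sketched roughly as follows, assuming the order is a plain dictionary and the source system is reachable through a lookup function. All names here (resolve_contact, source_lookup, the required fields) are hypothetical and only illustrate the fallback order: use the value on the record, then reference the source system, then triage to a human.

```python
def resolve_contact(order, source_lookup, required=("email", "phone")):
    """Fill missing contact fields from a source system, else flag for review.

    source_lookup(customer_id, field) is an assumed callable that returns the
    value held in a system of record, or None if it has nothing.
    """
    resolved = dict(order)  # do not mutate the caller's record
    missing = []
    for field in required:
        if resolved.get(field):
            continue  # value already present on the order
        fallback = source_lookup(order["customer_id"], field)
        if fallback:
            resolved[field] = fallback  # referenced from the source system
        else:
            missing.append(field)  # nothing to reference: needs a human
    resolved["needs_human_review"] = bool(missing)
    resolved["missing_fields"] = missing
    return resolved

# A stand-in for a CRM, keyed by (customer_id, field).
crm = {("c1", "email"): "a@example.com"}
lookup = lambda customer_id, field: crm.get((customer_id, field))

order = {"customer_id": "c1", "email": "", "phone": None}
result = resolve_contact(order, lookup)
print(result["email"], result["needs_human_review"], result["missing_fields"])
```

This sketch deliberately does not extrapolate values; whether guessing a missing field is acceptable depends on the financial and regulatory risk of the process in scope.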

System Integrations

Breadth by system Type: Enterprises leverage systems of record, engagement, insight, reference and action. Each of these system types has a different way of transacting with the data stored within it. The efficacy of AI co-pilots depends on how efficiently the information from such diverse types of systems can be retrieved across various parts of a user’s workflow to create a stateful and augmented conversation. An organization typically builds several silos across these system types, and a bigger breadth of integrations creates the ability to bridge such silos. For example, an operations team may have an important mapping pinned in a Slack thread instead of a central database, which creates the need to integrate deeply with Slack.

Breadth by business function: Business processes within a specific function can be spread across a wide variety of systems such as CRMs, ERPs, HR & Payroll systems. For example, to automate a critical task such as a customer refund, it’s important to understand the nature of the customer via the CRM, but it is also important to understand whether the team running the process has the right privileges to issue refunds, which may come from an HR system. Each of these system types has different schemas and integration capabilities, and the richness of the context augmented across such systems from multiple functions is crucial for responsible decision making.
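As a rough illustration of the refund example, the sketch below combines context from two systems before deciding. The CRM and HR system are stood in for by dictionaries, and the field names (can_issue_refunds, refund_limit) are assumptions made for this example rather than the schema of any real system.

```python
from dataclasses import dataclass

@dataclass
class RefundDecision:
    approved: bool
    reason: str

def decide_refund(customer_id, agent_id, amount, crm, hr):
    """Approve a refund only if both the CRM and HR context allow it."""
    # CRM context: do we know this customer at all?
    customer = crm.get(customer_id)
    if customer is None:
        return RefundDecision(False, "unknown customer")

    # HR context: does the person running the process hold the privilege?
    agent = hr.get(agent_id)
    if agent is None or not agent.get("can_issue_refunds", False):
        return RefundDecision(False, "agent lacks refund privilege")

    # Hypothetical per-agent limit; above it, the item is escalated.
    if amount > agent.get("refund_limit", 0):
        return RefundDecision(False, "amount exceeds agent limit; escalate")

    return RefundDecision(True, "ok")

crm = {"cust-1": {"tier": "gold"}}
hr = {
    "agent-1": {"can_issue_refunds": True, "refund_limit": 200},
    "agent-2": {"can_issue_refunds": False},
}

print(decide_refund("cust-1", "agent-1", 150, crm, hr))
print(decide_refund("cust-1", "agent-2", 150, crm, hr))
```

Neither system alone is enough to make the decision responsibly, which is the point of integrating across business functions.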

Depth of integration: Such systems can also have varying levels of integration capability, ranging from REST / SOAP APIs to no integrations at all. The efficacy of a co-pilot is governed by the breadth and the depth of such integrations with the systems specific to the use case(s) in scope. The trickiest integrations are the ones where legacy systems have no API capabilities, which may involve screen scraping or interfacing with mid-tier data warehouses. The larger and more regulated the industry, the less integration-friendly the systems in scope tend to be.

Capability to augment insights and generate intelligence: Beyond what’s stipulated in the user workflow, the co-pilot may integrate with additional systems of insight, such as databases that store historical records from transactions, to generate unique insights over the course of the usage of the product. For example, a sales co-pilot summarizing meetings across a customer lifecycle can also surface insights across customers, such as common pain points in product usage and product features requested across the customer base, which then become a massively helpful data point for product development teams.

Addressing fragmentation across market segments: Certain co-pilots may operate in verticals / use cases where the source tools used by customers are highly fragmented across market segments. For example, HR tools may vary from Gusto in SMBs to Workday & ADP in enterprise customers. Depending on the target market for the co-pilot, this fragmentation of systems creates a requirement to build a large number of integrations, thus adding complexity. Effective co-pilots are the ones that are able to address such fragmentation, if it exists, and prioritize the right set of systems for their target market segment.

Conclusion & Looking Forward

The above can serve as a starting point for potential customers, investors, or product companies developing a co-pilot to deeply understand its product market fit, and to rate the planned design against the above focus areas by answering the set of questions in the Coda document. In part 2, I focus on areas such as implementation, security, compliance, customization and ecosystems, which will further strengthen the assessment from a total cost of ownership perspective. In my final article, I will focus purely on the business case lens, including hard and soft dollar metrics, which largely govern buying decisions for such enterprise co-pilots.
