Convai x AES 2025 | The Future of Training is Conversational: Roleplay with AI-Powered Avatars

Convai Team

October 13, 2025

At the Augmented Enterprise Summit (AES 2025), Purnendu Mukherjee, founder and CEO of Convai, presented a compelling vision for the future of training and L&D: make it conversational, embodied, and agentic inside XR. This recap of the talk describes how AI-powered virtual humans are transforming learning and training.

Readers will find: a quick skills-spectrum overview (soft ↔ hard), a step-by-step product walkthrough of Convai’s creation-to-deployment flow, assessment and analytics, multi-modality delivery (text, voice, avatar call, full XR), security & deployment (including on-prem/air-gapped), and a slate of real-world demos across defense, manufacturing, enterprise, and space—featuring partners like Buendea.

Why embodied AI for XR training, and why now?

Most AI systems, even cutting-edge LLMs, remain text-first. But humans don’t learn primarily through text. From infancy, we learn through spatial awareness (navigating and acting in a 3D world) and then ground language in that embodied experience. XR (extended reality) and high-fidelity 3D simulations create the best conditions to give AI a similar embodied self: an agent that can perceive the scene, reason about objects and people, and act appropriately.

For enterprise L&D teams, that shift from “text on screens” to spatially-grounded, conversational practice is what drives measurable training outcomes: better retention, higher engagement, and faster time-to-competency.

Setting some context: the inception and formation of Convai

In 2017, before “embodied AI” became a mainstream term, Purnendu argued that 3D spatial awareness would be the next big wave in AI. His thesis work and an accompanying blog explored how perception and action in space could anchor more human-like intelligence.

He then joined NVIDIA, contributing to large-scale training and inference (including early BERT work) and collaborating with Jensen Huang’s team on high-visibility demos. In 2020, NVIDIA shipped its first 3D avatar chatbot, showcasing real-time dialogue in a virtual environment.

Building on that momentum, Convai was founded to place conversational and agentic AI inside virtual worlds: giving avatars a mind (backstory, personality, memory, knowledge) and the capacity to perceive, converse, and move. An early open-ended ramen shop demo went viral, hinting at what soft-skills role-play in XR could feel like when it’s not scripted but truly live and responsive.

Why training became our #1 enterprise use case

When Convai opened up as a developer platform for spatial computing, organizations tried it across categories—gaming, brand experiences, digital twins—but training quickly dominated. Over the last year, Convai leaned into enterprise L&D and XR training, where role-play with AI-powered avatars and procedural assistance deliver immediate value.

The skills landscape: a spectrum from conversational to hands-on

In L&D, real deployments rarely fit into neat “soft vs hard skills” boxes. Instead, scenarios sit along a spectrum where the role of environment, stress, navigation, and object interaction varies.

1) Pure conversation (soft skills)

What it is: Realistic, unscripted role-plays for conflict management, leadership coaching, performance feedback, customer service, or sales.
Where it runs: Often in 2D (web) or video-call style because the environment matters less than the dialogue and feedback.
Why it works: Learners get active practice—not passive reading/watching—so retention improves.

2) Situational soft skills (context matters)

What it is: Stressful or safety-critical moments—e.g., an equipment failure, a workplace injury, or a crisis—where immersion changes decision-making.
Where it runs: XR adds realism; learners may not manipulate objects, but being there alters how they respond.
Why it works: Builds situational judgment, prioritization, and calm communication under pressure.

3) Navigation-dependent scenarios

What it is: Tours, digital walkthroughs, or product demonstrations where the trainee needs to move through space, explore stations, and sometimes operate a product while pitching.
Where it runs: Typically XR (or pixel-streamed 3D in a browser).
Why it works: Combines spatial memory with verbal fluency—critical for field sales, onboarding, and facility familiarization.

4) Hands-on hard skills

What it is: Operating, troubleshooting, and demonstrating devices with stepwise, procedural guidance and physical interactions.
Where it runs: XR with high-fidelity interactions or 3D sims coupled with a conversational agent.
Why it works: Reduces time on tools, prevents errors, and lets learners practice safely.

Why organizations adopt embodied, agentic training (measurable ROI)

Order-of-magnitude cost reduction: Agentic flows and centralized knowledge cut multi-day searches for the right information down to minutes.
Real-time, personalized feedback: Automatic critique on knowledge, tone, delivery, steps performed, and decision-making—improving each session.
Personalized learning journeys: The agent adapts to goals, history, and proficiency, moving beyond one-size-fits-all courses.
Active learning beats passive content: Conversations, quizzes, “what-if”s, and branching keep learners engaged and boost retention.
Learner agency: Trainees can ask anything, explore deeper, and re-try difficult branches until mastery.

With that picture of what Convai does for organizations, particularly in accelerating learning and training, the next section walks through how to create an AI-powered virtual human with Convai.

Product walkthrough: how teams build, test, and deploy with Convai

1) Build the character’s mind

Use Convai’s studio to define backstory and personality, connect documents & knowledge, choose voice/language and the LLM, and configure memory plus guardrails. The result is a consistent persona that knows your products, policies, and procedures and stays on brand.
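As a rough illustration of what this step captures, the settings can be thought of as one structured record. The sketch below is not Convai's schema or API; `CharacterMind`, its field names, and the placeholder voice and LLM values are all hypothetical, chosen only to mirror the list above.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterMind:
    """Hypothetical container for the settings listed above (not Convai's schema)."""
    name: str
    backstory: str
    personality: list[str]
    voice: str
    language: str
    llm: str
    knowledge_docs: list[str] = field(default_factory=list)
    memory_enabled: bool = True
    guardrails: list[str] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means ready to test."""
        problems = []
        if not self.backstory.strip():
            problems.append("backstory is empty")
        if not self.knowledge_docs:
            problems.append("no knowledge documents connected")
        if not self.guardrails:
            problems.append("no guardrails configured")
        return problems


# Illustrative values only; the voice and LLM names are placeholders.
trainer = CharacterMind(
    name="Safety Trainer",
    backstory="A warehouse supervisor with ten years of floor experience.",
    personality=["calm", "direct"],
    voice="placeholder-voice",
    language="en",
    llm="placeholder-llm",
    knowledge_docs=["ppe_policy.pdf"],
    guardrails=["Stay on workplace-safety topics."],
)
print(trainer.validate())
```

A pre-flight check like `validate()` is one way an instructional designer might catch a character with no connected knowledge or guardrails before moving on to testing.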

2) Choose or import an avatar

Select from a large avatar library or import your custom character (e.g., XR trainer, brand ambassador). Visual identity matters for learner trust and recall.

3) Test fast in a video-call-style interface

Spin up a live video call session to validate knowledge grounding, tone, and scenario flow. Instructional designers can iterate quickly before pushing to 3D/Web/XR.

4) Drop into a 3D scene (browser-based, pixel-streamed)

Run high-fidelity scenes from any device with a modern browser, no local GPUs needed! Trainees walk, look around, and talk to characters that can also navigate and interact—ideal for spatial computing programs and distributed teams.

5) Add actions with XR Animation Capture

A missing animation can break immersion, for example when a learner asks “can you pour a drink?” and the character has no matching motion. With XR Animation Capture, trainers record the motion once in-headset, save it, and the AI can re-enact it contextually. Over time, you build a reusable action library.

6) Create tour guides and spatial walkthroughs

Import scanned environments (e.g., via Gaussian Splatting), place waypoints, and link the knowledge bank (manuals, PDFs, slide decks). The agent becomes a location-aware guide, answering contextual questions on the spot.
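One way to picture location-aware guidance is a nearest-waypoint lookup that decides which linked documents should ground the answer. This is a minimal sketch under stated assumptions, not Convai's implementation: `Waypoint`, `nearest_waypoint`, `context_for`, the 5-unit radius, and the file names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Waypoint:
    name: str
    position: tuple[float, float, float]  # scene coordinates (units assumed to be meters)
    documents: list[str]                  # knowledge linked to this stop


def nearest_waypoint(position, waypoints):
    """Return the waypoint closest to `position` by straight-line distance."""
    return min(waypoints, key=lambda w: math.dist(position, w.position))


def context_for(position, waypoints, radius=5.0):
    """Documents to ground an answer in, or [] if no waypoint is within `radius`."""
    if not waypoints:
        return []
    wp = nearest_waypoint(position, waypoints)
    if math.dist(position, wp.position) > radius:
        return []
    return wp.documents


# Illustrative tour with made-up stops and file names.
tour = [
    Waypoint("Intake", (0.0, 0.0, 0.0), ["intake_sop.pdf"]),
    Waypoint("Assembly line", (12.0, 0.0, 3.0), ["line_manual.pdf", "changeover.pptx"]),
]
print(context_for((11.0, 0.0, 2.0), tour))
```

Returning an empty list when the trainee is far from every waypoint lets the agent fall back to general knowledge instead of answering from the wrong station's manual.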

7) Works with your engine & toolchain

Convai integrates with Unreal Engine, Unity, Three.js, and NVIDIA Omniverse (extension available). It is highly rated across their marketplaces and supports the typical DevOps and content workflows that enterprise teams expect.

8) One deployment, many modalities

Ship once, deliver everywhere:

Text chat (LLM) for quick lookups and SOPs
Voice assistant for hands-free coaching
Avatar video-call for browser-based role-play
Full 3D/XR for embodied, spatial scenarios

Match interaction style to learning objective

Soft-skills scenarios bias toward role-play: coaching, performance conversations, sales objections, de-escalation.
Hard-skills scenarios bias toward assistance/Q&A: “What’s the next step?”, “Which tool do I use?”, “What are the safety checks?”.
Many enterprise programs blend both (e.g., product demo with live operation + persuasive narrative).
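The pairing of skill type, delivery modality, and interaction style described in this recap can be summarized as a small lookup table. The sketch below is only an illustration: the `Skill` enum, the `DELIVERY` table, and `plan()` are hypothetical names, and labeling navigation-dependent scenarios as a "blend" is an interpretation of the product-demo example above rather than something the talk states outright.

```python
from enum import Enum


class Skill(Enum):
    PURE_CONVERSATION = "pure conversation"
    SITUATIONAL = "situational soft skills"
    NAVIGATION = "navigation-dependent"
    HANDS_ON = "hands-on hard skills"


# (delivery modality, interaction style) per skill type, as summarized in this recap.
DELIVERY = {
    Skill.PURE_CONVERSATION: ("avatar video call", "role-play"),
    Skill.SITUATIONAL: ("full 3D/XR", "role-play"),
    Skill.NAVIGATION: ("full 3D/XR", "blend"),
    Skill.HANDS_ON: ("full 3D/XR", "assistance/Q&A"),
}


def plan(skill):
    """Describe the suggested delivery for one skill type."""
    modality, style = DELIVERY[skill]
    return f"{skill.value}: deliver via {modality}, interaction style {style}"


for skill in Skill:
    print(plan(skill))
```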

The steps above cover creating an AI-powered virtual human. Once the character’s mind is in place, the next decision is which type of agent best serves your goal: embodied or disembodied.

Embodied vs. disembodied agents: where each shines

Disembodied agent inside 3D

A voice-first guide with on-screen prompts in immersive scenes. It is well suited to procedural training, checklists, troubleshooting, and contextual Q&A without a visible avatar.

Embodied agent (avatar)

The most effective format for role-play. Learners practice eye contact, tone, pacing, and empathy with a present character, delivered via a 2D avatar call or fully in 3D/XR.

Assessment, feedback, and analytics that matter

Define custom rubrics aligned to business outcomes, for example knowledge accuracy, policy compliance, tone, empathy, delivery, procedural correctness, time to completion, and error recovery. Convai analyzes each session and generates individual and cohort-level reports for learners, managers, and instructional designers.

Typical outputs include:

Scores by rubric dimension (with rationales)
Moment-by-moment highlights (great answers, misses, recovery)
Recommended micro-practice to close gaps
Trend lines across sessions, teams, regions
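To make the rubric idea concrete, here is a minimal sketch of how per-dimension scores could roll up into a session score and a cohort summary. It assumes scores on a 0-100 scale and a weighted average; the weights, dimension keys, and function names are hypothetical and do not describe how Convai computes its reports.

```python
from statistics import mean

# Hypothetical rubric: dimension -> weight.
RUBRIC = {
    "knowledge_accuracy": 0.4,
    "tone": 0.2,
    "empathy": 0.2,
    "procedural_correctness": 0.2,
}


def session_score(scores, rubric=RUBRIC):
    """Weighted average of per-dimension scores (each assumed to be 0-100)."""
    missing = set(rubric) - set(scores)
    if missing:
        raise ValueError(f"missing rubric dimensions: {sorted(missing)}")
    total_weight = sum(rubric.values())
    return sum(scores[d] * w for d, w in rubric.items()) / total_weight


def cohort_report(sessions, rubric=RUBRIC):
    """Mean score per dimension across sessions, plus the weakest dimension."""
    by_dim = {d: mean(s[d] for s in sessions) for d in rubric}
    weakest = min(by_dim, key=by_dim.get)
    return {"by_dimension": by_dim, "weakest": weakest}


# Made-up session data for illustration.
sessions = [
    {"knowledge_accuracy": 90, "tone": 70, "empathy": 60, "procedural_correctness": 80},
    {"knowledge_accuracy": 80, "tone": 90, "empathy": 70, "procedural_correctness": 100},
]
print(round(session_score(sessions[0]), 1))
print(cohort_report(sessions)["weakest"])
```

Surfacing the weakest dimension per cohort is one simple way to decide which micro-practice to recommend next.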

Real-world examples

Many enterprise deployments are private; these examples illustrate the range and fidelity organizations can achieve with XR training and AI-powered avatars.

Military training simulations

The German Federal Armed Forces use a digital twin of Colonel Carl Hartman to issue orders in a war-gaming pipeline. Trainees interact conversationally while the system ties into execution logic, bridging strategy with simulation.

Manufacturing tour guide

A factory tour uses spatial waypoints and connected documentation to answer process questions in context. Great for safety walk-throughs, line changeovers, and new-hire onboarding.

Mock interviews in MR (“Crush the Interview”)

Mixed-reality role-play delivers technical Q&A, follow-ups, and pressure-testing. Trainees practice explaining decisions, handling pushback, and structuring answers.

Safety training

A warehouse supervisor character coaches PPE selection, hazard identification, and stepwise navigation through risk zones. The scenario is ideal for OSHA-style programs and logistics operations.

Accenture (San Francisco)

A 3D virtual assistant provides open-ended recommendations and contextual information for visitors, showing how enterprise support can be embedded inside a real environment.

NASA partner (Buendea)

A high-fidelity lunar simulation combines embodied and disembodied agents. The AI gives agentic step guidance (“follow the highlighted path,” “open the power station”), performs multimodal perception (the agent sees what the trainee sees), and answers contextual questions (e.g., rock identification). Scenarios support both AI crew and human multiplayer.

Deployment, security, and enterprise readiness

While cloud is available, Convai specializes in on-premise, air-gapped deployments, which are crucial for defense, aerospace, manufacturing, and other sensitive domains. Much of the stack is source-available, enabling enterprise teams to adapt, extend, and integrate Convai into existing pipelines, identity providers, Unreal Engine/Unity content, and NVIDIA Omniverse workflows.