Inworld Products
All Inworld Release Notes
- Nov 20, 2025
- Parsed from source:Nov 20, 2025
- Detected by Releasebot:Dec 23, 2025
Node.js Runtime v0.8.0
Enhanced performance, execution control, and component access for custom nodes.
- 2x faster performance with optimized addon architecture
- Cancel running executions with abort() on GraphOutputStream
- Call LLMs from custom nodes via getLLMInterface() and getEmbedderInterface()
- Build stateful graph loops with DataStreamWithMetadata
Breaking changes
graph.start() is now async, and stopInworldRuntime() is required.
See the Migration Guide for upgrading from v0.6.
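For orientation, here is a minimal sketch of the new lifecycle, assuming a graph built with the Runtime SDK's GraphBuilder; the import path and the way streamed results are consumed are assumptions, so treat the Migration Guide as the source of truth.

```typescript
// A minimal sketch of the v0.8.0 lifecycle, assuming `graph` was built with the
// Runtime SDK. The import path, the shape of the output stream, and how results
// are consumed are assumptions - see the Migration Guide for the real API.
import { stopInworldRuntime } from '@inworld/runtime'; // assumed import path

async function runOnce(graph: any, input: string): Promise<void> {
  // graph.start() is async as of v0.8.0.
  const outputStream = await graph.start(input);

  // Cancel the execution if it runs too long (abort() lives on GraphOutputStream).
  const watchdog = setTimeout(() => outputStream.abort(), 10_000);

  // Consume streamed results (assumed async-iterable; adapt to the SDK docs).
  for await (const result of outputStream) {
    console.log(result);
  }
  clearTimeout(watchdog);

  // Required in v0.8.0: shut the runtime down explicitly when you're done.
  await stopInworldRuntime();
}
```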
- Nov 6, 2025
- Parsed from source:Nov 6, 2025
- Detected by Releasebot:Dec 23, 2025
Introducing Timestamp Alignment, WebSockets and More for Inworld TTS
Inworld TTS debuts a major release with speed boosts, multilingual expansion to Russian, API voice cloning, custom voice tags and pronunciation controls, plus WebSocket streaming for low latency and precise timestamping for lipsync.
Performance improvements - now #1 on Artificial Analysis TTS Leaderboard
Speed and quality are critical for real-time voice. Inworld TTS is now faster, smoother, and more natural across production workloads. Inworld TTS 1 Max just ranked #1 on the Artificial Analysis Text to Speech Leaderboard, which benchmarks the leading TTS models on realism and performance.
Quality improvements
New TTS models deliver clearer, more consistent, and more human-like speech.
- Clearer articulation: Lower word error rate (WER) and better intelligibility on long or complex sentences.
- Improved voice cloning: Higher speaker-similarity scores; voices retain tone, pacing, and emotion even across languages.
- More accurate multilingual output: Fewer accent mismatches and more natural pronunciation across supported languages.
Latency improvements
We’ve reduced latency across multiple layers of our stack:
- Infrastructure migration: New server placements cut internal round-trip time by ~50 ms, especially benefiting users in the US and Europe.
- Optional text normalization: Disable text normalization in the API to save 30–40 ms for English (up to 300 ms on complex text) and up to 1 sec in other languages.
- WebSocket streaming: Persistent connections reduce handshakes, enabling faster starts and smoother real-time dialogue.
- Faster inference: Inworld TTS Max now runs on an optimized hardware stack, enabling responses that are ~15% faster.
WebSocket support
For real-time conversational applications, our new WebSocket API offers persistent connections with comprehensive streaming controls.
HTTP requests work fine for simple TTS, but they add overhead when you're building voice agents, interactive characters, or phone call agents, as each request requires connection setup.
WebSockets keep a persistent connection open. You can stream text as it arrives from your LLM, maintain conversation context, and handle interruptions gracefully.
Three ways WebSockets give you more control:
- Context management: Run multiple independent audio streams over a single connection. Each context maintains its own voice settings, prosody, and buffer state.
- Smart buffering: Configure when synthesis begins with maxBufferDelayMs and bufferCharThreshold. Start generating audio before complete text arrives, or wait for full sentences.
- Dynamic control: Update voice parameters mid-stream, flush contexts manually, or handle user interruptions without dropping the connection.
Perfect for:
- Interactive voice agents that require low latency
- Dynamic conversations where barge-in or interruption support is needed
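As a rough illustration of that flow, the sketch below opens a context with the buffering controls named above and streams LLM text into it. Only maxBufferDelayMs and bufferCharThreshold come from this post; the endpoint URL, auth scheme, and message shapes are assumptions, so check the WebSocket API reference before relying on them.

```typescript
// Sketch of streaming LLM output into a TTS WebSocket context. The endpoint,
// auth header, and message/field names are illustrative assumptions.
import WebSocket from 'ws';

function playAudioChunk(_audio: Buffer): void {
  // Hand the decoded audio to your own player or game engine here.
}

const ws = new WebSocket('wss://api.inworld.ai/tts/v1/stream', {      // assumed URL
  headers: { Authorization: `Basic ${process.env.INWORLD_API_KEY}` }, // assumed auth scheme
});

ws.on('open', () => {
  // One context per agent turn; tune the buffering thresholds to trade
  // time-to-first-audio against prosody over longer sentences.
  ws.send(JSON.stringify({
    type: 'createContext',              // assumed message type
    contextId: 'turn-1',
    voiceId: 'Ashley',                  // assumed voice name
    maxBufferDelayMs: 250,
    bufferCharThreshold: 60,
  }));

  // Stream text chunks as they arrive from your LLM, then flush the context.
  for (const chunk of ['Sure, ', 'your order shipped ', 'this morning.']) {
    ws.send(JSON.stringify({ type: 'text', contextId: 'turn-1', text: chunk }));
  }
  ws.send(JSON.stringify({ type: 'flush', contextId: 'turn-1' }));    // assumed message type
});

ws.on('message', (data) => {
  const msg = JSON.parse(data.toString());
  if (msg.audio) playAudioChunk(Buffer.from(msg.audio, 'base64'));    // assumed response field
});
```

On interruption, the same pattern extends naturally: close the active context and open a fresh one for the post-interruption response, without dropping the connection.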
Timestamp alignment: Sync audio with visuals & actions
Building lipsync for 3D avatars? Highlighting words as they're spoken? Triggering game play actions at specific moments in speech? Handling barge-in and interruptions? You need timestamps.
Timestamp alignment returns precise timing information that matches your generated audio. Choose the granularity that fits your use case:
Use word-level timestamps for:
- Karaoke-style caption highlighting
- Triggering character actions when specific words play
- Tracking where users interrupt the AI
- Syncing UI elements with speech
Character-level timestamps are most common for lipsync animation, where they can be converted to phonemes and visemes.
Timestamps are currently supported for English in both streaming and non-streaming modes; support for other languages is experimental.
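As a small illustration, the sketch below schedules caption highlights from word-level timestamps; the WordTiming shape is an assumed response format, not the documented one.

```typescript
// Illustrative sketch: highlight each word as it plays, driven by word-level
// timestamps returned with the audio. The WordTiming shape is an assumption;
// check the API reference for the actual response fields.
interface WordTiming {
  word: string;
  startMs: number; // offset from the start of the generated audio
  endMs: number;
}

function scheduleHighlights(words: WordTiming[], playbackStartedAt: number): void {
  for (const w of words) {
    const delay = Math.max(0, playbackStartedAt + w.startMs - Date.now());
    setTimeout(() => {
      console.log(`highlight: ${w.word}`); // swap in your caption or game-event hook
    }, delay);
  }
}

// Usage, once audio playback begins:
// scheduleHighlights(response.timestamps, Date.now());
```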
Voice cloning API for programmatic voice creation
Voice cloning is no longer limited to our UI. Now you can create custom voices directly through the API. Available in beta to select customers.
Why this matters:
If you're building a platform where end users need to clone their own voices, you can now integrate that experience directly into your app, without redirecting users to Inworld's interface. You can also create voices in bulk using a simple script.
Use cases:
- Games where players create their own character voices
- Social platforms where users create their own avatars
- Games or call centers where a large number of voices need to be created in bulk from pre-recorded audio samples
Voice cloning APIs enable third-party platforms to offer voice creation as a native feature in their own workflows or create voices in bulk.
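A bulk-creation script might look roughly like the sketch below; the endpoint path, request fields, and auth header are placeholders (the beta documentation defines the real ones), but it shows the shape of cloning many voices from pre-recorded samples.

```typescript
// Hypothetical bulk-cloning script. The endpoint path, request fields, and auth
// header are placeholders, not documented values; the beta API reference
// defines the real ones. The point is the shape: loop over samples, POST each.
import { readFile } from 'node:fs/promises';

const samples = [
  { name: 'npc_blacksmith', file: 'samples/blacksmith.wav' },
  { name: 'npc_innkeeper', file: 'samples/innkeeper.wav' },
];

for (const sample of samples) {
  const audio = await readFile(sample.file);
  const res = await fetch('https://api.inworld.ai/tts/v1/voices:clone', { // placeholder URL
    method: 'POST',
    headers: {
      Authorization: `Basic ${process.env.INWORLD_API_KEY}`, // assumed auth scheme
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      displayName: sample.name,              // placeholder field names
      audioSample: audio.toString('base64'),
      tags: ['game-npc', 'bulk-import'],     // pairs with the voice tags feature below
    }),
  });
  console.log(sample.name, res.status);
}
```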
Custom voice tags
When creating a custom voice in the UI or API, we now allow users to apply tags to their voices for grouping and filtering.
Why this matters:
You can now easily manage a large database of voices and filter for the appropriate voice at runtime, which is highly valuable in games and related applications, where characters are often generated on the fly.
Use cases:
- Gaming platforms where characters are generated on the fly and need to be matched to an appropriate voice
- Enterprise apps where the optimal voice is chosen at runtime based on the user profile
- Applications that are still in development, where managing and iterating on a large number of voices is an essential workflow in the design process
Voice tags are the first step toward a larger voice library and management system.
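Runtime selection can then be a simple filter over tags, as in the sketch below; the Voice shape is an assumption, but the pattern is just fetch the voice list once, then match a character's required tags.

```typescript
// Illustrative runtime voice selection by tag. The Voice shape is an assumed
// simplification of whatever the voice-listing API returns.
interface Voice {
  voiceId: string;
  tags: string[];
}

function pickVoice(voices: Voice[], requiredTags: string[]): Voice | undefined {
  // First voice carrying every tag the generated character needs,
  // e.g. ['villain', 'male', 'en'] for an NPC created on the fly.
  return voices.find((voice) => requiredTags.every((tag) => voice.tags.includes(tag)));
}
```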
Custom pronunciation: Say it your way
Getting AI voices to pronounce words correctly matters. Brand names, character names, technical terms, and regional dialects are often misspoken by standard TTS models because they aren't represented well in the training data.
You can now insert phonetic notation directly into your text for consistent, accurate pronunciation of key words. Not sure what phonemes to use? Ask ChatGPT or your favorite AI assistant for the IPA transcription, or check reference sites like the IPA Pronunciation Guide on Vocabulary.com.
Common use cases:
- Brand names that need to sound perfect every time
- Unique names
- Medical, legal, or technical terminology
- Regional pronunciation variations
- Fictional locations and proper nouns
We support International Phonetic Alphabet (IPA) notation.
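Conceptually, the workflow looks like the sketch below, which wraps known-tricky words in a phoneme annotation before sending text to the API. The <phoneme ipa="..."> wrapper is a placeholder to show the idea, not Inworld's actual markup syntax, which is defined in the TTS documentation.

```typescript
// Illustrative only: the <phoneme ipa="..."> wrapper is a placeholder showing
// the idea of inlining IPA for specific words; Inworld's actual markup syntax
// is defined in the TTS documentation.
const pronunciations: Record<string, string> = {
  Nguyen: 'ŋwɪn',              // names standard TTS often mangles
  Hermione: 'hɜːrˈmaɪ.əni',
};

function annotate(text: string): string {
  return text.replace(/\b([A-Za-z]+)\b/g, (word) =>
    pronunciations[word] ? `<phoneme ipa="${pronunciations[word]}">${word}</phoneme>` : word,
  );
}

console.log(annotate('Dr. Nguyen will see Hermione now.'));
```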
Russian support and multilingual improvements
Inworld TTS now speaks Russian, bringing our total to 12 supported languages: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Dutch, Polish, and Russian.
Clone a voice and label it as Russian, or choose one of our pre-built Russian voices. As with all languages, voices perform best when synthesizing text in their native language, though cross-language synthesis is possible.
We've also made quality improvements across all non-English languages. Better pronunciation accuracy, more natural intonation, and smoother speech patterns.
For multilingual applications, Inworld TTS Max delivers the strongest results with superior pronunciation and more contextually-aware speech across languages.
Try these features today
All features are available now through our API and TTS Playground, at the same accessible pricing.
Get started:
- Try TTS Playground
- Read the docs
- You can also access Inworld voices and text-to-speech models via LiveKit, NLX, Pipecat, and Vapi.
Frequently asked questions
How do I convert timestamps to visemes for lipsync?
The typical pipeline: character timestamps → phonemes (using tools like PocketSphinx) → visemes (using your game engine's mapping). Our timestamps provide the timing foundation.
How do I gracefully handle interruptions with WebSockets?
The WebSocket endpoint supports multiple independent contexts, enabling seamless barge-in handling. When a user interrupts, you can start a new, independent context and send the post-interruption agent response to it. The old context can be closed when the interruption occurs.
What are some techniques to optimize end-to-end latency?
To reduce latency, consider using the TTS streaming API, keeping a persistent WebSocket connection, and disabling text normalization by instructing your LLM to produce speech-ready text via a system prompt.
- Nov 4, 2025
- Parsed from source:Nov 4, 2025
- Detected by Releasebot:Dec 23, 2025
The 3 Engineering Challenges of Realtime Conversational AI
Inworld launches Runtime, a low latency AI backend for real time conversational AI. Build with the SDK, deploy hosted endpoints, and run live A/B experiments with automatic traces to cut development time and improve user experience.
The Vision
Every builder in conversational AI shares a common goal: to create systems that feel natural, responsive, and personalized. But in practice, we spend more time wiring APIs, debugging, and optimizing latency than improving the user experience.
Inworld Reflection
We started at Inworld by building lifelike AI characters that gamers loved—ones that could remember, converse naturally, and feel real.
As our customer base expanded beyond games, they asked for complex customizations—to plug in their own models, connect to proprietary data, define custom emotions, routing, and more.
With each request, our engineering teams spent less time on shipping user features and more time writing integrations and debugging.
This realization led us to a critical analysis of where our development time was truly going. That analysis revealed three recurring engineering pain points in building realtime conversational AI, and we built Inworld Runtime to solve them.
Inworld Runtime
Inworld Runtime is a low-latency AI backend for realtime conversational AI. You build your conversational AI with Inworld Runtime SDK, launch a hosted endpoint using Inworld CLI, and observe and optimize your conversational AI by running A/B experiments in the Inworld Portal.
The 3 Challenges of Realtime Conversational AI
Problem 1: Latency that breaks the realtime feel
Before: High latency under high loads
- Scaling issues: As apps scaled to thousands of users, latency spiked above one second.
- Blocking operations: Many popular programming languages, while excellent for rapid prototyping, have runtime limitations that prevent true parallel execution, leading to blocked operations when we need to run multiple LLM calls, embeddings, and processing tasks concurrently.
With Runtime: True parallel execution at the C++ core
- Parallel execution: Using Runtime, an agent can embed user input, retrieve knowledge, and do a web search all at once, then proceed to the LLM call, dramatically reducing end-to-end latency.
- Pre-optimized backend: The graph executor automatically identifies nodes without dependencies and schedules them in parallel — no manual threading code required.
- Read Streamlabs case study: Built a realtime multimodal streaming assistant with sub-500 millisecond latency
Example Node.js LLM -> TTS pipeline - Low Latency with C++ optimized backend
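For a sense of what that looks like in code, below is a hedged sketch of an LLM -> TTS graph assembled with a fluent GraphBuilder-style API; the import path, builder method names, and node stand-ins are assumptions rather than the documented SDK surface.

```typescript
// Hedged sketch of a Node.js LLM -> TTS pipeline graph. Import path, builder
// method names, and the node stand-ins are assumptions for illustration only.
import { GraphBuilder } from '@inworld/runtime/graphs'; // assumed import path

declare const llmNode: any; // stand-in for the SDK's remote LLM chat node
declare const ttsNode: any; // stand-in for the SDK's remote TTS node

const graph = new GraphBuilder({ id: 'llm-tts-demo', apiKey: process.env.INWORLD_API_KEY! })
  .addNode(llmNode)
  .addNode(ttsNode)
  .addEdge(llmNode, ttsNode)   // LLM text flows straight into TTS
  .setStartNode(llmNode)
  .setEndNode(ttsNode)
  .build();

async function speak(prompt: string): Promise<void> {
  // The executor schedules independent nodes in parallel and streams results
  // as they become available, so audio can start before the LLM finishes.
  const stream = await graph.start(prompt);
  for await (const chunk of stream) {
    console.log(chunk); // chunk carries text deltas or audio frames per node
  }
}
```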
Problem 2: 50% dev time spent in integration and debugging
Before: Repetitive, time-consuming tasks
- Wrote repetitive integration code: For every new feature that required integrating an AI model, we found ourselves writing similar integration code.
- Reconstructed execution paths by hand: When an agent's behavior was incorrect, our primary tool for analysis was traditional logging. We had to sift through disconnected logs from various parts of the codebase to manually reconstruct the sequence of events.
- Coupled orchestration and business logic: The control flow for handling model responses, error retries, and feature-specific logic—like updating how fallback responses were triggered—was deeply embedded within the business logic, making even minor feature updates risky. Bringing new developers up to speed took weeks instead of days.
With Runtime: Less Maintenance, More Iteration
- Build fast with pre-optimized nodes: Developers get a full suite of nodes to construct realtime AI pipelines that can scale to millions of users, including nodes for model I/O (STT, LLM, TTS), data engineering (prompt building, chunking), flow logic (keyword matching, safety), and external tool calls (MCP integrations).
- View end-to-end traces and logs automatically: Instead of reconstructing the execution path manually, developers simply go to Inworld Portal to view the end-to-end trace and logs. Every node execution is automatically instrumented with OpenTelemetry spans capturing the node, inputs, outputs, duration, and success/failure.
- Write modular, easy-to-understand code: Developers define each node’s inputs, outputs, and dependencies in a graph, making the execution path explicit and visible — you can see exactly which nodes connect to which others, making onboarding new team members easy. They can contribute to a single node on day one, then gradually understand the broader graph structure.
- Read Wishroll Status Case Study: Went from prototype to production in 19 days with a 20x cost reduction
Automatic Traces for Each Graph Execution
Problem 3: Slow iteration speed
Before: Customization incurred technical debt
- Bespoke customization: As our customer base grew, so did the need for customization, and the code became brittle and hard to reason about.
- If/else hell: Different clients required slightly different logic, tools, or model choices. In our traditional codebase, this led to a labyrinth of if/else code blocks and feature flags scattered throughout the logic.
With Runtime: Fast user experience iterations
- One-line change for models and prompts: Want to swap an LLM provider or adjust a model parameter? That's a simple configuration change. A/B test variations and deploy customizations without touching production code.
- A/B testing at scale: We define agent behavior declaratively in JSON or through a fluent GraphBuilder API. Different clients get different graph configurations—not different code paths.
Live A/B Test: 50% traffic split to 2 models to observe what your users prefer
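As a purely illustrative shape (the real variant schema is defined by the Graph Registry and the Portal; the field names here are assumptions), a 50/50 model test might be declared along these lines and registered with the inworld graph variant register command covered in the CLI release below.

```typescript
// Illustrative variant declaration for a 50/50 LLM test. Field names
// (trafficShare, llm.provider, llm.model) are assumptions, not the real schema.
const variants = [
  { name: 'baseline',   trafficShare: 0.5, llm: { provider: 'openai',  model: 'gpt-4o-mini' } },
  { name: 'challenger', trafficShare: 0.5, llm: { provider: 'mistral', model: 'mistral-small-latest' } },
];

// Register the variants (e.g. via the CLI's graph variant registration) and the
// runtime splits traffic between configurations without a client-side deploy.
export default variants;
```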
Why We're Sharing Inworld Runtime with You
We built Inworld Runtime to solve our own massive challenges in creating production-grade, scalable realtime conversational AI. But in doing so, we created a solution for a problem every AI developer faces: managing the inherent complexity of the "reason—act" agent cycle.
We believe the future of AI is not just about more powerful models, but better orchestration. It's about giving developers the architectural foundation they need to build robust, maintainable, and observable realtime conversational AI, without reinventing the wheel.
If you're tired of wrestling with tangled logic and want to focus on creating, we invite you to build your next experience on Inworld Runtime. Let us handle the complexity of orchestration, so you can focus on bringing your ideas to life.
Get Started with Inworld Runtime
Inworld Runtime is the best way to build and optimize realtime conversational AI and voice agents.
You can build realtime conversational AI that is fast, easy to debug, and easy to optimize via A/B experiments.
Get started now with Inworld CLI:
- Build a production-ready conversational AI or voice agent
- Deploy it to Inworld Cloud as an endpoint so you can easily integrate it into your app
- Monitor dashboards, traces, and logs in the Inworld Portal
- Improve user experience by running live A/B experiments to identify the best model and prompt settings for your users
Talk to our team
- Oct 22, 2025
- Parsed from source:Oct 22, 2025
- Detected by Releasebot:Dec 23, 2025
Introducing Inworld CLI
Inworld launches the Inworld CLI, a unified toolkit to build, deploy, and optimize realtime conversational AI. Expect faster performance, easier debugging, and live A/B testing from the command line with integrated telemetry and production endpoints.
Challenge
Until now, building realtime conversational AI meant facing:
- Performance Bottlenecks: Unpredictable latency from third-party APIs creates a jarring user experience. This is compounded by core language limitations, like Python's GIL, that block parallel execution and stall critical operations.
- High Development Overhead: Engineering resources are drained by maintenance. Teams spend more time debugging provider failures and integrating a complex patchwork of models than building new features, causing product velocity to stagnate.
- Slow Iteration Speed: Scattered conditional logic for different models and clients makes the entire system fragile. This fragility makes every change high-risk, paralyzing rapid A/B testing and stalling product improvements.
Inworld faced these very challenges as our customer base grew and expanded beyond games into mobile apps, voice agents, AI companions, and more. So we built Inworld Runtime to solve them.
Inworld Runtime
Inworld Runtime is the AI backend for realtime conversational AI. You build your conversational AI with Inworld Runtime SDK, launch a hosted endpoint using Inworld CLI, and observe and optimize your conversational AI by running A/B experiments in the Inworld Portal.
Today, building with Inworld Runtime just became easier with the launch of Inworld CLI.
Inworld CLI
With Inworld CLI, developers can now build realtime conversational AI that is fast, easy to debug, and easy to optimize via A/B experiments.
- Build realtime experiences
- npm install -g @inworld/cli to install the Inworld CLI
- inworld login to log in and generate API keys automatically
- inworld init to initialize conversational AI pipelines such as LLM -> TTS, pre-optimized for latency and flexibility
- inworld run to test locally with instant feedback
- inworld deploy to create persistent, production-ready endpoints
- Monitor with clarity
- Integrated telemetry: Each request is automatically logged in dashboards, traces, and logs in Inworld Portal.
- Optimize continuously
inworld graph variant register to run live A/B tests without client changes
Proven technology
Since launching Inworld Runtime earlier this year, we've seen developers build incredible realtime conversational AI experiences.
- Wishroll went from prototype to 1M users in 19 days with 20x cost reduction.
- Streamlabs built a real-time multimodal streaming assistant with under 500ms latency.
- Bible Chat scaled their AI-native voice features to millions.
Inworld CLI builds on Runtime to help developers build agents more efficiently and reliably.
Get started with Inworld Runtime
Inworld Runtime is the best way to build and optimize realtime conversational AI and voice agents.
Get started now with Inworld CLI:
- Build a production-ready conversational AI or voice agent
- Deploy it to Inworld Cloud as an endpoint so you can easily integrate it into your app
- Monitor dashboards, traces, and logs in the Portal
- Improve user experience by running live A/B experiments to identify the best model and prompt settings for your users
Talk to our team
- Oct 15, 2025
- Parsed from source:Oct 15, 2025
- Detected by Releasebot:Dec 23, 2025
The new AI infrastructure for scaling games, media, and characters
Inworld launches Runtime, a high‑performance AI pipeline that scales voice‑forward, character‑driven experiences to millions. It connects LLM, STT, TTS with remote config, telemetry and multi‑vendor support; Unreal is available now in early access, Unity coming soon.
Built on gaming and media innovation
We began by pushing the frontier of lifelike, interactive characters for games and entertainment, and this remains a core focus area. Today, Inworld powers real‑time, voice‑forward experiences and provides the infrastructure that lets those experiences scale from a prototype to millions of players without sacrificing quality. Partners across the industry, including Xbox, NVIDIA, Ubisoft, Niantic, NBCUniversal, Streamlabs, Unity, and Epic, have built with Inworld to explore new gameplay and audience experiences.
Long before “chat with anything” became a category, we were shipping playable, character‑centric demos and engine integrations that let teams imagine worlds where characters remember, react, and stay in‑world. That early craft in character design is still our foundation, and it is why leading studios and platforms continue to collaborate with us on the next generation of character‑driven interactive media.
From demos to production: Deeper control through a new AI infrastructure
As partners moved from impressive demos to live titles, we hit the same wall every game team hits: keeping voice, timing, and consistency flawless at scale, which is what players actually feel. Text‑only stacks and one‑off integrations were not built for real‑time, multimodal workloads, and stitching providers together left developers without enough control to maintain user‑facing quality as usage spiked or audiences expanded.
That is why we built Runtime: to put developers in control of the entire pipeline, and to make measurement and experimentation first‑class, so quality can be maintained and extended to new geographies and demographics, with personalization where it matters.
What is Inworld Runtime and how does it help you scale?
Inworld Runtime is a high‑performance, C++ graph engine (with SDKs like Node.js and Unreal) that orchestrates LLMs, STT, TTS, memory or knowledge, and tools in a single pipeline. Build a graph in code, ship it, then iterate with remote configuration, A/B variants (via Graph Registry), and built‑in telemetry without redeploying your game. It is the infrastructure we developed to support experiences with millions of concurrent users, now available to all developers.
Why this gives you more control and keeps quality tangible for users
- Provider‑agnostic nodes so you can swap models and services without glue‑code churn or lock‑in.
- Remote config and Graph Registry to change prompts, models, and routing live, safely rolled out to cohorts.
- Targeted experiments to validate interaction quality for new geographies and demographics, including voices, timing, interruptions, prompts, and routing, and enabling personalization by segment.
- Observability for player‑perceived quality with traces, dashboards, and logs that expose latency paths, first‑audio timing, and lip‑sync cadence so you fix what users actually feel.
The approach is simple: one runtime for your entire multimodal pipeline (e.g. STT → LLM → TTS → game or media state), with observability and experimentation to optimize quality, latency, and cost for every audience.
Voice that keeps up with gameplay
Many teams start with TTS, then expand into full pipelines as they localize, personalize, and harden for live ops, testing variations for new geographies and demographics and locking in what works.
Inworld TTS delivers expressive, natural‑sounding speech built for real‑time play. You get low‑latency streaming, instant voice cloning, and timestamp alignment for lip‑sync and captions, plus multi‑language coverage and integrations with LiveKit, NLX, Pipecat, and Vapi for end‑to‑end real‑time agents. Pricing starts at $5 per 1M characters, so you can scale voice across large audiences.
Try the TTS Playground or call the API to integrate quickly.
Proven at scale with industry leaders in games and media
- Xbox × Inworld: Multi‑year co‑development to enrich narrative and character creation for game developers.
- Ubisoft (NEO): Prototype showcased real‑time reasoning, perception, and awareness in characters powered by Inworld tech.
- NVIDIA (Covert Protocol): Social simulation and hybrid on‑device or cloud capabilities using NVIDIA ACE with Inworld.
- Niantic: From Wol to WebAR, teams used Inworld to bring AI characters into spatial experiences.
- Streamlabs: Intelligent streaming assistant jointly powered by Streamlabs, NVIDIA ACE, and Inworld generative AI.
- NBCUniversal and other media leaders: Runtime was opened to all developers after we built infrastructure to meet their scale and quality bars.
Continuing our character leadership at scale
We pioneered character‑first, real‑time interaction years before today’s wave. That DNA is alive and well, and now it is backed by an infrastructure layer that gives developers more control and a better fit for modern production: Runtime for orchestration and TTS for voice that performs under pressure. If you knew us for our previous character stack, you will find this generation faster to ship, safer to iterate, and easier to scale.
Learn more about how to create characters with Runtime.
How do I get started with Inworld Runtime?
- Explore the Runtime Overview for graphs, experimentation, and observability
- Try our Templates for Node.js CLI, Voice Agent, Language Learning, and Companion apps
- Test TTS capabilities in our TTS Playground
- Check integrations with LiveKit, Pipecat, Vapi, and NLX
- Contact our team if you're scaling voice-first or character-driven experiences
Unreal (Runtime) is available now for early access. Unity is coming soon. If you are scaling a voice‑first or character‑driven experience in games or media, we would love to help you map the pipeline and quality targets that matter for your audience. Start with the Runtime Overview and Templates.
Powering the future of interactive media
We are uniting believable AI characters and worlds with the runtime required to run them at multi‑million‑user scale. Build the worlds you want, with characters that truly come alive and stay alive, at scale.
- Oct 6, 2025
- Parsed from source:Oct 6, 2025
- Detected by Releasebot:Dec 23, 2025
Inworld CLI - Hosted Endpoint
npm install -g @inworld/cli
Quickstart
- 3-Minute Setup: Single command installation, browser-based login, and instant API key generation.
- Local Development: Test your graphs instantly with inworld serve.
- Instant Deployment: Deploy to cloud with inworld deploy - no hosting, scaling, or infrastructure required.
- Oct 1, 2025
- Parsed from source:Oct 1, 2025
- Detected by Releasebot:Dec 23, 2025
Inworld + LiveKit: Unlocking studio-quality voice AI for real-time experiences at scale
Inworld and LiveKit announce real-time voice AI integration with TTS, voice cloning, and multilingual support via LiveKit's Agents framework. Developers can build immersive, voice-first experiences with under 200ms latency and affordable pricing. Aimed at democratizing high quality interactive voice apps.
It's time to bring your most ambitious AI applications to life with emotionally intelligent, real-time voice AI from Inworld. Now, you can use Inworld's pre-built voices or clone your own from a few seconds of audio in Inworld's API and via LiveKit's Agents framework. Inworld's multilingual, expressive voices are state-of-the-art quality with real-time latency, at roughly 5% of the cost of alternatives. You can learn more about Inworld's text-to-speech (TTS) models here.
Why Inworld + LiveKit for voice AI
You can now access Inworld voices and text-to-speech models via LiveKit's Agents framework plugin. This makes it easier for developers to create previously unimaginable, real-time voice experiences such as multiplayer games, agentic NPCs, customer-facing avatars, live training simulations, and more at an accessible price.
Experience Inworld TTS in a voice-driven, tabletop RPG game built by the LiveKit team. You can access the GitHub code repository to build your own voice-first, multi-agent game experience.
- Natural, conversational speech: Combine LiveKit's programmable audio pipeline with Inworld's state-of-the-art TTS and temperature controls for emotionally grounded, turn-based dialogue and instant feedback. Control voice switching and audio routing dynamically.
- Accessible pricing: Studio-quality voices for just $5/M characters, which is 5% of the cost of TTS from leading labs. That way you can build engaging experiences that scale with your users.
- Real-time latency: Generate and stream Inworld voices in under 200ms latency to first audio chunk via LiveKit's global edge infrastructure, which is ideal for real-time experiences with proven reliability in high-concurrency environments.
- Zero-shot voice cloning: Leverage Inworld's voice cloning capabilities to bring characters, brands, user-generated content, and more to life with emotion and personality using just 5-15 seconds of audio.
- Multilingual voices: Build agents in 11 of the most common languages for consumers, including English (with its various accents), Chinese, Korean, Dutch, French, Spanish, and more. You can also preserve accents for a specific voice when switching languages.
- Designed for developers: Build consumer applications with LiveKit's SDKs for web, mobile, and Unity. Craft custom voice pipelines using third-party integrations, RAG, and function calling. LiveKit's Agent framework also comes with performance metrics and debugging tools.
Built for builders
Get started in just minutes:
- Use LiveKit's Agent framework to stream audio via Inworld's TTS endpoint with the LLM and STT providers of your choice.
- Configure voice parameters, temperature control, and even language switching using Inworld's API.
- Deploy your agent to LiveKit's global infrastructure, allowing you to speak to your agent with real-time latency from anywhere in the world.
Ready to start building? Explore additional documentation to get started.
Inworld x LiveKit collaboration
Whether you are building immersive games, voice-first apps or agentic tools, Inworld + LiveKit is designed to give you full-stack control with real-world performance.
On June 17, 2025, Inworld and LiveKit hosted a Realtime AI Meetup in San Francisco. Hundreds of voice AI developers, founders, and enthusiasts gathered to explore how text-to-speech and speech-to-text models are built, key considerations for development, and how to maximize their potential in AI agents.
“Our TTS modeling framework allows us to advance voice AI's emotional and contextual understanding and easily add new functionality, while keeping costs affordable. This helps democratize access to building high-quality, real-time voice experiences.”
Jean Wang, Inworld Head of Product.
“Latency has a direct impact on user experience, and developers must consider how to manage it effectively. Fast models help, but efficient data streaming, optimized network communication, and prompting can also be crucial. Metrics like 'first response latency' are key. Developers should consider this not only in the models they use, but also in how they implement their applications.”
Michael Solati, LiveKit Developer Advocate.
- Sep 25, 2025
- Parsed from source:Sep 25, 2025
- Detected by Releasebot:Dec 23, 2025
Inworld meets Pipecat: Raising the bar for realtime voice AI
Inworld TTS now integrates with Pipecat, a vendor neutral open framework for realtime voice agents. The pairing enables low latency, emotive speech across web, mobile, and telephony with modular pipelines and zero‑shot voice cloning. Start building expressive, multilingual voice apps today.
The bar for interactive voice AI continues to rise. Today, we’re excited to announce that Inworld TTS is now fully integrated with Pipecat: the open-source, vendor-neutral framework purpose-built for architecting realtime voice agents and multimodal AI applications. Pipecat helps developers manage the complex orchestration of AI services, conversational features like interruptions and phrase endpointing, telephony and network transport, cross-platform libraries, audio processing, and multimodal interactions, all at ultra-low latencies. This integration makes it easier than ever to deploy fast agents with emotionally expressive speech, using Inworld’s TTS models in your own multimodal AI pipelines.
No matter what you're building - from voice assistants to AI companions, customer support agents, or immersive consumer experiences - Pipecat orchestrates the entire conversational flow while Inworld brings truly natural voice to life at a fraction of the cost of alternatives.
What is Pipecat?
Pipecat is a fully open-source, vendor neutral framework that enables developers to connect STT, LLMs, and TTS into realtime pipelines. It’s designed to power voice-first, multimodal agents with high responsiveness and flexibility.
Pipecat gives developers the most flexibility; it stands out for many reasons:
- Avoid vendor lock-in: Pipecat is not tightly coupled to any vendor's infrastructure. Deploy Pipecat and use the infrastructure you prefer.
- Streaming-first architecture: Pipecat's realtime "frames" model enables TTS to begin speaking before a sentence is even complete, while Inworld TTS delivers rich, emotionally nuanced speech that's virtually indistinguishable from human conversation.
- Modular, pluggable pipelines: Connect Inworld's voices to any STT or LLM using Pipecat's modular pipeline. Run multiple models in parallel, test different configurations, and connect your agent to custom logic and databases with built-in tools and advanced function calling
- Native telephony, transport and cross-library support: Pipecat supports realtime AI client SDKs for JavaScript, React, iOS, Android, C++, and Python, and deploys across telephony (including native Twilio), WebSockets, SIP, and WebRTC. You can easily build for web browsers, mobile apps, or traditional phone systems; Pipecat's transport layer adapts to your needs while Inworld's voices maintain consistent quality across all platforms.
- Smart Turn v2 model: Create accurate turn detection with native audio. Smart Turn v2 is trained on audio data and uses the speaker's audio as input. This lets your agents make decisions using the intonation and pace of the user's speech, while they're using Inworld's rich, expressive voices. The model also is fully open source (weights, training script, data sets).
It’s a full orchestration layer for building rich, interactive, responsive AI experiences.
Realtime expression, delivered naturally
Inworld TTS was built for dynamic, emotionally intelligent speech, designed to handle the unpredictability and expressive needs of realtime interaction. Plus, developers get production-ready voice AI for just $5 per million characters - about 5% the cost of leading alternatives.
With Inworld TTS in a Pipecat pipeline, you get:
- Millisecond audio synthesis that starts streaming before complete sentences
- Custom and pre-built voices across 11 languages, with more coming soon
- Natural vocalizations and emotional intelligence that adapts to conversational context
- Consistent low latency even with complex multi-step reasoning or function calls
- Zero-shot voice cloning from just seconds of audio - available free to all users
Together, Pipecat and Inworld make it possible to carry on a fluid, engaging conversation - in your browser, on the phone, or anywhere voice is used.
Start building today
This integration reflects both companies' commitment to democratizing access to cutting-edge AI technology. Pipecat's open-source approach removes barriers to experimentation and deployment, while Inworld's accessible pricing ensures that high-quality voice AI isn't limited to well-funded enterprises.
The future of voice AI is expressive, accessible, and real-time. With Inworld + Pipecat, that future is available right now.
- Sep 15, 2025
- Parsed from source:Sep 15, 2025
- Detected by Releasebot:Dec 23, 2025
Your AI is boring: Boost engagement and drive immediate performance improvements with Inworld Runtime and custom Mistral AI models
Mistral AI and Inworld unveil a joint Runtime to build, scale, and evolve consumer AI apps with purpose-built models and automated A/B testing. The partnership aims to replace generic LLMs with engaging, evolving experiences and seamless deployment at scale.
The two-layered problem
AI models are one-size-fits-all
General models, generic stories - Most teams plug in general-purpose LLMs trained on everything from source code to cookbooks. The breadth is useful for trivia, but it scrubs away personality. Dialogue sounds neutral and every engagement feels the same.
Safety before suspense - Standard model alignment rewards helpfulness and politeness. Necessary for support bots, fatal for entertainment. Models that are meant to entertain need to be able to break out of the one-dimensional persona that’s intentionally baked into public models.
Feelings, then forgetting - Base models can identify various emotions, but they don’t carry those emotions forward. Users are disappointed by isolated experiences that don’t build or impact future engagement.
AI app development is constrained
Productionization takes too long - While creating an AI demo takes hours, reaching production-readiness typically requires 6+ months of infrastructure and quality improvement work. Teams must handle provider outages, implement fallbacks, manage rate limits, provision and accelerate compute capacity, optimize costs, and ensure consistent quality. In building with category leaders, we saw how most consumer AI projects either make the leap or they stall out and die in the gap between prototype and scalable reality.
Maintenance is a burden - Most engineering teams spend over 60% of their time on maintenance tasks: debugging provider changes, managing model updates, handling scale issues, and optimizing costs. This leaves minimal resources for building new features, causing products to stagnate while competitors advance. We experienced this firsthand, as even innovative teams get trapped in maintenance cycles instead of building what users want next.
User expectations shift fast - Consumer preferences continuously evolve, but traditional deployment cycles of 2–4 weeks cannot match this pace. Teams need to test dozens of variations, measure real user impact, and scale winners - all without the friction of code deployments and app store approvals. Working with partners across the industry showed us that the fastest learner wins, but existing infrastructure makes rapid iteration nearly impossible.
A joint solution
Mistral AI and Inworld are partnering to provide frontier AI models and an intelligent Runtime that are purpose-built for building, scaling, and evolving AI applications. Mistral AI has trained new weights from the ground up, purpose-built to be creative, engaging, and to help shape immersive experiences. They’re designed to write with voice, harnessing distinct tone and rhythm instead of generic prose, and to understand beats such as gamification, goal progression, and long-term evolution.
Great models still need a living stage. That’s where Inworld’s Runtime comes in, turning static scripts into evolving experiences. Developers using Runtime can:
- Build applications from pre-optimized nodes that handle integration and automatically streamline data flows. The same graph scales effortlessly with minimal code changes and managed endpoints.
- Automate infrastructure with built-in telemetry for logs, traces, and metrics. The Portal surfaces bugs, user trends, and optimization opportunities, while Runtime manages failover, capacity, and rate limits. As you scale, it provides cloud resources to train, tune, and host cost-efficient custom models.
- Architect automated A/B testing without redeployments. Define variants, manage them in the Portal, and test models, prompts, and graph setups - deploying changes in seconds with automatic impact measurement.
When developers use Mistral AI’s creative models through Inworld Runtime, they have access to the most powerful combination of state-of-the-art models and developer tools, purpose-built for powering the next generation of consumer applications that deliver the personalized, engaging, and immersive experiences users are craving.
Inworld’s Runtime SDKs work seamlessly out-of-the-box with Mistral Code, the AI-powered coding assistant that bundles powerful models, an in-IDE assistant, and enterprise tooling into one fully supported package from Mistral AI. Building with Inworld Runtime and Mistral Code assures that developers can build rapidly without worrying whether their code will be production-ready and able to scale. Developers can now focus on creating the features and apps that their users will find most engaging: development as it should be, now enabled by AI.
How to get started
Meet with the Mistral AI and Inworld team to discuss your application - goals, strategy, and growth plan. Then, Mistral AI and Inworld will work together to offer the right combination of frontier AI models and runtime tools.
Once your app is in production, developers can harness support from Mistral and Inworld to continuously scale and evolve their applications through model customization, cost-efficient model routing, user evaluations, model retraining, and much more.
- Sep 11, 2025
- Parsed from source:Sep 11, 2025
- Detected by Releasebot:Dec 23, 2025
Node.js Runtime v0.6.0
Breaking changes
Simplified interfaces and improved APIs.
Developers upgrading from v0.5 should review the breaking changes below.
- ExecutionConfig Access: New context.getExecutionConfig() method with automatic property unwrapping
- Graph Execution: Graph.start() now returns ExecutionResult with execution details
- Unwrapped Types: Cleaner GoalAdvancement and LLMChatRequest interfaces
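For illustration, here is a hedged sketch of how these changes might look in code, with the node shape, context type, and ExecutionResult fields assumed rather than taken from the SDK reference.

```typescript
// Illustrative only: the class shape, context type, and ExecutionResult fields
// below are assumptions; consult the v0.6.0 SDK reference for the real API.
class GreetingNode /* extends the Runtime's custom node base class */ {
  process(context: any, input: string): string {
    // v0.6.0: read execution-scoped settings via the new accessor, with
    // properties unwrapped automatically (no manual unwrapping).
    const config = context.getExecutionConfig();
    return `${input} (persona: ${config.personaName ?? 'default'})`; // personaName is hypothetical
  }
}

// v0.6.0: Graph.start() returns an ExecutionResult with execution details
// (it becomes async and stream-based in v0.8.0, per the note above).
// const result = graph.start('hello');
// console.log(result.executionId, result.outputs); // assumed field names
```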