AI Voice and Speech Release Notes

Release notes for AI voice synthesis, text-to-speech, and audio generation tools

Latest AI Voice and Speech Updates

  • Apr 13, 2026
    • Date parsed from source:
      Apr 13, 2026
    • First seen by Releasebot:
      Apr 14, 2026
    Eleven Labs

    April 13, 2026

    Eleven Labs adds conversation topic discovery, knowledge base search, flexible branch merging, secret dependency lookups, new LLM options, and a non-community voice filter, while also shipping SDK and package updates with bug fixes and better voice volume handling.

    ElevenAgents

    Conversation topic discovery: A new conversation topics endpoint (GET /v1/convai/agents/{agent_id}/topics) returns the latest topic discovery results for a given agent, surfacing recurring themes across conversations. The list conversations endpoint now accepts a topic_ids query parameter to filter conversations by discovered topic.
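
As a sketch of how a client might assemble these calls (the helper names, base URL, list-conversations path, and the comma-joined `topic_ids` serialization are illustrative assumptions; only the topics path and parameter name come from the notes):

```python
from urllib.parse import urlencode

API_BASE = "https://api.elevenlabs.io"  # assumed base URL

def topics_url(agent_id: str) -> str:
    # Latest topic discovery results for one agent
    return f"{API_BASE}/v1/convai/agents/{agent_id}/topics"

def conversations_by_topic_url(topic_ids: list[str]) -> str:
    # Filter the conversation list by discovered topics;
    # the list path and comma-joined serialization are assumptions
    query = urlencode({"topic_ids": ",".join(topic_ids)})
    return f"{API_BASE}/v1/convai/conversations?{query}"

print(topics_url("agent_123"))
```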

    Knowledge base content search: A new search knowledge base endpoint (GET /v1/convai/knowledge-base/search) provides fuzzy text search over knowledge base document content. Results include highlighted snippets with SearchHighlightSegment objects and support cursor-based pagination with configurable page sizes (up to 100, default 30). Filter by document type with the optional types parameter.

Flexible branch merging: The merge branch endpoint no longer restricts merges to the main branch; any branch can now be used as a merge target. A new optional force boolean parameter overrides timestamp-based conflict resolution when set to true. Branch metadata responses now include parent_branch_id and merged_from_version_id for better lineage tracking.

    Secret dependency management: A new get secret dependencies endpoint (GET /v1/convai/secrets/{secret_id}/dependencies/{resource_type}) returns paginated dependency lookups filtered by resource type (tools, agents, or phone_numbers). The list secrets endpoint now accepts a dependency_limit query parameter to control how many dependencies are previewed per secret, with agents_has_more, tools_has_more, and phone_numbers_has_more flags indicating when additional dependents exist beyond the preview.
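
One way a client might use the preview flags: check which `*_has_more` flags are set on a list-secrets entry and follow up with the full dependency endpoint only for those resource types (a sketch; the helper name is illustrative, while the flag names and endpoint path come from the notes):

```python
def needed_dependency_lookups(secret_preview: dict) -> list[str]:
    # Inspect the *_has_more flags on a list-secrets preview entry and
    # return the resource types that still need a full paginated lookup via
    # GET /v1/convai/secrets/{secret_id}/dependencies/{resource_type}
    flags = {
        "agents": secret_preview.get("agents_has_more", False),
        "tools": secret_preview.get("tools_has_more", False),
        "phone_numbers": secret_preview.get("phone_numbers_has_more", False),
    }
    return [resource_type for resource_type, has_more in flags.items() if has_more]

print(needed_dependency_lookups({"secret_id": "sec_1", "agents_has_more": True}))
# -> ['agents']
```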

    New LLM options: Added gemini-3.1-pro-preview, qwen35-35b-a3b, and qwen35-397b-a17b to the list of available LLM providers for agent configuration.

    Voices

    Non-community voice filter: The voice_type query parameter on the list voices endpoint now accepts non-community, which returns personal and workspace voices combined while excluding library copies. This provides a convenient filter for teams that want to see only their own voices without community-shared ones.

    SDK Releases

    Python SDK

    v2.43.0 - Fern regeneration for the April 13, 2026 API schema, including knowledge base content search, conversation topics, flexible branch merging, and new LLM providers.

    JavaScript SDK

    v2.43.0 - Fern regeneration for the April 13, 2026 API schema, including knowledge base content search, conversation topics, flexible branch merging, and new LLM providers.

    Swift SDK

    v3.1.3 - Reduced background noise detection in software muted VAD, preventing false voice activity triggers when the microphone is muted.

    Packages

    @elevenlabs/[email protected] - Fixed getInputVolume() and getOutputVolume() returning 0 in React Native by adding native volume providers using LiveKit's RMS and multiband FFT processors. getInputByteFrequencyData() and getOutputByteFrequencyData() now return data focused on the human voice range (100-8000 Hz), which is more useful for voice visualization, and on web getInputVolume() and getOutputVolume() are also computed from this range.

    @elevenlabs/[email protected] - Fixed startSession errors being swallowed instead of surfaced via onError in ConversationProvider. Previously, when Conversation.startSession() rejected (e.g., "agent not found"), the UI would get stuck in "connecting" with no error feedback.

    @elevenlabs/[email protected] - Updated to @elevenlabs/[email protected] and @elevenlabs/[email protected] with native volume provider fixes.

    @elevenlabs/[email protected] - Fixed transcript message ordering in voice mode where agent responses could appear before user messages.

    API

    Original source
  • Apr 10, 2026
    • Date parsed from source:
      Apr 10, 2026
    • First seen by Releasebot:
      Apr 11, 2026
    Hume

    April 10, 2026

    Hume adds configurable turn detection and interruption settings to EVI configs, giving users finer control over turn-taking, speech detection, and interruption behavior on a per-config basis.

    EVI API additions

    Added configurable turn detection and interruption settings to EVI configs. You can now control how EVI handles turn-taking and interruptions on a per-config basis.

    • turn_detection.end_of_turn_silence_ms: How long EVI waits after speech ends before committing a turn (500-3000ms, default 800ms).
    • turn_detection.speech_detection_threshold: Sensitivity of voice activity detection (0.0-1.0, default 0.5).
    • turn_detection.prefix_padding_ms: Audio padding before detected speech (default 300ms).
    • interruption.min_interruption_ms: Minimum speech duration before EVI can be interrupted (50-2000ms, default 800ms).
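
A sketch of building such a config payload while clamping each setting to its documented range (the helper is illustrative; the field names, nesting, ranges, and defaults come from the bullets above):

```python
def clamp(value: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, value))

def evi_turn_settings(end_of_turn_silence_ms: int = 800,
                      speech_detection_threshold: float = 0.5,
                      prefix_padding_ms: int = 300,
                      min_interruption_ms: int = 800) -> dict:
    # Clamp each value to its documented range before sending
    return {
        "turn_detection": {
            "end_of_turn_silence_ms": clamp(end_of_turn_silence_ms, 500, 3000),
            "speech_detection_threshold": clamp(speech_detection_threshold, 0.0, 1.0),
            "prefix_padding_ms": prefix_padding_ms,
        },
        "interruption": {
            "min_interruption_ms": clamp(min_interruption_ms, 50, 2000),
        },
    }
```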
    Original source
  • Apr 7, 2026
    • Date parsed from source:
      Apr 7, 2026
    • First seen by Releasebot:
      Apr 8, 2026
    Eleven Labs

    April 7, 2026

    Eleven Labs adds scoped conversation analysis, agent test folders, workflow tool and knowledge base overrides, hide-all response filtering, and richer conversation, voice, and speech-to-text metadata, plus SDK and client updates.

    ElevenAgents

    • Scoped conversation analysis: Evaluation criteria and data collection items can now be scoped to conversation (full transcript) or agent (only the portion where a specific agent was active). Added scope field to PromptEvaluationCriteria, data_collection_scopes to agent platform settings, and a new scoped array of ScopedAnalysisResult on the conversation analysis response. This is particularly useful for multi-agent workflows where each agent should be evaluated independently.
    • Agent test folder management: Tests can now be organized into folders. New endpoints for creating, retrieving, updating, and deleting test folders, plus bulk moving tests between folders.
    • Tool and knowledge base overrides in workflows: Agent workflow node overrides now support tool_ids and knowledge_base fields in PromptAgentAPIModelOverrideConfig, allowing workflow nodes to control which tools and knowledge base documents each sub-agent can access.
    • Response filter hide_all mode: Added hide_all option to ResponseFilterMode, forcing the system to hide all fields of a tool response from the agent.
    • Visited agents in conversation history: The get conversation response now includes a visited_agents array of VisitedAgentRef objects (with agent_id and branch_id), tracking which agents participated in a multi-agent conversation.
    • Multimodal message support in hooks: The useConversationControls hook now exposes sendMultimodalMessage, and the MultimodalMessageInput type is exported from @elevenlabs/client, making it easier to send images and other multimodal content during conversations.

    Speech to Text

    • Audio duration in responses: The convert speech to text response now includes an audio_duration_secs field, providing the total duration of the transcribed audio without requiring client-side calculation.

    Voices

    • Recording quality and review status: The get voice response now includes recording_quality (enum: studio, good, ok, poor, bad) and labelling_status (enum: in_review, review_complete) fields, providing visibility into voice quality assessment.

    SDK Releases

    JavaScript SDK

    • v2.42.0 - Fern regeneration for the April 7, 2026 API schema, including scoped analysis, test folders, and DTMF input support.

    Python SDK

    • v2.42.0 - Fern regeneration for the April 7, 2026 API schema, including scoped analysis, test folders, and DTMF input support.

    Swift SDK

    • v3.1.2 - Fixed ObjC category dispatch for LiveKit delegate methods, resolving potential crashes in Swift-based voice agent integrations.

    Packages

    • @elevenlabs/[email protected] - Exposed sendMultimodalMessage in the client API and exported the MultimodalMessageInput type for sending images and other multimodal content during conversations.
    • @elevenlabs/[email protected], @elevenlabs/[email protected], @elevenlabs/[email protected], @elevenlabs/[email protected] - Fixed Node.js ESM compatibility by adding explicit .js extensions to all relative imports and setting "type": "module" on @elevenlabs/types.
    • @elevenlabs/[email protected] - Added automatic language selection from localStorage history and browser language preferences for the embeddable widget.

    API

    Original source
  • Apr 1, 2026
    • Date parsed from source:
      Apr 1, 2026
    • First seen by Releasebot:
      Apr 1, 2026
    Eleven Labs

    April 1, 2026

    Eleven Labs adds new agent workflow controls, conversation file uploads, analysis reruns, tool response mocking, mTLS auth, search sorting, and conversation duration messaging, while also shipping music video to music and speech-to-text from URL plus SDK updates and breaking package releases.

    ElevenAgents

    MCP tool scoping in agent workflows:

    Agent workflow nodes can now restrict which MCP tools a sub-agent is permitted to call. When tool inheritance is disabled on a node, only the explicitly selected MCP tools are loaded for that sub-agent, giving teams precise control over tool access per workflow step.

    Conversation file uploads:

    The create agent and update agent endpoints now support a file_input field on ConversationConfig. When enabled, end users can attach images or PDFs in chat (requires an LLM with multimodal input support). Configurable with enabled (boolean) and max_files_per_conversation (integer).
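
A sketch of the config fragment this adds (the helper and the exact nesting are illustrative; the `enabled` and `max_files_per_conversation` field names come from the notes):

```python
def file_input_config(enabled: bool = True, max_files: int = 5) -> dict:
    # file_input on ConversationConfig: enabled (boolean) and
    # max_files_per_conversation (integer), per the notes
    return {
        "conversation_config": {
            "file_input": {
                "enabled": enabled,
                "max_files_per_conversation": max_files,
            }
        }
    }
```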

    Re-run conversation analysis:

    New run conversation analysis endpoint (POST /v1/convai/conversations/{conversation_id}/analysis/run) re-evaluates a completed conversation using the agent's current evaluation criteria and data collection settings, without needing a new call.

    Tool response mocking for tests:

    Agent simulation tests and test suite invocations now support a tool_mock_config field to control how tool calls are handled during testing. Use MockingStrategy (all, selected, none) to choose which tools are mocked and MockNoMatchBehavior (call_real_tool, raise_error) to set the fallback when no mock matches.

    Text search sort order:

    The text search conversations endpoint now accepts a sort_by query parameter with values search_score (default) or created_at, allowing you to control whether results are ordered by relevance or recency.

    mTLS auth connections:

    Agent auth connections now support mutual TLS (mtls) as an auth_type, in addition to the existing options. New CreateMTLSAuthRequest and MTLSAuthResponse schemas are available for creating and retrieving mTLS-authenticated connections.

    Max duration message:

    Added max_conversation_duration_message field to agent configuration. When set to a non-empty string, the agent will send this message to the user when the maximum conversation duration is reached.

    Branch and environment in conversation initiation:

    Added branch_id and environment optional fields to the conversation initiation client data (ConversationInitiationClientDataRequest) and submit batch call request body, enabling routing to specific agent branches and environments.

    Music

    Video to music:

    New POST /v1/music/video-to-music endpoint generates background music from one or more video files. Videos are combined in sequence. Accepts optional description (up to 1,000 characters) and tags (up to 10 style tags such as upbeat or cinematic) to influence the generated track.

    Speech to Text

    Transcribe from URL:

    The convert speech to text endpoint now accepts a source_url parameter (string, optional) for transcribing audio or video from a hosted URL, including YouTube videos, TikTok videos, and other video hosting services. This can be used as an alternative to uploading a file directly.

    Voices

    Total count in shared voices list:

    The list shared voices response now includes a total_count field, making it easier to implement pagination and display result counts.

    SDK Releases

    JavaScript SDK

    • v2.41.0 - Added support for the multimodal_message WebSocket event type in ElevenAgents real-time conversations. Includes Fern regeneration for the latest API schema updates.
    • v2.41.1 - Fern regeneration to match the April 1, 2026 API schema.

    Python SDK

    • v2.41.0 - Fixed audio_interface to be optional in text-only conversation mode, resolving a runtime error when starting a session without audio. Includes Fern regeneration for the latest API schema.

    Packages

    This release includes the new v1.0.0 of the client side Agent SDKs. It features major breaking changes to @elevenlabs/client, @elevenlabs/react, and @elevenlabs/react-native. Review the migration guidance below before upgrading.

    To help upgrade, we released a Skill to help your agents upgrade for you. Install it with

    npx skills add elevenlabs/packages
    

    You can read more about the v1 release and its improvements on our developer blog.

    @elevenlabs/[email protected] — Breaking changes:

    Input and Output classes are no longer exported. Use the InputController and OutputController interfaces from @elevenlabs/client instead.

    Conversation is no longer a class — it is now a namespace object and a type alias for TextConversation | VoiceConversation. Remove any instanceof Conversation checks and subclasses.

    The default connectionType is now inferred from the conversation mode: voice conversations use "webrtc" by default, and text-only conversations use "websocket". To keep the previous behavior for voice, pass connectionType: "websocket" explicitly.

    VoiceConversation.wakeLock is now private. Pass useWakeLock: false in session options to suppress wake lock management.

    changeInputDevice() and changeOutputDevice() now return Promise<void> instead of Promise<Input> or Promise<Output>.

    Replace conversation.input.analyser.getByteFrequencyData(data) with conversation.getInputByteFrequencyData().

    Replace conversation.input.setMuted(v) with conversation.setMicMuted(v).

    Replace conversation.output.gain.gain.value = v with conversation.setVolume({ volume: v }).

    getInputVolume(), getOutputVolume(), getInputByteFrequencyData(), and getOutputByteFrequencyData() now return 0 or an empty Uint8Array instead of throwing when no conversation is active.

    @elevenlabs/[email protected] — Breaking changes:

    useConversation now requires a ConversationProvider ancestor. Wrap your component tree in <ConversationProvider> and move options to the provider or to the hook.

    DeviceFormatConfig and DeviceInputConfig exports are removed. Use FormatConfig and InputDeviceConfig from @elevenlabs/client instead.

    New granular hooks replace the monolithic useConversation for better render performance: useConversationControls(), useConversationStatus(), useConversationInput(), useConversationMode(), useConversationFeedback(), and useRawConversation(). Each hook subscribes only to the state it needs, preventing unnecessary re-renders.

    New useConversationClientTool(name, handler) hook for registering client tools that agents can invoke, with automatic cleanup on unmount.

    Added controlled mute support via isMuted and onMutedChange props on ConversationProvider.

    @elevenlabs/[email protected] — Breaking changes:

    The previous ElevenLabsProvider and useConversation API have been removed and replaced with re-exports from @elevenlabs/react. Replace ElevenLabsProvider with ConversationProvider and useConversation with the granular hooks (useConversationControls, useConversationStatus, etc.).

    On React Native, the package now polyfills WebRTC globals, configures the native AudioSession, and registers a platform-specific voice session strategy on import.

    @elevenlabs/[email protected]

    • Exports the CALLBACK_KEYS runtime array containing all keys from the Callbacks interface, used internally by the React SDK for callback composition.

    @elevenlabs/[email protected]

    • Added the guardrail_triggered server-to-client WebSocket event and the onGuardrailTriggered callback, which fires when the server detects a guardrail violation during a conversation. Also added type discriminants to TextConversation and VoiceConversation to enable discriminated union narrowing, and added startSession overloads that narrow the return type based on the textOnly option.

    @elevenlabs/[email protected]

    • Added the guardrail_triggered WebSocket event and onGuardrailTriggered callback, consistent with @elevenlabs/[email protected].

    @elevenlabs/[email protected]

    • Added client-side support for mocking tool responses in agent conversations, enabling test scenarios that simulate tool call outcomes without invoking real tools.

    @elevenlabs/[email protected]

    • Added type definitions for tool response mocking in agent conversations.

    @elevenlabs/[email protected]

    @elevenlabs/[email protected]

    API

    Original source
  • Mar 31, 2026
    • Date parsed from source:
      Mar 31, 2026
    • First seen by Releasebot:
      Apr 1, 2026
    Speechify

    Speechify Launches Windows App with On-Device Voice AI and Real-Time Text to Speech

    Speechify launches a native Windows app with real-time text to speech and voice typing, bringing voice AI to PCs with optional on-device processing for faster, privacy-first workflows across Intel, AMD, Qualcomm, and Copilot+ devices.

    Speechify Windows app brings on-device voice AI, text to speech, and voice typing to PCs with real-time performance and privacy-first processing.

    Today Speechify announced the launch of its native Windows application, bringing real-time text to speech and voice typing to Windows users with the option to run entirely on-device.

    Speechify, widely recognized as the world’s most used text to speech app, continues expanding its voice-first platform to desktop environments with this release. The Windows app introduces a unified system for listening, speaking, and writing using voice across one of the largest computing ecosystems in the world.

    The app is available for both x64 devices powered by Intel and AMD and Arm64 devices powered by Qualcomm, including Copilot+ PCs. Users can choose between cloud-based and on-device processing and switch between them instantly.

    Bringing Voice AI to Windows

    Speechify’s Windows launch extends its Voice AI platform to over a billion Windows users globally. The app allows users to listen to documents, dictate text, and interact with content using voice across their daily workflows.

    Speechify combines text to speech and speech to text into a single system designed for productivity. Users can convert PDFs, emails, websites, and documents into audio, or use voice typing to write across applications in real time.

    When on-device mode is enabled, voice data never leaves the user’s machine. This gives users full control over how their data is processed while still maintaining real-time performance.

By leveraging GPU acceleration with intelligent fallback alongside NPU support, Speechify delivers consistent real-time performance across AMD, Intel, and Qualcomm PCs.

    Thanks to Windows ML, the Speechify team is able to expand access to on-device models and features across x64 and Arm64 systems, while scaling to additional silicon through GPU support when dedicated NPU acceleration is not available.

    Built for On-Device AI Across Modern Windows Hardware

    Speechify’s Windows app is designed to run across multiple architectures and chipsets using a unified system.

    The platform supports:

    • x64 devices powered by Intel and AMD
    • Arm64 devices powered by Qualcomm
    • NPU-accelerated systems such as Copilot+ PCs
    • GPU-accelerated Windows machines

    By using the Windows ML stack and ONNX Runtime, Speechify is able to deploy multiple production AI models locally across these environments from a single codebase.

    These models include real-time text to speech, voice activity detection, and speech to text transcription, enabling a complete voice workflow directly on-device.

    Real-Time Voice Typing and Transcription

    Speechify enables real-time voice typing across Windows applications. Users can activate dictation with a shortcut and instantly convert speech into text in any input field.

    The system processes speech continuously, allowing users to write emails, documents, and messages without switching tools.

    On supported devices, transcription can run entirely on-device. Users can also switch to cloud-based processing depending on their needs, with the system adapting instantly at runtime.

    Designed for Seamless, Continuous Use

    Speechify engineered the Windows app for uninterrupted voice workflows.

    Audio input, transcription, and playback are handled through a real-time pipeline that minimizes latency and avoids gaps in speech. This allows users to move naturally between listening and speaking within the same workflow.

    The app also includes native Windows integrations such as system-wide shortcuts, direct text insertion into active fields, and screen-based text capture.

    Built for Windows, Not Ported to It

    Speechify’s Windows app is built as a native application with deep integration into the Windows platform.

    This enables:

• System-wide voice typing across applications
    • Real-time text insertion into active fields
    • OCR-based text capture from the screen
    • Secure local storage using Windows encryption

    These are platform capabilities that make this Speechify app truly built for Windows.

    Driving Growth Across Professionals and Enterprise

    The Windows launch reflects growing demand from professionals and enterprise users who want voice AI integrated directly into desktop workflows.

    Speechify has seen increasing adoption among users who rely on voice to process large amounts of information, write faster, and reduce time spent on manual reading and typing.

    "Over a billion people on this planet use Windows," said Cliff Weitzman, Founder and CEO of Speechify. "With this Windows launch, we're making sure that reading, and now writing, is never a barrier, no matter what device you use or how you prefer to work. We're especially excited about the opportunity in the enterprise given how many professionals have asked for Speechify on their PCs."

    A Step Toward Voice-First Computing

    Speechify’s Windows release reflects a broader shift toward voice-first computing.

    Instead of relying only on typing and reading, users can now listen to information, ask questions, and generate content using voice. This reduces friction between consuming and creating information and allows users to move faster through their workflows.

    Availability

    The Speechify Windows app is available now for x64 and Arm64 devices through the Microsoft Store.

    About Speechify

    Speechify is a voice AI platform that helps people read, write, and understand information using speech. Trusted by more than 50 million users worldwide, Speechify provides text to speech, voice typing dictation, AI podcasts, AI note taking, and a conversational voice AI assistant across iOS, Android, Mac, Windows, web, and browser extensions. Speechify supports more than 1,000 natural sounding voices across over 60 languages and is used in nearly 200 countries. In 2025, Speechify received the Apple Design Award for its impact on accessibility and productivity.

    Original source
  • Mar 26, 2026
    • Date parsed from source:
      Mar 26, 2026
    • First seen by Releasebot:
      Mar 27, 2026
    Resemble

    The Deepfake Threat Moved Faster. So Did We.

    Resemble expands watermarking to images and video, ships detection upgrades including reverse image search, non-face content coverage, Zero Retention Mode, custom vocabulary, and a free Deepfake Detector beta for browser, web, and X.

    Obaid Ahmed

    When Zohaib and Will came back from MWC Barcelona, the story they told stuck with me.

    They ran a game at the booth. Played audio clips to people and asked: real voice or AI-generated? Engineers. Enterprise buyers. Security professionals. People who work in this space every day. Around 70% could not tell the difference.

    I keep thinking about that.

    I’ve been building in voice AI long enough to know that stat should not surprise me. But it did. Because those are exactly the people who are supposed to catch this.

    The gap between real and synthetic has not just narrowed. For most practical purposes, it has closed. And if the people who are supposed to detect it can’t, the rest of us definitely can’t.

    So the question we kept coming back to after Barcelona: what do you actually need to operate in a world where you cannot trust what you hear, see, or read anymore?

    This is what we built in Q1.

    You can’t always tell what’s real anymore.

    I’m thinking about this more as a provenance problem than a detection problem. Not catching fakes after they spread. Proving what you made at the moment you made it.

    For a while our watermarking API worked on audio only. That made sense when we first built it. It does not make sense anymore. AI-generated images and video are spreading at the same rate as audio. Conflict zones, boardrooms, political campaigns. We had not kept up.

    So in Q1 the team extended it to cover images and video too. The system auto-detects file type, encodes an invisible watermark, and supports sync mode if you need the result right away. One API, three media types, no separate pipelines. Watermarking is how you prove what you made before someone makes something in your name.

    [Read the docs →]

    Five detection updates, shipped this quarter

    Watermarking protects what you generate. But a lot of synthetic media already exists with no provenance at all, and more is being created right now. That’s where detection comes in.

    The research and engineering teams shipped five updates to the detection stack this quarter; these are a few that are top of mind.

    Accuracy and provenance, together.

    Image DFD accuracy improved from F1 0.961 to 0.970. But honestly the more interesting update is reverse image search, now live in both the UI and API. Pass use_reverse_search: true in your detection request and you get back provenance data, matches against known debunked hoaxes, and a spread indicator showing how widely the image has already circulated.

    This catches things pure ML models miss. Images too new to be in any training set. Synthetic media that has been spreading for days before anyone flags it. If a fake has no digital footprint, the model has nothing to compare it to. Reverse image search gives you a second layer that does not depend on prior training.
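
A sketch of toggling the flag on a detection request (only `use_reverse_search` comes from the text above; the media field name and helper are illustrative):

```python
def detect_request(media_url: str, use_reverse_search: bool = False) -> dict:
    # Build a detection request body; the media field name is an assumption
    body: dict = {"media_url": media_url}
    if use_reverse_search:
        # Adds provenance data, known-hoax matches, and a spread indicator
        body["use_reverse_search"] = True
    return body
```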

    [Reverse Image Search]

    Non-face content.

    We extended coverage to AI-generated content without faces. We shipped this immediately when we saw the type of synthetic conflict imagery circulating on social platforms. The team turned it around in days. This was a case where being behind was not an option.

    Three more worth knowing about:

    • Identity now surfaces directly in the main Detect app. One request returns both a verdict and the closest matching speaker identity.
    • Intelligence Reporting now runs three expert analysis layers in a single structured output: our own model, OpenAI, and Anthropic.
    • Voice Activity Detection has been updated to filter non-speech content more cleanly, which means fewer false positives on audio with background noise or music mixed in.

    We also shipped Zero Retention Mode. The ask from enterprise and government customers was consistent and non-negotiable: submitted media cannot be viewable after analysis. Not by our staff, not by our systems, not by anyone. Files are purged automatically after processing. If that has been the thing blocking a procurement conversation, it is not the thing anymore.

    [Learn More in the Docs →]

    On the generation side: custom vocabulary and NVIDIA ACE

    While all of this was happening, the team also shipped on the generation side.

    Custom vocabulary does not get the attention it deserves. It is not a flashy feature. It is the thing that makes TTS actually usable in healthcare, pharma, and legal. The industries where getting a word wrong is not an inconvenience. It is a compliance problem. I have been pushing for this one for a while.

    The old fix was retraining. Slow and expensive. The new fix: pass a short audio sample at inference time and the model corrects the pronunciation on the fly. No retraining, no pipeline changes, correct on the first pass.

    [Custom Vocabulary]

    [Hear it now →]

    And on the quality side: NVIDIA named Chatterbox in their GDC 2026 RTX developer blog as part of NVIDIA ACE, their suite for expressive on-device NPC voice in games. NPC dialogue is historically one of the first things players skip. The bar for voice that actually holds attention is high. Being named in that context matters to us.

    Start here

    We built detection for enterprises. But I don’t think detection should only be accessible to enterprises.

    Try out @resemble_detect bot on X.

    It’s free, you don’t need to create an account, and there is no API key. Tag it with any image or video and get a result back. It’s the quickest way to try our deepfake detection.

Resemble AI Deepfake Detector (beta) puts a Scan button on every image, video, and audio element on any webpage. Powered by the same detection technology used by enterprises and government agencies, now free to try. Install it, sign in to your Resemble AI account, and a Scan button appears automatically on media across Twitter/X, Reddit, Instagram, TikTok, LinkedIn, and more.

    Each scan returns:

    • A verdict: Authentic, AI Generated, or Uncertain
    • A confidence score and rationale
    • Heatmap visualization, frame-by-frame video breakdown, and per-segment audio scores

    This beta release includes 4 free scans per day.

    [Get the extension today →]

    What’s Next?

    I’m genuinely proud of what this team shipped in one quarter. The research team turned around several model updates, one of them in direct response to synthetic conflict imagery spreading in real time. Engineers extended watermarking to three media types without breaking the existing API. The team behind the Deepfake Detector and the X bot made detection accessible to anyone with a browser.

None of this is abstract to me. I track it in Linear every day. We still have a lot of work to do.

    If you want to understand the full scale of what we are building against, the Deepfake Threat Report 2025 is a great place to start.

    [Download the report today →]

    Original source
  • Mar 25, 2026
    • Date parsed from source:
      Mar 25, 2026
    • First seen by Releasebot:
      Mar 26, 2026
    Eleven Labs logo

    Eleven Labs

    March 25, 2026

    Eleven Labs adds Basic and Full Seats for Workspaces, with paid plans including 20 Basic Seats and extra Full Seats.

    Workspace

    Basic and Full Seats: Workspaces now support two seat types - Full Seats with unrestricted access to all products, and Basic Seats with full access to ElevenAgents and ElevenAPI and limited ElevenCreative usage. All paid plans include 20 Basic Seats. Enterprise admins can purchase additional Full Seats directly from workspace settings. Learn more.

    Original source
  • Mar 23, 2026
    • Date parsed from source:
      Mar 23, 2026
    • First seen by Releasebot:
      Mar 23, 2026
    Eleven Labs logo

    Eleven Labs

    March 23, 2026

    Eleven Labs adds new ElevenAgents workspace APIs for environment variables and auth connections, plus knowledge base refresh, guardrail retries, conversation history, webhook content types, WhatsApp outbound messaging, workflow conditionals, and updated SDK support.

    ElevenAgents

    Environment Variables API: New environment variables endpoints for managing workspace-level configuration that agents can reference at runtime via {{system_env__<label>}} templating. Supports string, secret, and auth-connection variable types with per-environment value overrides. Endpoints include create, list, get, and update.
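
    A minimal sketch of how the {{system_env__<label>}} templating could resolve at runtime, assuming a plain string-substitution model; the variable names and values below are hypothetical, not part of the API:

```python
import re

def render_env_template(prompt: str, env: dict[str, str]) -> str:
    """Replace {{system_env__<label>}} placeholders with workspace values."""
    def substitute(match: re.Match) -> str:
        label = match.group(1)
        return env[label]  # raises KeyError if the variable is undefined
    return re.sub(r"\{\{system_env__([A-Za-z0-9_]+)\}\}", substitute, prompt)

# Hypothetical workspace-level environment variables.
env_vars = {"SUPPORT_EMAIL": "support@example.com", "REGION": "eu-west-1"}
prompt = "Escalate issues to {{system_env__SUPPORT_EMAIL}} ({{system_env__REGION}})."
print(render_env_template(prompt, env_vars))
# → Escalate issues to support@example.com (eu-west-1).
```

    Per-environment value overrides would simply swap in a different `env` dict per environment before rendering.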

    Auth Connections management: New workspace-level auth connections API for managing authentication credentials used by agent tools and integrations. Supports multiple auth methods including OAuth2 client credentials, JWT, basic auth, bearer tokens, custom headers, and integration-managed OAuth2 authorization code flows. Endpoints include create, list, and delete operations.

    Knowledge Base URL refresh: New refresh endpoint (POST /v1/convai/knowledge-base/{documentation_id}/refresh) to re-fetch and update content for URL-sourced knowledge base documents. Knowledge base documents also now support auto-sync configuration with enable_auto_sync, auto_remove, and auto_sync_info fields.
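
    A sketch of building the refresh request from the documented endpoint path, assuming the standard api.elevenlabs.io host and the usual xi-api-key auth header; the document ID and key are placeholders:

```python
import urllib.request

API_BASE = "https://api.elevenlabs.io"  # assumption: standard ElevenLabs API host

def build_refresh_request(documentation_id: str, api_key: str) -> urllib.request.Request:
    """Build the POST that re-fetches a URL-sourced knowledge base document."""
    url = f"{API_BASE}/v1/convai/knowledge-base/{documentation_id}/refresh"
    return urllib.request.Request(url, method="POST", headers={"xi-api-key": api_key})

req = build_refresh_request("doc_123", "YOUR_API_KEY")
print(req.full_url)
# → https://api.elevenlabs.io/v1/convai/knowledge-base/doc_123/refresh
```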

    Guardrail retry with feedback: Custom and content guardrails now support configurable trigger actions with EndCallTriggerAction and RetryTriggerAction options. When set to retry, the agent re-generates its response with injected system feedback up to 3 attempts. Available placeholders include {{trigger_reason}} and {{agent_message}} for contextual retry guidance.
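
    A sketch of how the documented retry placeholders might render into system feedback; the template wording and the example values are hypothetical:

```python
# Hypothetical retry-feedback template using the documented placeholders.
RETRY_FEEDBACK = (
    "Your previous reply was blocked: {{trigger_reason}}. "
    "Rewrite this message to comply: {{agent_message}}"
)

def render_feedback(template: str, trigger_reason: str, agent_message: str) -> str:
    """Fill in the {{trigger_reason}} and {{agent_message}} placeholders."""
    return (template
            .replace("{{trigger_reason}}", trigger_reason)
            .replace("{{agent_message}}", agent_message))

print(render_feedback(RETRY_FEEDBACK,
                      "mentioned a competitor",
                      "Try BrandX instead."))
```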

    Conversation History system variable: Added a new system__conversation_history dynamic variable that provides a lazily-evaluated, JSON-serialized conversation history at runtime. This is useful for passing full conversation context to tools, webhooks, or sub-agent handoffs.

    Webhook tool content type: Server tools now support a configurable content type for webhook body parameters, allowing you to choose between application/json and application/x-www-form-urlencoded formats. The URL-encoded format is useful for integrating with legacy systems, OAuth token endpoints, and payment processors.
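
    The two formats serialize the same body parameters differently; a quick stdlib comparison (the OAuth-style parameters are illustrative):

```python
import json
from urllib.parse import urlencode

# The same webhook body parameters, serialized both ways.
params = {"grant_type": "client_credentials", "scope": "read write"}

json_body = json.dumps(params)   # application/json
form_body = urlencode(params)    # application/x-www-form-urlencoded

print(json_body)  # {"grant_type": "client_credentials", "scope": "read write"}
print(form_body)  # grant_type=client_credentials&scope=read+write
```

    Legacy systems and OAuth token endpoints typically expect the second form, which is why the URL-encoded option matters.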

    WhatsApp outbound messages: Updated the WhatsApp integration with a new outbound message dialog in the dashboard, enabling agents to send outbound messages in addition to calls through the WhatsApp channel.

    Agent and resource listing filters: The list agents, list knowledge base documents, and list tools endpoints now support a created_by_user_id query parameter (use @me for the current user). The previous show_only_owned_agents and show_only_owned_documents parameters are deprecated.

    Workflow conditional expressions: Added a new conditional_operator AST node to the workflow expression schema, enabling branching logic within agent workflow definitions.

    Music

    Music Marketplace: New Music Marketplace documentation covering the marketplace for licensing AI-generated music, including usage types, creator payouts, and licensing details.

    SDK Releases

    Python SDK

    v2.40.0 - Added environment parameter support for ElevenAgents conversations, enabling environment-specific agent connections. Fern regeneration to match the latest API schema including environment variables, auth connections, knowledge base refresh, and guardrail trigger actions.

    JavaScript SDK

    v2.40.0 - Added multimodal_message WebSocket event type for ElevenAgents real-time conversations. Fern regeneration to match the latest API schema including environment variables, auth connections, knowledge base refresh, and guardrail trigger actions.

    iOS SDK

    v3.1.1 - Added environment parameter support for environment-specific agent connections. Fixed a visionOS build error by bumping the platform requirement to v2.

    Packages

    @elevenlabs/[email protected] - Added multimodal_message WebSocket event type for ElevenAgents real-time conversations.

    @elevenlabs/[email protected] - Added multimodal_message WebSocket event support.

    @elevenlabs/[email protected] - Added multimodal_message WebSocket event support.

    @elevenlabs/[email protected], @elevenlabs/[email protected] - Updated to @elevenlabs/[email protected].

    @elevenlabs/[email protected], @elevenlabs/[email protected] - Updated to @elevenlabs/[email protected].

    Packages v1.0.0 Release Candidate

    The first release candidate for v1.0.0 of the ElevenAgents client SDKs is now available. This is a major release with breaking changes that improve the API surface, add granular React hooks for better render performance, and unify the React Native SDK with the React SDK.

    To try out the release candidate:

    npm install @elevenlabs/client@next @elevenlabs/react@next @elevenlabs/react-native@next
    

    An elevenlabs:sdk-migration skill is available to help AI coding assistants automatically migrate your codebase to the new APIs. Add it to your Claude Code, Cursor, or Windsurf project to get guided migration support.

    Key changes in v1.0.0:

    @elevenlabs/client: Conversation is now a namespace object and type alias for TextConversation | VoiceConversation instead of a class. The Input and Output classes are replaced by InputController and OutputController interfaces, with new convenience methods on the conversation instance (setMicMuted(), setVolume(), getInputByteFrequencyData(), getOutputByteFrequencyData()). changeInputDevice() and changeOutputDevice() now return void.

    @elevenlabs/react: useConversation now requires a ConversationProvider ancestor. New granular hooks for fine-grained re-rendering: useConversationControls(), useConversationStatus(), useConversationInput(), useConversationMode(), useConversationFeedback(). New useConversationClientTool() hook for dynamically registering client tools from React components with full type safety.

    @elevenlabs/react-native: Complete API rewrite replacing ElevenLabsProvider and useConversation with ConversationProvider and the same granular hooks from @elevenlabs/react. WebRTC polyfills and native AudioSession configuration are applied automatically on import.

    See the full release notes and migration guides:

    @elevenlabs/[email protected]

    @elevenlabs/[email protected]

    @elevenlabs/[email protected]

    @elevenlabs/[email protected]

    API

    Original source
  • March 2026
    • No date parsed from source.
    • First seen by Releasebot:
      Mar 17, 2026
    Speechify logo

    Speechify

    Free Voice Typing Dictation. Just Talk.

    Speechify releases expansive Voice Typing across macOS, Chrome, and desktop apps, enabling users to dictate 5x faster with AI-assisted edits and punctuation. It works in Gmail, Google Docs, Slack, Notion and more, with multilingual support, accessibility benefits, and SOC 2 compliance for secure, hands‑free writing.

    Free Voice Typing Dictation

    Write 5× faster with free voice typing dictation on any app or website. Talk naturally — Speechify perfects it with zero typos.

    Free Voice Typing Dictation. Just Talk.

    Write 5× faster with voice typing

    • 1M+ 5-star Reviews
    • 55M+ Users

    Keyboard

    • 40 Words Per Minute

    You talk faster than you type. Three to five times, actually.

    Speechify Voice Typing

    • 160 Words Per Minute

    You talk faster than you type. Three to five times, actually. And now, that matters. You can just say it. Voice Typing, powered by Speechify. From Google Docs to Gmail.

    VOICE TYPING DICTATION

    MAC APP

    Voice type across any app on your desktop – Slack, email, Word, iMessage, Chrome, and beyond. Just start talking and Speechify polishes your writing.
    Download for macOS

    CHROME EXTENSION

    Use voice typing on any website. Perfect for Gmail, Google Docs, ChatGPT, and more. Voice typing is 5x faster, with no typos.
    Add to Chrome

    Just Tap and Talk
    Voice typing dictation allows you to write 5x faster, so you can speed through any Google Doc, email, or message

    AI Auto Edits
    Speechify fixes small mistakes as you dictate, adjusting punctuation and phrasing for clean, natural text

    Works Everywhere
    Use dictation in Gmail, Google Docs, Notion, Slack, ChatGPT, and more.

    Hands-Free & Inclusive
    Write and reply without typing. Ideal for multitasking, accessibility, and anyone who thinks faster than they type

    SOC 2 Type II Compliance
    Speechify meets strict industry standards for security, availability, and data protection — so your content stays safe and private

    Let Speechify Type for You
    Get Speechify and start writing with your voice. Faster, easier, and more natural than typing.

    Made for Everyone
    Dictation fits naturally into any workflow, helping you move faster, stay focused, and express ideas without ever touching the keyboard

    For Professionals
    Dictate emails, reports, and updates without breaking focus — perfect for busy workflows

    For Students
    Take notes, write essays, or record study thoughts hands-free while researching or reviewing materials

    For Creators
    Capture ideas, scripts, or captions as they come — voice typing keeps up with your creativity in real time

    For Multitaskers
    Reply, search, and summarize while cooking, walking, or working — no keyboard needed

    For Accessibility
    Make typing effortless for everyone. Dictation supports users who prefer or need hands-free control

    And More
    From brainstorming ideas to filling out forms or writing captions — Voice dictation adapts to any task

    Start Using Speechify Today

    Speechify has made my editing so much faster and easier when I’m writing. I can hear an error and fix it right away. Now I can’t write without it.
    Daniel
    Writer

    I used to hate school because I’d spend hours just trying to read the assignments. Listening has been totally life changing. This app saved my education.
    Ana
    Student with Dyslexia

    Speechify makes reading so much easier. English is my second language and listening while I follow along in a book has seriously improved my skills.
    Lou
    Avid Reader

    Let Speechify Type for You

    FAQ

    Speechify is a Voice AI Assistant that lets users research topics and get answers through natural voice conversations, listen with text to speech, capture ideas via voice typing and AI note taking, and create AI podcasts.

    Speechify is a more powerful Voice AI Assistant than Gemini, Grok, Perplexity, and ChatGPT because it combines conversation, research, voice typing with AI note-taking, text to speech, and AI podcast creation into one voice-driven experience.

    No. Speechify replaces the need for multiple AI assistants by offering conversational AI, voice-driven research, text to speech, voice typing, AI note taking, and podcast creation in one tool.

    Speechify is a Voice AI Productivity Assistant designed to help users think, learn, dictate through voice typing, take AI notes, listen with text to speech, and create AI podcasts through voice, not just trigger actions or answer simple questions like Siri or Alexa.

    Speechify Voice Typing is an AI voice dictation tool that converts your spoken words into written text instantly.

    Speechify Voice Typing uses advanced transcription AI and AI voice dictation to accurately capture your speech and turn it into text in real time.

    Yes, Speechify Voice Typing can transcribe your spoken emails directly into your email app or platform.

    Yes, Speechify Voice Typing helps students transcribe lectures and study sessions to improve retention.

    Speechify Voice Typing uses encrypted processing to protect your AI voice dictation and transcription data.

    Yes, Speechify Voice Typing can handle long-form speech and convert it into clean transcription.

    Speechify Voice Typing turns your spoken notes into clean, readable text by removing filler words and fixing grammar, making transcription an easy way to stay focused without writing everything down manually.

    Yes, Speechify Voice Typing can type punctuation automatically, while also cleaning up grammar and removing filler words so your text stays polished and accurate.

    Speechify Voice Typing provides high transcription accuracy using advanced natural language processing, and it also cleans up grammar, removes filler words, understands punctuation commands, and delivers smooth, polished text even when your speech isn’t perfect.

    Yes, Speechify Voice Typing offers multilingual speech to text support across many languages and accents.

    Speechify Voice Typing makes writing essays, emails, reports, and more faster, easier, and more natural by letting you speak your thoughts directly.

    Yes, Speechify Voice Typing saves time by capturing speech 3–5x faster than manual typing.

    No, you don’t have to speak perfectly with Speechify Voice Typing, because it automatically cleans up grammar, removes filler words, and smooths out your speech into polished text.

    You should use Speechify for dictation because its Voice Typing feature delivers highly accurate transcription and includes powerful extra tools like a Voice AI assistant that can answer questions or summarize content, plus text to speech in 200+ lifelike voices to help you read, review, and stay productive anywhere.

    Recent Posts:

    • How to Use Dictation and Voice Typing in Google Docs — March 7, 2026
    • How to Use Dictation and Voice Typing in ChatGPT — March 5, 2026
    • Speech to Speech and ASR at Speechify — February 20, 2026
    • How to Use Speechify Voice Typing Dictation in Google Docs — February 18, 2026
    • How to Use Speechify Voice Typing Dictation in Outlook — February 17, 2026
    • Speechify vs. Otter: Why Speechify Is the Better Choice for Professionals — February 16, 2026
    • How to Use Speechify Voice Typing Dictation in Notion — February 16, 2026
    • How to Use Speechify Voice Typing Dictation in Gmail — February 15, 2026
    • How to Use Speechify Voice Typing Dictation in Replit — February 15, 2026
    • How to Use Speechify Voice Typing Dictation in ChatGPT — February 14, 2026
    • A Comprehensive Guide to Dictation & Voice Typing Tools — February 13, 2026
    • How to Use Speechify Voice Typing Dictation in Slack — February 13, 2026
    Original source
  • Mar 16, 2026
    • Date parsed from source:
      Mar 16, 2026
    • First seen by Releasebot:
      Mar 16, 2026
    Eleven Labs logo

    Eleven Labs

    March 16, 2026

    Eleven Labs releases ElevenAgents GA with a unified Users page, dynamic SIP headers, and enhanced conversation filtering. It adds retention controls, content guardrails, workspace group listings, a seat_type option for bulk invites, mobile SSO fixes, new media task types, visual content indicators, text styling options, and SDK updates.

    ElevenAgents

    Users page is now generally available: The list users page, which groups conversations by a user identifier, is now available to all workspaces. This allows you to view all users who have interacted with your agents, their conversation history, and contact details in a unified view.

    SIP inbound headers as dynamic variables: Custom SIP X- headers from inbound SIP trunking calls are now automatically exposed as dynamic variables in ElevenAgents conversations. Any custom SIP header (e.g., X-Contact-ID, X-Campaign-ID) passed by the caller is available in the agent prompt using {{sip_contact_id}}, {{sip_campaign_id}}, etc. These variables are also visible in the conversation history under the Phone Call tab. Reserved headers such as X-Call-ID and X-Caller-ID continue to map to system__call_sid and system__caller_id and are not overridden.
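
    The variable naming appears to follow a simple convention inferred from the examples above (X-Contact-ID becomes {{sip_contact_id}}); a sketch of that mapping, which is an assumption rather than a documented rule:

```python
def sip_header_to_variable(header: str) -> str:
    """Derive the dynamic-variable name for a custom inbound SIP X- header.

    Inferred from the documented examples (X-Contact-ID -> sip_contact_id):
    strip the X- prefix, lowercase, and replace hyphens with underscores.
    """
    name = header.removeprefix("X-").lower().replace("-", "_")
    return f"sip_{name}"

print(sip_header_to_variable("X-Contact-ID"))   # → sip_contact_id
print(sip_header_to_variable("X-Campaign-ID"))  # → sip_campaign_id
```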

    Conversation filtering by tool outcome: The list conversations and text search conversations endpoints now accept tool_names_successful and tool_names_errored query parameters (array of strings) to filter conversations by which tools succeeded or returned errors during the call.
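
    Since both parameters accept arrays, a request would repeat them in the query string; a sketch of building such a URL with the stdlib, assuming repeated-key encoding for array parameters (the tool names are examples):

```python
from urllib.parse import urlencode

# Filter conversations where `book_meeting` succeeded and either
# `lookup_order` or `charge_card` errored.
filters = [
    ("tool_names_successful", "book_meeting"),
    ("tool_names_errored", "lookup_order"),
    ("tool_names_errored", "charge_card"),
]
query = urlencode(filters)
url = f"https://api.elevenlabs.io/v1/convai/conversations?{query}"
print(url)
```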

    User listing improvements: The list users endpoint now supports a sort_by query parameter accepting last_contact_unix_secs (default) or conversation_count, and a branch_id query parameter to filter users by agent branch.

    Force delete tools: The delete tool endpoint now accepts a force query parameter (boolean, default false). When set to true, the tool is deleted even if it is used by agents, and it is automatically removed from all dependent agents and branches.

    Conversation embedding retention: The ElevenAgents settings response now includes conversation_embedding_retention_days, which controls how long conversation embeddings are retained (maximum 365 days). A null value uses the system default of 30 days.

    Content threshold guardrail: Added the ContentThresholdGuardrail schema, which provides a configurable threshold-based guardrail with an is_enabled flag and a threshold value for content moderation.

    Workspaces

    Get all workspace groups: New GET /v1/workspace/groups endpoint returns all groups in the workspace, including each group's name, ID, members, permissions, usage limit, and character count.

    Seat type in bulk workspace invites: The invite multiple users endpoint now accepts an optional seat_type field to specify the seat type (e.g., workspace_member, workspace_admin) for all invited users in a bulk operation.

    Mobile SSO reliability improvements: Fixed a regression where SSO login on mobile would get stuck on "Authenticating..." after a user logged out and attempted to sign in again. Also fixed workspace switching for mobile SSO users who sign in via SAML or OIDC providers.

    Music

    Section duration control: The generate music detailed endpoint now supports respect_sections_durations (boolean), which controls whether the model adheres to the duration specified for each section in the composition plan.

    Studio

    Chapter visual content indicator: Chapter response models now include a has_visual_content boolean field indicating whether the chapter contains visual content.

    Text shadow and outline styles: Studio text styling now supports text_shadow and text_outline options via the new StudioTextStyleShadowModel and StudioTextStyleOutlineModel schemas, enabling richer visual text customization within Studio projects.

    Media generation clip task type: The PendingClipTask.type enum now includes the media_generation value, expanding clip task categorization to cover media generation workflows.

    SDK Releases

    Python SDK

    • v2.39.0 - Added support for the multimodal_message WebSocket event type in ElevenAgents real-time conversations.
    • v2.39.1 - Fern regeneration to match the latest API schema.

    JavaScript SDK

    • v2.39.0 - Fern regeneration to match the latest API schema.

    iOS SDK

    • v3.1.0 - Several improvements to the ElevenAgents Swift SDK:
    • Server errors are now surfaced through the onError callback, making error handling more consistent.
    • Added event-based agent state management for cleaner state observation.
    • Software muting is now supported alongside speech detection, enabling more granular audio control.
    • Improved codebase to use Swift Concurrency throughout.
    • Fixed a crash that occurred during startup on iOS release builds.
    • Updated branding to reflect the ElevenAgents rebrand.

    Packages

    API

    Original source