Thinking Machines Unveils AI That Listens, Talks, and Sees in Real Time

Last updated: 2026-05-12 06:21:32 · Education & Careers

Breaking News: Thinking Machines Introduces Real-Time AI Interaction Models

The AI startup founded by former OpenAI CTO Mira Murati today announced a research preview of its new "interaction models," which enable near-real-time, full-duplex voice and video conversations. Unlike conventional turn-based chat, the system can listen, speak, and process visual cues simultaneously—handling 200-millisecond chunks of input and output in parallel.

Source: venturebeat.com

"Current AI models force humans to contort themselves to the machine's pace," said Dr. Alex Chen, lead researcher at Thinking Machines. "Our goal was to build an AI that interacts as fluidly as a human partner."
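To make the 200-millisecond micro-turns concrete, here is a simplified, single-threaded simulation in Python. The names `run_full_duplex`, `Tick`, and `backchannel_policy` are hypothetical. The real system reportedly processes input and output streams in parallel rather than in one loop, so this sketch only shows the key property: each 200 ms tick both takes in a chunk and emits a chunk, which lets the model respond before the user stops talking.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

CHUNK_MS = 200  # micro-turn length reported in the article


@dataclass
class Tick:
    t_ms: int             # start time of this micro-turn
    heard: Optional[str]  # input chunk received in this tick (None = user silent)
    said: Optional[str]   # output chunk emitted in this tick (None = model silent)


def run_full_duplex(
    input_chunks: List[Optional[str]],
    policy: Callable[[List[Optional[str]]], Optional[str]],
) -> List[Tick]:
    """Hypothetical single-threaded sketch: every tick ingests one input chunk AND
    emits one output chunk, so the model can speak before the user has finished."""
    history: List[Optional[str]] = []
    timeline: List[Tick] = []
    for i, chunk in enumerate(input_chunks):
        history.append(chunk)  # listen: add this tick's input to the context
        out = policy(history)  # speak: choose this tick's output from everything so far
        timeline.append(Tick(i * CHUNK_MS, chunk, out))
    return timeline


def backchannel_policy(history: List[Optional[str]]) -> Optional[str]:
    """Toy policy (hypothetical): say "mm-hm" after every 3rd consecutive speech chunk."""
    run = 0
    for chunk in reversed(history):
        if chunk is None:
            break
        run += 1
    return "mm-hm" if run > 0 and run % 3 == 0 else None


if __name__ == "__main__":
    user = ["I", "think", "the", "bug", "is", "here", None]
    for tick in run_full_duplex(user, backchannel_policy):
        print(tick)
    # "mm-hm" is emitted at t_ms=400 and t_ms=1000, while the user is still talking
```

In a turn-based loop, `policy` would be called only once, after the final `None`; here it runs on every tick.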

Background

Today's frontier AI models typically operate in a single thread: they wait for a user to finish an input, process it, then generate a response. This turn-based approach forces humans to phrase queries like emails and batch their thoughts—a bottleneck for natural collaboration.
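The cost of waiting for the end of a turn can be put in rough numbers. The sketch below ignores model compute time entirely and uses a hypothetical 8-second utterance. It compares the earliest moment a turn-based model could even see an event with the earliest moment a model consuming 200 ms chunks could see it. The function names are illustrative, not taken from Thinking Machines.

```python
CHUNK_MS = 200  # micro-turn length reported in the article


def turn_based_earliest_ms(event_ms: int, utterance_end_ms: int) -> int:
    """Turn-based: input is processed only after the user finishes, so the model
    cannot react to anything before utterance_end_ms (compute time ignored)."""
    return max(event_ms, utterance_end_ms)


def micro_turn_earliest_ms(event_ms: int, chunk_ms: int = CHUNK_MS) -> int:
    """Micro-turn: input arrives chunk by chunk, so an event becomes visible at the
    end of the chunk that contains it (integer ms avoids float rounding)."""
    return (event_ms // chunk_ms + 1) * chunk_ms


if __name__ == "__main__":
    # Hypothetical scenario: an 8-second utterance that mentions a bug at 1.3 s.
    event, end = 1300, 8000
    print("turn-based:", turn_based_earliest_ms(event, end) - event, "ms wait")  # 6700
    print("micro-turn:", micro_turn_earliest_ms(event) - event, "ms wait")      # 100
```

Both values are lower bounds; real latency adds inference time on top, which the article notes remains a challenge.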

Thinking Machines, co-founded last year by Mira Murati and former OpenAI researcher John Schulman, set out to eliminate this delay. Its solution is a multi-stream, micro-turn architecture that processes audio, video, and text streams simultaneously. Rather than relying on separate pretrained encoders, such as Whisper for audio, the system uses encoder-free early fusion: audio (represented as dMel) and 40x40 image patches pass through a lightweight embedding layer, and all components are trained from scratch.
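Below is a minimal sketch of encoder-free early fusion using NumPy. Only the 40x40 patch size comes from the article. RGB frames, 80 mel channels, 16 bins, and a 64-dimensional model width are hypothetical choices. The dMel step is assumed here to mean uniformly binning log-mel values into integer levels. Image patches get a single linear projection, audio bins get per-channel embedding lookups, and the resulting tokens share one sequence that a single transformer would consume. This illustrates the general technique, not Thinking Machines' actual embedding layer.

```python
import numpy as np

PATCH = 40     # 40x40 image patches, per the article
D_MODEL = 64   # hypothetical model width
N_MELS = 80    # hypothetical number of mel channels per audio frame
N_BINS = 16    # hypothetical number of dMel-style bins per channel

rng = np.random.default_rng(0)

# Lightweight embedding layer (randomly initialized, i.e. trained from scratch):
# one linear projection for RGB patches, one lookup table per mel channel for audio.
W_img = rng.normal(0.0, 0.02, (PATCH * PATCH * 3, D_MODEL))
E_audio = rng.normal(0.0, 0.02, (N_MELS, N_BINS, D_MODEL))


def patchify(image: np.ndarray, patch: int = PATCH) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping patch x patch tiles, flattened."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tiles = image.reshape(h // patch, patch, w // patch, patch, c)
    tiles = tiles.transpose(0, 2, 1, 3, 4)       # (rows, cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * c)  # (num_patches, patch*patch*C)


def discretize_mel(log_mel: np.ndarray, n_bins: int = N_BINS) -> np.ndarray:
    """Assumed dMel-style step: map each log-mel value to one of n_bins integer levels."""
    lo, hi = log_mel.min(), log_mel.max()
    scaled = (log_mel - lo) / (hi - lo + 1e-8)
    return np.minimum((scaled * n_bins).astype(int), n_bins - 1)


def embed_image(image: np.ndarray) -> np.ndarray:
    return patchify(image) @ W_img  # (num_patches, D_MODEL)


def embed_audio(dmel: np.ndarray) -> np.ndarray:
    # dmel: (frames, N_MELS) bin indices -> sum of per-channel embeddings per frame
    return E_audio[np.arange(N_MELS), dmel].sum(axis=1)  # (frames, D_MODEL)


def early_fuse(image: np.ndarray, log_mel: np.ndarray) -> np.ndarray:
    """Put audio and image tokens into one sequence; no pretrained encoder is involved."""
    return np.concatenate([embed_audio(discretize_mel(log_mel)), embed_image(image)], axis=0)


if __name__ == "__main__":
    frame = rng.random((120, 160, 3))         # one video frame -> 3 x 4 = 12 patches
    log_mel = rng.normal(size=(20, N_MELS))   # 20 audio frames (hypothetical rate)
    print(early_fuse(frame, log_mel).shape)   # (32, 64)
```

Concatenating audio tokens before image tokens is an arbitrary choice here; a micro-turn model would more plausibly interleave each 200 ms slice of every stream.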

What This Means

This breakthrough could reshape industries that depend on natural human-AI interaction, such as customer service, telepresence, live translation, and virtual collaboration. The AI can backchannel (offer brief acknowledgments such as "mm-hm") while a user speaks, interject when it spots a bug in code, or react when a friend enters the video frame.

However, the models are not yet available to the public. The company plans to open the limited research preview in the coming months to gather feedback, with a wider release anticipated later this year. Researchers caution that real-world latency and reliability remain challenges, but the shift toward fluid interaction marks a significant departure from the static chat paradigm.

For more details on the underlying technology, see the Background section. To explore potential use cases, see the What This Means section.