Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi compresses 24 kHz audio down to a 12.5 Hz representation with a bandwidth of 1.1 kbps, in a fully streaming manner (80 ms latency, one frame), yet outperforms existing non-streaming codecs such as SpeechTokenizer (50 Hz, 4 kbps) and SemantiCodec (50 Hz, 1.3 kbps). Moshi models two streams of audio: one corresponds to Moshi itself, the other to the user. At inference, the user's stream is taken from the audio input, while Moshi's stream is sampled from the model's output. Alongside these two audio streams, Moshi predicts text tokens corresponding to its own speech, its inner monologue, which greatly improves the quality of its generation. A small Depth Transformer models inter-codebook dependencies within a given time step, while a large, 7B-parameter Temporal Transformer models the temporal dependencies.
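The streaming figures quoted above are mutually consistent, which the short sketch below verifies: a 12.5 Hz token rate over 24 kHz audio implies 1920 samples (80 ms) per frame, and 1.1 kbps works out if each frame carries 8 codebooks of 2048 entries (11 bits) each. The codebook count and size are assumptions about how Mimi reaches that bitrate, not figures stated in this description.

```python
import math

# Figures quoted in the description above.
SAMPLE_RATE_HZ = 24_000   # input audio sample rate
FRAME_RATE_HZ = 12.5      # Mimi token (frame) rate

samples_per_frame = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)   # 1920 samples
frame_ms = 1000 * samples_per_frame / SAMPLE_RATE_HZ      # 80.0 ms latency

# Assumed codebook layout (not stated in the description):
# 8 residual codebooks, 2048 entries each -> 11 bits per token.
NUM_CODEBOOKS = 8
CODEBOOK_SIZE = 2048
bits_per_frame = NUM_CODEBOOKS * math.log2(CODEBOOK_SIZE)  # 88 bits
bitrate_kbps = bits_per_frame * FRAME_RATE_HZ / 1000       # 1.1 kbps

print(samples_per_frame, frame_ms, bitrate_kbps)
```

Under these assumptions the numbers line up exactly: 80 ms per frame and 1.1 kbps of bandwidth.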

Features

  • Mimi, a streaming neural audio codec: 24 kHz audio at 12.5 Hz and 1.1 kbps, with 80 ms latency
  • Full-duplex spoken dialogue: jointly models the user's audio stream and Moshi's own
  • Inner monologue: text tokens predicted alongside Moshi's speech improve generation quality
  • Hierarchical architecture: a small Depth Transformer for inter-codebook dependencies and a 7B-parameter Temporal Transformer for temporal dependencies

Categories

Speech

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Speech Software

Registered

2024-11-04