IT Brief Australia - Technology news for CIOs & IT decision-makers
Kyutai unveils Moshi AI with groundbreaking vocal abilities

Tue, 16th Jul 2024

Kyutai, a non-profit organisation dedicated to open research in artificial intelligence (AI), unveiled its latest development, Moshi, in Paris today. The AI model, which boasts unprecedented vocal capabilities, was developed in just six months by a team of eight researchers.

The public presentation allowed attendees, including researchers, developers, entrepreneurs, investors, and journalists, to interact with Moshi directly. This hands-on demonstration showcased the model's ability to communicate in a smooth, natural, and expressive manner. Participants were particularly impressed by Moshi's potential applications, which were illustrated through interactions that included coaching, companionship, and character roleplay.

Moshi's interactive demo will be available on the Kyutai website from today, marking a world first for a generative voice AI. Kyutai's aim is to make this technology openly accessible, enabling wider exploration and experimentation within the AI community.

Kyutai highlighted Moshi's exceptional text-to-speech capabilities, noting its ability to convey emotion and facilitate interaction between multiple voices. Unlike comparable models, Moshi is designed to be compact and can be installed locally on devices, ensuring safe operation even without an internet connection.

Kyutai intends for Moshi to contribute significantly to the broader AI research ecosystem. The organisation plans to share the code and weights of the model freely, an unprecedented move for such advanced technology. This open-source approach is expected to benefit researchers and developers working on voice-based products and services. By making these resources available, Kyutai aims to foster innovation and allow the community to extend Moshi's knowledge base and improve its factuality over time.

Stephen Zhang, a senior researcher at Kyutai, commented on the launch: "Moshi represents a significant step forward in voice interaction technology. By making it openly accessible, we hope to facilitate a leap in AI research and applications, ensuring that this powerful tool can be adapted and improved upon by the global AI community."

Additionally, Kyutai emphasised that Moshi's capabilities are particularly pertinent to applications that involve speech in the digital world. Its potential uses range from enhancing user experiences in virtual assistants to contributing to more effective educational tools and therapeutic applications.

The Kyutai team demonstrated Moshi's creative capabilities during the presentation by showcasing various roleplay scenarios. This aspect of the AI's functionality illustrated not only its technical proficiency but also its potential to engage users in more immersive and interactive ways.

Looking ahead, Kyutai plans to maintain its commitment to open research in AI. Founded by the iliad Group, CMA CGM, and Schmidt Sciences, the organisation aims to expand its research to include multimodality. This involves developing models capable of learning and performing inference using different types of content such as text, sound, and images.

The research lab, which now includes a dozen members, intends to launch its first PhD theses by the end of the year. Future projects will likely continue to focus on creating general-purpose AI models with high capabilities, ensuring that all developments are freely shared along with the software and methodologies that led to their creation.
