Google taught Gemini 2.5 to understand and convey emotions in dialogues

At the Google I/O 2025 conference, the company introduced a version of its multimodal Gemini 2.5 model that supports emotional voice interaction and real-time audio generation.

G. Ostrov

June 5, 2025

Google presented a significant update to its Gemini 2.5 AI model at its annual I/O 2025 conference, changing how users interact with AI by voice. The new Gemini 2.5 Flash Preview model not only recognizes the emotional tone of a user's speech but also adapts its responses with matching intonation and expressiveness.

Key capabilities of emotional AI

Gemini 2.5's key features include recognizing emotions in the speaker's voice and generating responses with a matching emotional tone. The model can adapt not only intonation but also accent, which supports natural communication in more than 24 languages. It can also ignore background noise and call external tools such as Google Search to retrieve current information during a conversation.

Advanced text-to-speech features

Developers paid special attention to text-to-speech (TTS) capabilities. Gemini 2.5 lets users precisely control speaking style, pace, and emotional expressiveness. The model also supports generating dialogues with multiple voices, which is useful for producing podcasts, audiobooks, and other multimedia projects.
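The article does not describe the API itself, so the following is only a minimal sketch of how a multi-voice TTS request might be assembled. It assumes a JSON request body in the style of the public Gemini API, where speaking style and pace are given as natural-language direction in the prompt and each speaker label is mapped to a voice. The helper build_tts_request is hypothetical, and the field names and the voice names "Kore" and "Puck" are assumptions that should be checked against the current documentation. The code only builds and prints the request body; it does not contact any service.

```python
import json


def build_tts_request(script, speakers, style_hint=""):
    """Build a JSON-serializable body for a multi-speaker speech request.

    script     -- list of (speaker_name, line) tuples, in speaking order
    speakers   -- dict mapping speaker name to a voice name
    style_hint -- natural-language direction (style, pace, emotion)
                  placed before the transcript

    Field names below are assumptions modelled on the Gemini API; verify
    them against the official reference before sending a real request.
    """
    # Every speaker that appears in the script needs an assigned voice.
    unknown = {name for name, _ in script} - set(speakers)
    if unknown:
        raise ValueError(f"no voice assigned for: {sorted(unknown)}")

    # The transcript is plain text with one "Name: line" entry per turn.
    transcript = "\n".join(f"{name}: {line}" for name, line in script)
    prompt = f"{style_hint}\n{transcript}" if style_hint else transcript

    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {
                    "speakerVoiceConfigs": [
                        {
                            "speaker": name,
                            "voiceConfig": {
                                "prebuiltVoiceConfig": {"voiceName": voice}
                            },
                        }
                        for name, voice in speakers.items()
                    ]
                }
            },
        },
    }


body = build_tts_request(
    script=[
        ("Host", "Welcome back to the show."),
        ("Guest", "Thanks, glad to be here!"),
    ],
    speakers={"Host": "Kore", "Guest": "Puck"},
    style_hint="Read this as a relaxed, upbeat podcast intro at a slow pace:",
)
print(json.dumps(body, indent=2))
```

Keeping the style direction in the prompt text rather than in separate numeric parameters matches the article's description of controlling style, pace, and emotion, but whether the service exposes additional dedicated controls is not stated in the source.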

SynthID transparency technology

For transparency, all audio generated by Gemini 2.5 is automatically marked with SynthID watermarking technology. This makes it possible to identify the content as AI-generated, which is critically important in the era of deepfakes and synthetic media.

Availability for developers

The new capabilities are available to developers in preview through Google AI Studio and Vertex AI. They can be tested in the Stream and Generate Media tabs in Google AI Studio.

Gemini 2.5 represents a significant step forward in multimodal AI systems, combining text, images, audio, and video in a unified platform. These innovations open up broad prospects for building interactive applications, virtual assistants, and new solutions in education.

Learn more about Google AI and its products on the official Google AI website.
