In August, Meta introduced SeamlessM4T, its multimodal AI translation model, which supports nearly 100 languages for text and 36 for speech. With an updated v2 architecture, the company is now expanding the tool to make conversational translation more spontaneous and expressive, since expressiveness is the missing ingredient for authentic conversation across languages.
The first of two new features is SeamlessExpressive, which, as the name suggests, carries the speaker's expressiveness over into the translated speech. Specifically, it preserves vocal pitch, volume, emotional tone (such as excitement or sadness), whispering, speech rate, and pauses. Because machine-translated speech has until now typically sounded robotic, this is a potentially revolutionary breakthrough. Currently supported languages are English, Spanish, German, French, Italian, and Chinese.
The second feature is SeamlessStreaming, which starts translating while the speaker is still talking, so listeners receive the translation sooner. There is still a short latency of just under two seconds, but users no longer have to wait for someone to finish a sentence. According to Meta, the challenge is that languages differ in sentence structure, so the company had to develop an algorithm that judges whether it has already gathered enough context to start generating translated output or should keep listening.
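To make that "speak now or keep listening" decision more concrete, here is a minimal sketch of a wait-k policy, a classic baseline from simultaneous-translation research. It is not the method Meta uses: it simply waits until it has heard k more words than it has translated, then emits one word per new word heard. The function `wait_k_translate` and the `next_word` callback are hypothetical stand-ins for a real model, and the demo uses a toy word-for-word German-to-English dictionary.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical model interface (an assumption, not Meta's API): given the
# source words heard so far and the target words already emitted, return
# the next target word.
NextWordFn = Callable[[List[str], List[str]], str]


def wait_k_translate(
    source: Iterable[str], k: int, next_word: NextWordFn
) -> List[Tuple[str, str]]:
    """Simulate a wait-k simultaneous-translation policy.

    Returns a log of ("READ", word) and ("WRITE", word) actions.
    Simplifying assumption: the translation has as many words as the source.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    heard: List[str] = []
    emitted: List[str] = []
    log: List[Tuple[str, str]] = []
    for word in source:
        # READ: listen to one more source word.
        heard.append(word)
        log.append(("READ", word))
        # WRITE only once the input is at least k words ahead of the output.
        if len(heard) - len(emitted) >= k:
            emitted.append(next_word(heard, emitted))
            log.append(("WRITE", emitted[-1]))
    # The speaker has finished: flush the remaining target words.
    while len(emitted) < len(heard):
        emitted.append(next_word(heard, emitted))
        log.append(("WRITE", emitted[-1]))
    return log


if __name__ == "__main__":
    # Toy "model": translate the source word aligned with the next output slot.
    toy = {"ich": "I", "habe": "have", "den": "the", "hund": "dog", "gesehen": "seen"}

    def toy_next_word(heard: List[str], emitted: List[str]) -> str:
        return toy[heard[len(emitted)]]

    for action, word in wait_k_translate("ich habe den hund gesehen".split(), 2, toy_next_word):
        print(action, word)
```

With k=2 the output begins after two words and ends as "I have the dog seen". That clumsy word order illustrates the problem Meta describes: German puts the verb at the end of the sentence, so a fixed lag commits to words before the context that should reorder them has arrived. A smarter policy has to decide case by case whether it has heard enough.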
Although it is not yet known when the public will be able to use these new features, Meta's translation model is already clearly impressive and is likely to find widespread use.