
Marketing books, publications and studies

Posted: Mon Jan 06, 2025 4:50 am
by Bappy10
Google is getting closer to its goal, announced in November, of creating an AI model for 1,000 languages to dethrone ChatGPT. The company is developing all kinds of AI, and with this latest one it wants to build a model that can understand the 1,000 most spoken languages in the world.

Yu Zhang, research scientist, and James Qin, software engineer, at Google Research, say in a statement that the Universal Speech Model (USM), as the AI will be called, continues to advance.

Both describe USM as “a family of state-of-the-art speech models” with 2 billion parameters trained on 12 million hours of speech and 28 billion sentences in over 300 languages.

They also say that this Google AI is already used on YouTube (for example, for subtitles). “It can perform automatic speech recognition (ASR) not only in widely spoken languages such as English and Mandarin, but also in low-resource languages such as Amharic, Cebuano, Assamese, and Azerbaijani, to name a few,” they report.

USM currently supports over 100 languages and lays the foundation for building an even larger system.

Google examines the challenges of supervised learning
The company says there are two major challenges we need to address in supervised learning.

1. Getting enough data to train high-quality models: collecting labeled data takes too much time and money, and for some languages it is hard to find at all. Self-supervised learning can take advantage of audio-only data, which is available in much larger quantities across all languages.

2. Models must become more computationally efficient while expanding language coverage and quality.

The company's approach: self-supervised learning with fine-tuning
For the first step, they use BEST-RQ. They say it has shown strong results and is efficient when using very large amounts of unlabeled audio data.
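The statement does not spell out how BEST-RQ works. As background, the name stands for BERT-based Speech pre-Training with Random-projection Quantizer: a frozen random projection and a frozen random codebook turn each audio frame into a discrete label, and the model learns to predict the labels of masked frames. The sketch below illustrates only the label-generation part, under simplifying assumptions. All function names, dimensions, and the per-frame masking scheme are hypothetical, and it is not Google's implementation.

```python
import math
import random


def _normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def make_quantizer(feat_dim, code_dim, codebook_size, seed=0):
    """Build a random projection matrix and a random codebook.

    Both are drawn once and never trained; that is the core idea
    of a random-projection quantizer.
    """
    rng = random.Random(seed)
    projection = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)]
                  for _ in range(code_dim)]
    codebook = [_normalize([rng.gauss(0.0, 1.0) for _ in range(code_dim)])
                for _ in range(codebook_size)]
    return projection, codebook


def quantize(frame, projection, codebook):
    """Map one feature frame to the index of its nearest codebook entry."""
    projected = _normalize(
        [sum(w * x for w, x in zip(row, frame)) for row in projection]
    )

    def sq_dist(code):
        return sum((a - b) ** 2 for a, b in zip(projected, code))

    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def masked_targets(frames, projection, codebook, mask_prob=0.3, seed=1):
    """Pick frames to mask and return {frame_index: target_label}.

    Simplification: frames are masked independently here, whereas
    real systems typically mask contiguous spans.
    """
    rng = random.Random(seed)
    return {
        t: quantize(frame, projection, codebook)
        for t, frame in enumerate(frames)
        if rng.random() < mask_prob
    }
```

Because the labels come from fixed random components rather than from human transcripts, this kind of target can be computed for any amount of audio-only data, which is why it suits the data problem described above.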

In the second step of the process, a multi-objective supervised pre-training approach is used to incorporate additional knowledge from text data. The model adds an encoder module that takes text as input, plus additional layers that combine the outputs of the speech and text encoders. The model is trained on unlabeled speech data, labeled speech, and text.

In the final stage of the process, the USM model is fine-tuned to downstream tasks. The overall training process can be illustrated in a simple way. Thanks to the knowledge gained during pre-training, USM models achieve high quality with only a small amount of supervised data in downstream tasks.
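The three stages described above can be summarized as a small data structure. This is only an illustrative recap of the article's description: the class, field names, and wording of each entry are hypothetical and do not come from Google's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str        # what the article calls the step
    objective: str   # training objective, as far as the article states it
    data: tuple      # kinds of data the step is trained on


USM_STAGES = (
    Stage("self-supervised pre-training", "BEST-RQ",
          ("unlabeled speech",)),
    Stage("multi-objective supervised pre-training",
          "joint speech and text objectives",
          ("unlabeled speech", "labeled speech", "text")),
    Stage("fine-tuning", "downstream task such as ASR",
          ("small labeled task set",)),
)


def data_sources(stages):
    """List every kind of data used, in order of first appearance."""
    seen = []
    for stage in stages:
        for source in stage.data:
            if source not in seen:
                seen.append(source)
    return seen


for stage in USM_STAGES:
    print(f"{stage.name}: {', '.join(stage.data)}")
```

Laid out this way, it is easier to see that labeled data only enters from the second stage onward, and that the final stage needs just a small task-specific set.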

Figure: Google USM overall training pipeline
