MuZero, the artificial intelligence that learns by trial and error and that Google is using to improve YouTube's algorithms

DeepMind is an artificial intelligence development company owned by Google that, for the past few years, has set itself the mission of optimizing and accelerating the AI learning process.

Four years ago, DeepMind launched AlphaGo, an artificial intelligence that managed to beat a human master of Go (a complex game whose strategic depth had always defeated more conventional AI).

AlphaGo achieved that after being trained for months on the analysis of thousands of games played between humans. However, just a year later, DeepMind introduced its successor: AlphaGo Zero, which needed only three days of training to beat its predecessor 100 times in a row.

The secret of this monumental advance was the commitment to a technique called 'reinforcement learning', which allows AIs to learn a task by themselves without knowing the rules of that task (the rules of chess, for example), but only the desired objective (to capture the opponent's king).

DeepMind's subsequent developments, AlphaZero and the newly launched MuZero, have continued to rely on (and improve) reinforcement learning, and now Google is applying its learning ability to tasks far beyond board games and video games.

The search engine company has begun applying MuZero's advances to its own technology, using this AI to find a new way to encode videos ... and thereby reduce YouTube's costs:

"If you look at data traffic on the Internet, most of it is videos, so if we can compress the video more efficiently we can undertake massive savings ... and the initial experiments with MuZero [...] We are quite excited in that regard. "

It all comes down to trial and error

But how do you get an AI to learn to do something without anyone explaining it? DeepMind's Chief Scientist David Silver explains in a statement to the BBC:

"The real world is messy and complicated, and no one gives us instructions on how it works. Yet humans are capable of formulating plans and strategies for what to do next."

"[MuZero] part of nowhere, so by resorting to trial / error, he succeeds in both discovering the rules of his world and using them to achieve superhuman performance. "

Of course, in the case of an AI, trial and error can involve, for example, playing millions of games of a video game and taking note of which decisions led to victory or defeat in each case, favoring some and discarding others until its strategy is honed to near perfection.
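To make the idea concrete, here is a minimal Python sketch of learning by trial and error. It is not MuZero or anything DeepMind has published: it is a much simpler technique (an epsilon-greedy "bandit"), and the function name, the three hypothetical decisions and their hidden win probabilities are all assumptions made up for illustration. The program never sees the probabilities; it only plays, records wins and losses, and gradually favors the decisions that won most often.

```python
import random


def learn_by_trial_and_error(win_probs, episodes=20000, epsilon=0.1, seed=0):
    """Estimate each decision's win rate purely from the outcomes of played games.

    win_probs is the hidden "world" (an assumption for this sketch); the
    learner only observes whether each game was won or lost.
    """
    rng = random.Random(seed)
    wins = [0] * len(win_probs)
    plays = [0] * len(win_probs)

    def estimate(i):
        # Observed win rate so far; 0.0 for a decision never tried.
        return wins[i] / plays[i] if plays[i] else 0.0

    for _ in range(episodes):
        if rng.random() < epsilon:
            # Occasionally try a random decision (exploration).
            choice = rng.randrange(len(win_probs))
        else:
            # Otherwise favor the decision that has won most often so far.
            choice = max(range(len(win_probs)), key=estimate)
        won = rng.random() < win_probs[choice]  # play one game
        plays[choice] += 1
        wins[choice] += int(won)

    return [estimate(i) for i in range(len(win_probs))]


# Hypothetical hidden win probabilities for three possible decisions.
HIDDEN_WIN_PROBS = [0.2, 0.5, 0.8]

estimates = learn_by_trial_and_error(HIDDEN_WIN_PROBS)
best_decision = max(range(len(estimates)), key=estimates.__getitem__)
print("Estimated win rates:", [round(e, 2) for e in estimates])
print("Preferred decision:", best_decision)
```

The small amount of random exploration matters: without it, the program could settle on the first decision that happened to win and never discover a better one.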

And any computing task can be framed as a video game. Think of one that awards more points the lighter you make a video without losing image quality, and you will understand how YouTube is using MuZero's capabilities.
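As a rough illustration of what such a "game" could look like, the Python sketch below scores encoding attempts: the more bytes saved, the more points, but any attempt whose image quality falls below a threshold scores nothing. Everything here is hypothetical (the EncodeResult type, the score function, the 0.95 quality threshold and the sample numbers); the article does not describe the scoring YouTube or DeepMind actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodeResult:
    """One hypothetical encoding attempt."""
    setting: str
    size_bytes: int
    quality: float  # 0.0 (unwatchable) to 1.0 (identical to the source)


def score(result, original_size_bytes, min_quality=0.95):
    """Points for one attempt: percentage of bytes saved, or zero if
    the image quality drops below the (assumed) acceptable threshold."""
    if result.quality < min_quality:
        return 0.0
    saved = original_size_bytes - result.size_bytes
    return max(0.0, 100.0 * saved / original_size_bytes)


# Made-up numbers for three attempts at encoding the same video.
ORIGINAL_SIZE = 1_000_000
attempts = [
    EncodeResult("A", 900_000, 0.99),  # small saving, excellent quality
    EncodeResult("B", 600_000, 0.96),  # large saving, acceptable quality
    EncodeResult("C", 300_000, 0.80),  # huge saving, quality too low
]

best = max(attempts, key=lambda r: score(r, ORIGINAL_SIZE))
for attempt in attempts:
    print(attempt.setting, score(attempt, ORIGINAL_SIZE))
print("Winning setting:", best.setting)
```

Once a task is expressed as a score like this, a trial-and-error learner can play it the same way it plays a video game, trying settings and keeping the ones that earn the most points.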