Google explains how it can recognize songs just by humming them

Google recently launched one of its most spectacular features yet: song recognition through humming. Just ask the Google Assistant 'what song is this' and hum the tune so that it can recognize it, all through artificial intelligence, as is usual with Google.

We are going to tell you how Google achieved this, since the company has explained it thoroughly on its own blog: how it gets from a hummed melody to an exact search result with the song you were thinking of. How is that possible?

Isolate the melody to achieve the result

Most music recognition models work by taking a sample of the sound, transforming it into a spectrogram (like the one you see above) and comparing that spectrogram with the ones they have in the database. The problem with hums is that the spectrogram has less information, since it only contains the melody.
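To make the idea of a spectrogram concrete, here is a minimal sketch in plain Python. It is not Google's pipeline: the frame size, hop, window and the names `spectrogram` and `dominant_bins` are assumptions chosen for illustration. It cuts a signal into short overlapping frames, measures how much energy each frequency bin holds in every frame, and reports the strongest bin per frame.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces leakage between neighbouring bins.
        windowed = [
            s * (0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1)))
            for n, s in enumerate(frame)
        ]
        # Naive DFT, keeping only the non-negative frequency bins.
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames

def dominant_bins(spec):
    """Index of the strongest frequency bin in each frame."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in spec]

# Illustrative input: a pure 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
spec = spectrogram(tone)
peaks = dominant_bins(spec)
```

With these settings each bin spans 125 Hz, so the tone lands in bin 8 of every frame. A hum would light up a similarly narrow band, while a studio recording would spread energy over many bins at once.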

Google spectrograms. Full spectrogram on the right.

A complete spectrogram contains instruments, lyrics, rhythm, and all the other key elements of a song. As you can see in the image, which corresponds to 'Bella Ciao', the difference between the studio spectrogram and the humming spectrogram is quite clear. To cope with that gap in information, Google focuses on the melody alone, so the other elements of the song do not matter.

In very simplified terms, without getting into technicalities, Google has a database with more than 50 million spectrograms, in which it can find the songs we hum using only the dominant melody of the song. All this works even with background noise, since the model focuses only on that melody.
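The lookup itself can be pictured as a nearest-neighbour search. The sketch below assumes, purely for illustration, that every song in the database has already been reduced to a short numeric vector describing its melody; the three-element vectors, the song titles and the `best_matches` helper are all hypothetical, and a real system would search tens of millions of entries with an approximate index rather than a full scan.

```python
import math

def cosine_similarity(a, b):
    """Similarity in [-1, 1]; 1 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_matches(query, database, top_k=2):
    """Rank every stored song by similarity to the query vector."""
    scored = [(cosine_similarity(query, emb), title) for title, emb in database.items()]
    scored.sort(reverse=True)
    return [(title, score) for score, title in scored[:top_k]]

# Hypothetical melody vectors for three songs.
database = {
    "song_a": [0.9, 0.1, 0.0],
    "song_b": [0.1, 0.9, 0.2],
    "song_c": [0.0, 0.2, 0.9],
}
hum = [0.8, 0.2, 0.1]  # hypothetical vector for the hummed query
matches = best_matches(hum, database)
```

Here `song_a` comes out on top because its vector points in nearly the same direction as the hum, even though the two are not identical.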

Training the model


To achieve this, Google made certain modifications to the recognition models behind Now Playing and Sound Search, which have been with us for a long time. For training, it used a system of pairs (hummed audio and recorded audio), generating an embedding for each one. What does that mean in practice?

Google exposes its neural network millions and millions of times to these pairs, until it is able to generate humming embeddings similar to those of the reference recording. With this system, Google claims that it is able to recognize 4 out of 5 songs and, in our tests, the efficiency is quite high.
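One common way to train a network on such pairs is a triplet loss. The text above does not say which loss Google uses, so treat this choice, the margin value and the two-dimensional embeddings as assumptions for illustration. The loss is zero when the hum sits closer to its own song than to a different song by at least the margin, and grows when it does not.

```python
import math

def distance(a, b):
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Penalise the anchor for being too far from the positive
    or too close to the negative."""
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)

# Hypothetical embeddings.
hum = [1.0, 0.0]
same_song = [0.9, 0.1]
other_song = [0.0, 1.0]

good = triplet_loss(hum, same_song, other_song)  # hum already near its song
bad = triplet_loss(hum, other_song, same_song)   # roles swapped: large penalty
```

During training, the network's weights would be adjusted to push losses like `bad` toward zero, which is what pulls hummed and recorded versions of the same melody together.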

To obtain hummed songs, Google used audio extraction software to isolate the dominant melody of each song

As we have said, Google needs millions and millions of hummed songs to compare with the originals, so it had to simulate hums using software called SPICE, which is capable of extracting the pitch of the songs. The original article accompanied this with two audio clips: the original audio and the audio generated by the software. The output of the software is refined by a neural network, which allows it to be even cleaner.
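The simulation step can be sketched as "estimate the pitch, then re-synthesize only that pitch". SPICE is a neural pitch estimator; the plain autocorrelation estimator below is a much simpler stand-in, not SPICE, and the sample rate, frame length and function names are assumptions for illustration.

```python
import math

SAMPLE_RATE = 8000

def estimate_pitch(frame, min_hz=80, max_hz=1000):
    """Autocorrelation pitch estimate in Hz (a simple stand-in for SPICE)."""
    min_lag = SAMPLE_RATE // max_hz
    max_lag = min(SAMPLE_RATE // min_hz, len(frame) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # High score means the frame resembles itself shifted by `lag` samples.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag

def synthesize_hum(pitches_hz, frame_len):
    """Render one sine segment per pitch, keeping the phase continuous."""
    out, phase = [], 0.0
    for hz in pitches_hz:
        step = 2 * math.pi * hz / SAMPLE_RATE
        for _ in range(frame_len):
            out.append(math.sin(phase))
            phase += step
    return out

# A toy "recording": a 400 Hz melody note plus a weaker 800 Hz overtone.
song_frame = [
    math.sin(2 * math.pi * 400 * n / SAMPLE_RATE)
    + 0.3 * math.sin(2 * math.pi * 800 * n / SAMPLE_RATE)
    for n in range(400)
]
pitch = estimate_pitch(song_frame)
hum = synthesize_hum([pitch, pitch], frame_len=400)
hum_pitch = estimate_pitch(hum[:400])
```

The synthesized signal keeps the melody note and drops the overtone, which is the spirit of turning a full recording into something that resembles a hum.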

This explanation should also help us understand that, at least in this case, Google does not use user data to create this system. Undoubtedly the new hums will serve to continue training the network and make it more precise, but the original method is the one described: simulating hummed songs in order to compare them with the originals.