Machine Learning

From Sound to Images, Part 2: Spectrogram Image Processing.

By Benjamin Hoffman and Grant Van Horn
Picking up where we left off in the previous post, we will now look at the ways one can transform the spectrogram image before it is analyzed by a convolutional neural network (CNN), and at how these transformations affect model performance. Amplifying a hidden signal: With the spectrogram image in hand, the next challenge is…

From Sound to Images, Part 1: A deep dive on spectrogram creation.

By Benjamin Hoffman and Grant Van Horn
In our first post, we described the idea of using a computer vision model to identify bird vocalizations. But how does a computer vision model “listen” to a sound? For Sound ID, we use the short-time Fourier transform (STFT) to convert the raw waveform (which tracks air pressure as a function of time) into an…

Behind the Scenes of Sound ID in Merlin

By Benjamin Hoffman and Grant Van Horn
What is Sound ID? Today we announced one of our biggest breakthroughs—Sound ID, a new feature in the Merlin Bird ID app—and a major leap forward in sound identification and machine learning to date. Sound ID lets people use their phone to listen to the birds around them, and see live predictions of who’s singing. Currently,…