
Long-Awaited Breakthrough in Sound Recognition Technology

Lucian L

Unless you follow machine learning closely, you may not realize that, as powerful as the technology currently is, it still struggles to recognize natural sounds. Natural sounds include the roar of a wildly cheering crowd or the hard-to-ignore sound of waves crashing onto a beach.

Although image and speech recognition have advanced tremendously, how to deal with these so-called natural sounds has remained a vexing question for quite some time.

Now, however, signs point to a possible solution. First, take a step back to understand how natural sounds are handled today. Most automated recognition systems in use employ some form of machine learning, in which computers sift through vast amounts of training data to uncover patterns. The patterns learned from that data are what the system later uses to recognize new examples.

That training data, however, must in large part be annotated by hand, and hand annotation can impose a prohibitively expensive cost in time and labor.

Researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) may have developed a better system, one that builds on video recognition. The CSAIL researchers first trained the system to recognize scenes and objects in video. The system then analyzed its data set for correlations between the visual objects and the natural sounds that accompany them.
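To make the idea concrete, here is a minimal Python sketch of how labels produced by a vision system could stand in for hand annotation when training a sound recognizer. It is only an illustration under stated assumptions, not the CSAIL system: the function names (vision_label, train_sound_model, classify_sound), the two-number audio features, and the toy clips are all made up for this example, and a simple nearest-average classifier is swapped in where the real research would use a far more capable model.

```python
from collections import defaultdict


def vision_label(frame_scores):
    """Hypothetical stand-in for a trained vision model: pick the top-scoring scene."""
    return max(frame_scores, key=frame_scores.get)


def train_sound_model(clips):
    """Average the audio features of all clips that share a vision-derived label."""
    sums = {}
    counts = defaultdict(int)
    for audio_features, frame_scores in clips:
        label = vision_label(frame_scores)  # no hand annotation involved
        if label not in sums:
            sums[label] = [0.0] * len(audio_features)
        sums[label] = [s + a for s, a in zip(sums[label], audio_features)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify_sound(model, audio_features):
    """Return the label whose average audio features are closest to the input."""
    def squared_distance(centroid):
        return sum((c - a) ** 2 for c, a in zip(centroid, audio_features))

    return min(model, key=lambda label: squared_distance(model[label]))


# Toy, invented data: (audio features, scene scores from the vision stand-in).
clips = [
    ([0.9, 0.1], {"stadium": 0.8, "beach": 0.2}),
    ([0.8, 0.2], {"stadium": 0.7, "beach": 0.3}),
    ([0.1, 0.9], {"stadium": 0.1, "beach": 0.9}),
    ([0.2, 0.8], {"stadium": 0.3, "beach": 0.7}),
]

model = train_sound_model(clips)
prediction = classify_sound(model, [0.85, 0.15])  # audio only, no video needed
```

The point of the sketch is the data flow: video supplies the labels during training, and once the sound model exists it can classify audio on its own.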

Carl Vondrick, an MIT graduate student in electrical engineering and computer science, described the work this way: “Computer vision has gotten so good we can transfer it to other domains. We’re capitalizing on the natural synchronization between vision and sound.”

This research is still in its earliest stages. Nevertheless, the CSAIL researchers foresee a sound recognition system that could improve the context sensitivity of mobile devices. Stay tuned as this new sound recognition technology continues to evolve.