By Dave DeFusco

Imagine someone speaking on a video conference while music plays in the background. Besides being distracting, the music makes the speaker hard to understand when you listen to the recording afterward.

Dr. Youshan Zhang, assistant professor of computer science and artificial intelligence, and Jialu Li of Cornell University have created a novel noise removal method that could benefit the hearing impaired and improve the listening experience for audiophiles everywhere.

An example of speech denoising. At left, original speech with noise and, at right, denoised speech audio.

In their paper, “BirdSoundsDenoising: Deep Visual Audio Denoising for Bird Sound,” the researchers describe a deep visual audio denoising (DVAD) model, built using a dataset of 15,300 bird sounds ranging from 1 to 15 seconds in length. The model strips out background noise, in this case natural sounds like wind and rain, to produce clean bird sounds.

Dr. Zhang said the model is robust enough to apply to human speech, where background noise is particularly damaging to intelligibility for people who have difficulty hearing.

“Our DVAD model can first denoise the background noise and then increase the volume of the low voice,” he said.

Professor Youshan Zhang in Hawaii at the WACV conference.

In a novel twist, the researchers turned the audio of the bird sounds into a series of images; used a photo editing tool that eliminates the original background of an image without compromising its integrity; created a segmentation model to edit out the noisy parts of the image; and then applied an algorithm to convert the cleaned image back into “denoised,” or clean, bird sounds.
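For readers who want to see the idea in code, the steps above can be sketched roughly as follows. This is a minimal illustration and not the researchers' implementation. It assumes the audio "image" is a short-time Fourier transform spectrogram, and it uses a simple loudness threshold (the hypothetical `threshold_mask` function) as a stand-in for the trained segmentation model. The function names, the `ratio` parameter and the window size are all assumptions made for this example.

```python
import numpy as np
from scipy.signal import stft, istft


def threshold_mask(magnitude, ratio=0.2):
    """Stand-in for a segmentation model (an assumption for this sketch).

    Marks a time-frequency cell as signal (True) when its magnitude
    exceeds a fixed fraction of the loudest cell, and as noise (False)
    otherwise. A learned model would predict this mask instead.
    """
    return magnitude > ratio * magnitude.max()


def denoise_with_mask(audio, sample_rate, mask_fn, nperseg=512):
    """Denoise audio by masking regions of its time-frequency image."""
    # Step 1: turn the 1-D audio into a 2-D time-frequency representation.
    _, _, spec = stft(audio, fs=sample_rate, nperseg=nperseg)

    # Step 2: segment the magnitude image into signal and noise regions.
    mask = mask_fn(np.abs(spec))

    # Step 3: zero out the noise regions, then invert back to a waveform.
    _, cleaned = istft(spec * mask, fs=sample_rate, nperseg=nperseg)

    # The inverse transform may return a few padded samples; trim them.
    return cleaned[: len(audio)]
```

Calling `denoise_with_mask(noisy, sample_rate, threshold_mask)` on a one-dimensional NumPy array returns a waveform of the same length. A fixed threshold only works when the sound of interest is much louder than the noise in the cells it occupies, which is one reason a trained segmentation model is the more capable choice for the masking step.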

“To the best of our knowledge, we are the first to transfer audio denoising into an image segmentation problem,” said Dr. Zhang. “By removing the noise area in the audio image, we can realize the purpose of audio denoising.”

Background noise removal enhances a noisy speech signal by isolating the dominant sound. It’s used in audio and video editing software, video conferencing platforms and noise-canceling headphones. It’s a fast-evolving technology, and artificial intelligence has opened up a whole new range of approaches to the task.

“Extensive experimental results demonstrate that our proposed model achieves state-of-the-art performance,” said Dr. Zhang, who presented the DVAD model at the recent WACV conference in Hawaii. “We also show that our method can be easily generalized to speech denoising, audio separation, audio enhancement and noise estimation.”

 
