What Is Machine Listening?

Machine listening is the processing of sound by a computer in a way that mimics the way humans process what they hear. Computers can be programmed and trained to recognize and interpret a range of audio inputs. This technology can be applied in a broad assortment of ways, from intelligence analysis to the study of music. Researchers in this field work at private companies, academic institutions, and government agencies to improve machine listening tools and find new applications. The field integrates elements of acoustics, electrical engineering, robotics, and signal processing.

In order to recognize sounds, computers need to be able to hear and process them. They may use microphones to pick up ambient sound, or they can work from recordings. The captured sounds are run through algorithms that determine what they are and what to do with them. How a computer responds depends on its programming, training, and level of sophistication.

Simple examples of machine listening include clappers and voice-activated software. A clapper lets people turn a circuit on and off with a hand clap that activates the base unit. Software that responds to voice commands lets people control it by speaking, which requires the ability to identify the voice and interpret the sounds. Such programs may use training to learn to recognize a specific speaker and handle accents, changes in syntax, and other variations between speakers.
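As a rough illustration of the simplest case, a clapper-style detector can be thought of as a loudness threshold applied to short frames of audio. The sketch below is only an assumption about how such a device might work, not a description of any actual product. The names `frame_rms`, `detect_claps`, and `toggle_state`, along with the frame size and threshold, are hypothetical, and the audio is synthetic.

```python
import math

def frame_rms(samples):
    """Root-mean-square level of a list of samples in the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def detect_claps(samples, frame_size=256, threshold=0.5):
    """Return the start index of each frame where loudness rises above threshold.

    A clap is counted only on the quiet-to-loud transition, so one long
    loud sound is not counted more than once. Values are illustrative.
    """
    onsets = []
    was_loud = False
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        is_loud = frame_rms(frame) >= threshold
        if is_loud and not was_loud:
            onsets.append(start)
        was_loud = is_loud
    return onsets

def toggle_state(initial_on, clap_count):
    """Each detected clap flips the circuit between on and off."""
    return initial_on if clap_count % 2 == 0 else not initial_on

# Synthetic signal: silence, a loud burst, silence, another burst, silence.
quiet = [0.0] * 1024
burst = [0.9 if i % 2 == 0 else -0.9 for i in range(256)]
signal = quiet + burst + quiet + burst + quiet

claps = detect_claps(signal)
light_on = toggle_state(False, len(claps))
```

A real device would also need to tell a clap from a slammed door or a barking dog, which is where the training and more sophisticated analysis described above come in.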

More complex machine listening can be used in fields like music, where researchers identify and study patterns. Forensic musicologists, for instance, can compare and contrast music from different sources and may use machine listening in their work. They can determine whether pieces of music appear to have a common origin or share other characteristics of interest. This technology can also be used to study harmony and develop theories about what historical music might have sounded like.
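One way to picture this kind of comparison is to reduce each melody to the sequence of steps between its notes, so that the same tune played in a different key still matches. The sketch below assumes the melodies have already been transcribed into MIDI note numbers, which is itself a hard listening problem. The function names and the use of a generic sequence-matching score are illustrative choices, not a description of how forensic musicologists actually work.

```python
from difflib import SequenceMatcher

def intervals(notes):
    """Convert a list of MIDI note numbers into the steps between notes."""
    return [b - a for a, b in zip(notes, notes[1:])]

def melodic_similarity(notes_a, notes_b):
    """Score from 0.0 to 1.0 for how closely two interval sequences match.

    Comparing intervals rather than raw pitches makes the score
    independent of the key each melody is played in.
    """
    matcher = SequenceMatcher(None, intervals(notes_a), intervals(notes_b),
                              autojunk=False)
    return matcher.ratio()

# Hypothetical melodies as MIDI note numbers (60 is middle C).
theme = [60, 62, 64, 65, 67, 65, 64, 62]
transposed = [n + 5 for n in theme]          # same tune, different key
unrelated = [60, 72, 59, 71, 58, 70, 57, 69]  # large alternating leaps

same_tune_score = melodic_similarity(theme, transposed)
different_tune_score = melodic_similarity(theme, unrelated)
```

Here the transposed copy scores as an exact match while the unrelated melody shares no intervals with the theme. Real comparisons would also have to account for rhythm, ornamentation, and transcription errors.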

Intelligence analysis also relies on machine listening. Intelligence agencies may need to process huge amounts of audio data, such as telephone conversations and discussions in public spaces. Paying human beings to listen to all of this audio and write reports is expensive, and bored listeners might miss important information. Machine listening allows an agency to process audio automatically and pull out material that requires close attention, based on keywords, stress tones in voices, and other parameters. Analysts can then use this automatic analysis to prioritize their work, listening first to the audio most likely to be important.
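The prioritization step can be sketched as a simple scoring problem. The example below assumes that each recording has already been transcribed and given a vocal-stress score by some other system. The `Clip` structure, the weights, and the sample data are all hypothetical and serve only to show how keywords and stress might be combined into a single ranking.

```python
from dataclasses import dataclass

@dataclass
class Clip:
    clip_id: str
    transcript: str   # assumed output of a separate speech recognizer
    stress: float     # hypothetical vocal-stress score between 0 and 1

def priority(clip, keywords, keyword_weight=1.0, stress_weight=2.0):
    """Combine keyword hits and the stress score into one number.

    The weights are arbitrary illustrative values, not tuned settings.
    """
    words = clip.transcript.lower().split()
    hits = sum(1 for w in words if w.strip(".,!?") in keywords)
    return keyword_weight * hits + stress_weight * clip.stress

def triage(clips, keywords):
    """Return clips sorted so the highest-priority audio comes first."""
    keywords = {k.lower() for k in keywords}
    return sorted(clips, key=lambda c: priority(c, keywords), reverse=True)

# Made-up sample data for illustration only.
clips = [
    Clip("a", "See you at lunch tomorrow.", 0.1),
    Clip("b", "The shipment arrives at the harbor tonight.", 0.7),
    Clip("c", "Did the shipment clear customs?", 0.2),
]

ordered = [c.clip_id for c in triage(clips, ["shipment", "harbor"])]
```

With this sample data, clip "b" is ranked first because it contains both keywords and has the highest stress score, so an analyst would listen to it before the others.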