Speech analytics is a computerized technique used to analyze the content of speech. It is not simply a speech-to-text conversion tool. Instead, it is designed to detect patterns in speech, including both content and tone.
The simplest use of speech analytics is to measure how often particular phrases are used. Before speech analytics, this was only possible by transcribing a recording and then searching the transcript, by computer or by hand, for particular phrases. With speech analytics, a computer system can be pre-programmed to “listen” for a particular word and can even report results in real time.
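The phrase-counting idea can be sketched in a few lines of Python. This is only an illustration of the counting step: it assumes the speech has already been turned into text, and the function name count_phrases and the sample transcripts are made up for the example rather than taken from any real product.

```python
import re
from collections import Counter

def count_phrases(transcripts, phrases):
    """Count how often each phrase occurs across a list of transcripts.

    Assumption: transcripts are plain-text strings produced by some
    earlier speech-to-text step that is not shown here.
    """
    counts = Counter({phrase: 0 for phrase in phrases})
    for text in transcripts:
        # Lowercase and drop punctuation so "REFUND," matches "refund".
        normalized = " ".join(re.findall(r"[a-z0-9']+", text.lower()))
        for phrase in phrases:
            # Word boundaries stop "fund" from matching inside "refund".
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            counts[phrase] += len(re.findall(pattern, normalized))
    return dict(counts)

# Hypothetical call transcripts, for illustration only.
sample_transcripts = [
    "I want a refund. The manual is confusing.",
    "Can I get a refund? A REFUND, please!",
]
phrase_counts = count_phrases(sample_transcripts, ["refund", "the manual"])
print(phrase_counts)  # {'refund': 3, 'the manual': 1}
```

A real-time system would run the same kind of count on speech as it arrives instead of on stored transcripts.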
There are multiple uses for such technology. A company with a call center could analyze the conversations its staff have with customers to detect patterns in complaints. For example, while call logs may show that a particular product causes a lot of complaints or queries, speech analytics may show that a particular aspect of the product, such as a line in the instructions, is frequently mentioned. A law enforcement authority or security service could analyze the phone calls it monitors to see if a particular phrase is being mentioned by suspects.
More sophisticated speech analytics can be used to analyze tone and even context. For example, a telemarketing company will usually keep track of the percentage of calls it makes that result in a sale, but won’t necessarily have statistics to show why people turned down the offer. Analyzing tone might show that the number of people who reply with an angry tone is disproportionately high at a certain time of day. This may suggest the problem is not so much that the product is unattractive, but rather that people are annoyed at being called after a certain hour and are more likely to be hostile to a sales attempt regardless of the product.
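The time-of-day comparison in the telemarketing example comes down to a simple grouped proportion. The sketch below assumes that each call has already been given a tone label by some tone-analysis step that is not shown. The label names, the function angry_share_by_hour, and the sample data are hypothetical.

```python
from collections import defaultdict

def angry_share_by_hour(calls):
    """Return the fraction of calls labelled "angry" for each hour.

    Assumption: `calls` is a list of (hour, tone) pairs, where the tone
    label was assigned earlier by a tone-analysis step.
    """
    totals = defaultdict(int)
    angry = defaultdict(int)
    for hour, tone in calls:
        totals[hour] += 1
        if tone == "angry":
            angry[hour] += 1
    return {hour: angry[hour] / totals[hour] for hour in sorted(totals)}

# Made-up call records: (hour of day, tone label).
sample_calls = [
    (10, "neutral"), (10, "angry"), (10, "neutral"), (10, "neutral"),
    (20, "angry"), (20, "angry"), (20, "neutral"), (20, "angry"),
]
shares = angry_share_by_hour(sample_calls)
print(shares)  # {10: 0.25, 20: 0.75}
```

With these invented figures, three quarters of the 8 p.m. calls meet an angry reply against one quarter at 10 a.m., which is the kind of gap that would point to call timing rather than the product.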
There are several different types of speech analytics. Each successive type brings added accuracy to the results, but increases both the time it takes to scan speech and the amount of speech needed to detect a pattern. The simplest type is phonetic, which breaks down speech into individual sounds. While unsophisticated, this does make it easy to search for new phrases without having to rescan the speech from scratch. Keyword spotting looks for entire words from the outset. Large-vocabulary continuous speech recognition aims to effectively transcribe all the speech, so that an entire conversation is available for analysis.
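The advantage of the phonetic approach, searching for new phrases without rescanning the audio, can be illustrated with a toy example. The sketch assumes the recording has already been reduced once to a stored list of sound symbols. The tiny pronunciation lexicon, the symbols, and the function names are invented for illustration, and the search uses exact matching, whereas a real system would likely need to tolerate recognition errors.

```python
# Hypothetical pronunciation lexicon using ARPAbet-like symbols.
LEXICON = {
    "the": ["DH", "AH"],
    "manual": ["M", "AE", "N", "Y", "UW", "AH", "L"],
    "refund": ["R", "IY", "F", "AH", "N", "D"],
}

def phrase_to_phonemes(phrase):
    """Convert a phrase to a flat list of sound symbols via the lexicon."""
    phonemes = []
    for word in phrase.lower().split():
        phonemes.extend(LEXICON[word])  # raises KeyError for unknown words
    return phonemes

def find_phrase(phoneme_index, phrase):
    """Return every start position where the phrase's sounds occur.

    Only the stored phoneme_index is searched, so a new phrase can be
    looked up without processing the original audio again.
    """
    target = phrase_to_phonemes(phrase)
    n = len(target)
    return [
        i
        for i in range(len(phoneme_index) - n + 1)
        if phoneme_index[i:i + n] == target
    ]

# Stored once when the (imaginary) recording was first scanned.
stored_index = [
    "DH", "AH", "M", "AE", "N", "Y", "UW", "AH", "L",
    "R", "IY", "F", "AH", "N", "D",
]
print(find_phrase(stored_index, "refund"))      # [9]
print(find_phrase(stored_index, "the manual"))  # [0]
```

Keyword spotting, by contrast, would need the list of words fixed before the scan, so adding a new word would mean scanning the speech again.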