Automatic transcription of audio files - and why manual transcription may be better!

We get a lot of potential customers looking for software that will automatically transcribe their audio or video files for them. HyperTRANSCRIBE makes transcription a lot easier, but it cannot transcribe automatically -- just as HyperRESEARCH can aid in coding and analyzing qualitative data but can't do your research for you.

We would love to be able to provide software that automatically transcribes speech. Unfortunately, speech recognition technology simply hasn't gotten to the point where automatic transcription of audio or video recordings can match, let alone surpass, the accuracy of (good) human transcriptions.

 

One of the reasons for this is that speech is incredibly complex, with variations in accents and enunciation as well as pitch and tone of voice, making it hard to match spoken words to written ones.

Commercially available speech-to-text software, such as Nuance's "Dragon NaturallySpeaking," generally works best if you "train" the software to a specific voice. Some researchers use Dragon NaturallySpeaking to create transcripts by training it to their own voice. They then listen to the audio they wish to transcribe and re-speak what they hear for the software to convert to text. There's a further step needed, however: proofreading the transcription and correcting any errors.

Progress is being made on cracking the "speech to text" nut. Apple recently applied for a patent on a method to convert speech to text and text to speech. And some voicemail providers offer automatic speech-to-text transcriptions of incoming voicemail.

Our own toll-free number and voicemail provider, Grasshopper, introduced this technology a little while ago. Here's an example that points out many of the problems currently inherent in automatic speech-to-text technology. This is an actual transcription we received for one of our incoming voicemails:

New Feature! Voice to Text
"Hi my name is Barbara thank you hello the phone number is [omitted for privacy] ... the questions is really down the freeway sorry ... I have downloaded ... the hi square research and hyper transcribe ... the trial the trial version and I'm trying to get to this is holly ... and I can't I'm not seeming to find a documentation folder so ... again I know it's the fleas and short so hopefully can give me a callback on how I can access pass that would be great thanks bye bye."

Have a human transcribe this voicemail for better accuracy

Note the "Have a human transcribe this voicemail for better accuracy" link. Even speech-to-text providers realize that human transcription is often more reliable and accurate than the software can be. Of course, that depends on the skills of the transcriber...

Now here's my own transcription of the same voicemail.

Hi, my name is Barbara [last name omitted for privacy], the phone number is [omitted for privacy]

The question is really dumb and I'm really sorry.

I have downloaded HyperRESEARCH and HyperTRANSCRIBE, like the trial version, and I'm trying to get to the tutorials, and I can't. I'm not seeming to find a documentation folder.

So, again, I know it's the full (fool?) user error, I'm sure, so if somebody could give me a call back on how I can access stuff that would be great. Thanks, bye-bye!

Now, Grasshopper's automatic transcription actually did pretty well. It had trouble with the caller's last name (so did I, due to noise in the audio) and it's not familiar with the term "HyperRESEARCH" (transcribed as "hi square research"). It also replaced "ums" and "uhs" and pauses with "..." as a matter of course (a good decision on the part of the programmers, as such "filler sounds" rarely have significance in voicemail).

Where it did have trouble (such as "I know it's the fleas and short" instead of "I know it's the full user error, I'm sure"), there was noise on the line and the volume of the voicemail dropped a bit, making it hard even for this human transcriber to make the words out. While I heard "full user," Paul, who was sitting in the same room as I transcribed this, heard "fool user."

And that's one thing that gives human transcriptions the edge over automatic transcriptions: humans can compensate, at least to a degree, for another person's mumbling, for poor audio quality, and for other problems that can affect the clarity of the speech being transcribed.

Human transcribers also have the luxury of deciding how closely they're going to transcribe a given audio or video file (or voicemail).

I had several options:

  • Transcribe verbatim, including "ums" and repetitive phrases such as "like, like," and even enter indications of non-language cues such as laughter and sighs
  • Skip over the "ums" and pauses as I transcribed (which is what I decided to do)
  • Transcribe only the relevant parts of the message.

With this last approach, my transcription would be shorter:

Hi, my name is Barbara [last name omitted for privacy], the phone number is [omitted for privacy]

I have downloaded HyperRESEARCH and HyperTRANSCRIBE, like the trial version, and I'm trying to get to the tutorials, and I can't. I'm not seeming to find a documentation folder.

So, if HyperTRANSCRIBE doesn't do the transcription for you, how does it help?

HyperTRANSCRIBE lets you open and play most popular audio and video formats, and provides both graphical and keyboard control to play, pause, and loop playback so your hands never have to leave the keyboard. It allows fine tuning of the length of the loop and amount of overlap between loops, to fit with your own typing speeds and need for repetition (or lack thereof).
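To make the idea of loop length and overlap concrete, here is a minimal sketch in Python of how a recording could be divided into overlapping playback loops. It is only an illustration of the concept, not HyperTRANSCRIBE's actual implementation; the function name loop_segments and its parameters are hypothetical.

```python
def loop_segments(duration, loop_length, overlap):
    """Return (start, end) times, in seconds, for each playback loop.

    Hypothetical helper for illustration only. Each loop begins
    `overlap` seconds before the previous loop ended, so the listener
    hears the tail of the last loop again before new audio starts.
    """
    if loop_length <= 0:
        raise ValueError("loop_length must be positive")
    if not 0 <= overlap < loop_length:
        raise ValueError("overlap must be >= 0 and shorter than loop_length")

    step = loop_length - overlap  # how far each new loop advances
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + loop_length, duration)
        segments.append((start, end))
        if end >= duration:
            break  # the final loop reached the end of the recording
        start += step
    return segments


if __name__ == "__main__":
    # A 10-second clip, 4-second loops, 1 second of overlap.
    for start, end in loop_segments(10, 4, 1):
        print(f"play {start:.1f}s to {end:.1f}s")
```

A slower typist might pick a shorter loop with more overlap; a faster typist might pick a longer loop with little or none.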

It also lets you choose exactly what you will and will not transcribe. Maybe you do want to include the "ums" and "uhs" and note all the nuances in the speaker's tone of voice (e.g. <whisper>) in your transcription, as they give valuable insight into the speaker's frame of mind and possible emotional state while speaking.

There's one more advantage to doing transcriptions yourself: you'll be much more familiar with your data after spending the time listening to your audio files while you transcribe.

Those aren't advantages you can get from an automatic transcription.