Key takeaways from an Automatic Speech Recognition (ASR) pilot


Automatic Speech Recognition (ASR) is technology that allows computer software to recognise human speech and convert it into text. ASR was recently piloted in selected courses. Learning Designer John Murphy, who was involved in the pilot, interviewed Dr. Kencana Dharmapatni, who taught one of the participating courses. Here is what she had to say.


Why did you choose to participate in the ASR pilot?

I was interested in exploring the Automatic Speech Recognition tool in Echo 360 recorded lectures as a way to help students better access and understand course content, especially students with English as an additional language and those with hearing impairments. I thought a transcript, as long as it is accurate, had the potential to make lecture content more accessible by supplementing the audio recording, the lecture slides and the physical presence of the lecturer in the room. ASR can also increase flexibility, as students can listen to and/or read the lecture later, even while travelling or without earphones.

What benefits did you discover for students using this feature?

I understand that the student survey responses were positive overall about the usefulness of transcripts in supporting learning in participating courses, and that most students would like to see transcripts in more of their courses. I think students would find transcripts useful for checking and confirming their understanding if they have not understood or have misheard something, as long as it is transcribed correctly. In the transcripts I have seen, accuracy was high, at 95–98%, which was impressive.

Are there any benefits for staff using this feature?

I think transcripts allow content to be made available in multiple formats, which supports accessibility in teaching and learning.

I think transcripts can also be a useful self-evaluation tool for staff to optimise the pace and delivery of a lecture. Well-paced, clear delivery with key words stressed can improve the accuracy of the transcript.

What limitations did you discover during the pilot?

Although accuracy was very high overall, the tool did miss some subject-specific terminology. Accuracy also drops when key words are spoken quickly or rushed, for example towards the end of a lecture. I understand from the staff and student survey results that accuracy can also depend on the subject area: transcripts of lectures involving maths were far less accurate, which is a concern. The tool cannot differentiate well between words and numbers. I would be concerned about inaccurate transcripts being released to students, as they could be misleading and confusing.

The pilot of ASR in Echo 360 did not look at editing the auto-generated captions. Because the Echo 360 ASR tool currently has no built-in editing function, any editing must be done outside the system. This is neither practical nor intuitive, and it would have workload implications if academics had to check and edit transcripts. I look forward to future versions in which editing is more intuitive.

Echo 360 ASR does not transcribe in real time: it takes approximately one hour to produce a transcript for a one-hour lecture, which is still reasonable.

What advice or recommendations would you pass on to other lecturers wanting to engage with this feature?

I would advise lecturers who are considering using auto-transcripts to first consider whether their subject area is suitable. They should also consider whether there is a need: do they have a large cohort of students with English as an additional language (EAL), or students with hearing impairments or other disabilities, who may benefit from a transcript?

I would also advise students that the transcripts are not 100% accurate, and that they should check the lecture slides or audio recording if in doubt about certain words or phrases in the transcript.

I would also advise teaching staff to practise with Echo 360 Personal Capture at their desktop and experiment with pace and stressing key words, as this will help generate a more accurate transcript.

If lecturers are interested in providing edited captions, there are other tools that I would recommend. For short key concept videos (5–10 minutes), I am exploring Echo 360 Personal Capture and Studio (formerly ARC) in MyUni. With these tools, you can record video at your desktop, upload it to your course, and auto-generate, edit and publish captions within MyUni. I think that for short videos the workload is more manageable and sustainable in online courses. Edited captions also support diversity, inclusion and accessibility.

If you are interested in learning more about any of these tools, or in arranging a workshop on best practice, I’d recommend contacting a Learning Designer for your Faculty. For technical assistance, you can call 8313 3000.
