Realtime captioning for remote accessibility with Tolkchat


Tolkchat is a realtime remote captioning delivery system, designed to assist interpreters in delivering a reliable transcript to their clients.


For those in the deaf and hard-of-hearing community, isolation can feel even more extreme. With the dramatic rise in remote communication, significant work is needed to adapt and build a new culture of inclusivity.

As many companies have moved to remote working setups, the support systems that many worked hard to put in place for hard-of-hearing staff have broken down. Many hard-of-hearing people rely on lip reading, recognising speech patterns and other peer practices to build effective connections. Now they are having to adjust the way they communicate as they move to fully virtual operation. With the adoption of video conferencing, lags and system crashes, multiple speakers and indirect speech all present challenges.

Realtime captioning for accessibility

Captioning software that automates speech-to-text processing (known as Automatic Speech Recognition, or ASR) is becoming more popular, and advances in the field make it an increasingly effective option for TV broadcasters, press briefings, conferences and even internal meetings.

There are, however, still significant advantages to using a human captioner to transcribe conversations, talks and announcements. In this method, a transcriber works on site or remotely to capture the presenter’s speech in real time. Though AI-based systems produce text very quickly, they are not yet suitable for mainstream broadcast: their accuracy tends to range from 70% to 90%, depending on the quality of the audio. Particularly with slang and highly technical or creative content, software does not have the breadth of vocabulary that a human transcriber can offer.

Research suggests that, on average, human transcription has an accuracy rate of 96%. ASR has yet to match this quality, and its accuracy can drop further depending on the content of the speech being processed. In the technology sector in particular, in-person transcription by technically knowledgeable transcribers is strongly favoured, especially at conferences and in transactional meetings, because of the professional expertise they can apply.

But with social distancing regulations in place across the globe, how can in-person transcription compete with the convenience of ASR?

Reliable remote captioning through in-person transcribers

Whether the content is incoming news from media outlets or changes to company practices prompted by working from home, remote captioning offers a solution to this dilemma.

Remote captioning involves a speech-to-text captioner listening in to what is being said over a telephone, video call system or other channel, and delivering a stream of text to a device so that a deaf or hard-of-hearing person can actively engage. It also allows all parties to comply with relevant social distancing guidelines. Ensuring that this happens in realtime is essential to making conversation accessible.

The problem with remote captioning, where a captioner cannot be physically present at an event or meeting and instead uses remote desktop software to deliver the text, is that even a slightly unstable connection causes the client to miss information. The client also has to watch the screen continuously, which is uncomfortable.

Tolkchat is a remote captioning delivery system designed to assist interpreters in delivering a reliable transcript to their clients. In Tolkchat, the audio channel is separated from the text channel. The captioner can listen in on the audio channel using any preferred method, even something as simple as a phone call. The client can then use Tolkchat on any device to read along with anything that the captioner writes down, in realtime.

Tolkchat uses Pusher Channels to continuously send the last 5000 characters of the text to the client. Even if the connection drops, is unstable, or is otherwise not adequate for a full video stream, the client can still read along with the entirety of the text.
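To illustrate why resending a trailing window of text tolerates an unreliable connection, here is a minimal Python sketch. It is not Tolkchat's actual code: the class names, the `publish` callback and the `on_snapshot` handler are hypothetical, and the transport is left abstract rather than tied to any specific Pusher Channels call. Only the 5000-character figure comes from the description above.

```python
from typing import Callable

WINDOW_SIZE = 5000  # window length quoted in the article


class CaptionWindow:
    """Captioner side (hypothetical): stores the transcript, publishes its tail."""

    def __init__(self, publish: Callable[[str], None],
                 window_size: int = WINDOW_SIZE) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self._publish = publish          # assumed transport hook, e.g. a channel event
        self._window_size = window_size
        self._transcript = ""

    def append(self, text: str) -> None:
        """Add newly typed text and send the latest window as a full snapshot."""
        self._transcript += text
        self._publish(self._transcript[-self._window_size:])


class CaptionReader:
    """Client side (hypothetical): each snapshot replaces what is on screen."""

    def __init__(self) -> None:
        self.visible_text = ""

    def on_snapshot(self, snapshot: str) -> None:
        # No merging with earlier messages is needed, so a lost message
        # is repaired by whichever snapshot arrives next.
        self.visible_text = snapshot
```

Because every message carries the whole window rather than a small increment, a reader who misses a message or reconnects mid-session catches up as soon as one further snapshot gets through.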

“Developing Tolkchat, we set our goal to deliver messages to the readers in the shortest time frame possible so they are completely up to date with everything that is being spoken. We initially started out with a polling setup, which quickly turned out to be a heavy burden on our servers, not to mention it was slow and unreliable,” says David van Der Staak, creator of Tolkchat.

“When we started using Pusher, the load on our servers dropped dramatically and the round-trip time was cut by over 70%.”
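As a rough illustration of why polling weighs on a server, the Python sketch below compares how many requests an application server handles under each approach. The function names and all figures in the usage note are hypothetical and are not Tolkchat measurements. The sketch assumes that with a hosted channel service the application server publishes each caption update once and the service fans it out to readers.

```python
def polling_requests(readers: int, session_seconds: int,
                     poll_interval_seconds: int) -> int:
    """Requests the server answers when every reader polls on a fixed timer.

    The count does not depend on whether any new text was written.
    """
    polls_per_reader = session_seconds // poll_interval_seconds
    return readers * polls_per_reader


def push_publishes(caption_updates: int) -> int:
    """Publishes the server makes when updates are pushed to readers.

    Assumption: one publish per update, with delivery to each reader
    handled by the channel service rather than the application server.
    """
    return caption_updates
```

With illustrative numbers, 20 readers polling once per second for an hour gives `polling_requests(20, 3600, 1)`, or 72,000 requests, whereas a captioner producing 1,800 updates in that hour leads to 1,800 publishes regardless of how many readers are connected.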

By transitioning to a WebSocket-based solution, Tolkchat dramatically reduced the burden placed on their servers by polling. Check out how moving to Pusher Channels could offer more reliability for your project.