
Creating a chat app with multi-lingual speech to text capability in React Native

Wern Ancheta

July 9th, 2019
You will need to have Node 11.2+, Yarn 1.13+, React Native CLI 2+ and React Native 0.59+ installed on your machine.

In this tutorial, we will be creating a chat app with native speech to text capability. Aside from that, the app will also be able to translate text from one language to another.

Speech to text features are nothing new. If your phone has a built-in speech to text feature, you’ll most likely find a microphone icon on the on-screen keyboard which allows you to convert your speech to text. But what if we pair this feature with text translation? It can be very beneficial to the users of your chat application: each user can speak in their own language, and the recipient will receive the message in theirs.

Prerequisites

Basic knowledge of Node.js and React Native is required to follow this tutorial. We will use Node.js for the server and React Native to create the app.

Aside from that, we will also be using Chatkit so you should know how to use it as well. Be sure to create an account if you don’t already have one. After that, create a corresponding Chatkit app and enable the test token provider.

For implementing text translation, you’ll need a Microsoft Azure account. Simply search “Azure sign up” or go to this page to sign up.

We will use ngrok for exposing the server to the internet.

The following package versions will be used:

  • Node 11.2.0
  • Yarn 1.13.0
  • React Native CLI 2.0.1
  • React Native 0.59.9

Be sure to use the above versions if you encounter any issues with running the app.

App overview

We will be creating a multi-lingual chat app which has a speech to text feature. This way, the users can just dictate what they want to say and the corresponding text will automatically be added to the chat screen. One benefit of this over using the built-in speech to text functionality is that you’ll have control over the resulting text. This allows you to manipulate it however you want before actually presenting it to the user.
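
To give you an idea of what that control looks like, here’s a small, purely hypothetical helper (it’s not part of the starter project) that tidies up the recognized text before it reaches the chat UI:

    // Hypothetical helper (not in the starter project): clean up raw speech-to-text output
    // before showing it to the user.
    const cleanUpSpeech = (raw) => {
      const trimmed = raw.trim();
      // capitalize the first letter
      const capitalized = trimmed.charAt(0).toUpperCase() + trimmed.slice(1);
      // make sure the sentence ends with punctuation
      return /[.!?]$/.test(capitalized) ? capitalized : `${capitalized}.`;
    };

    console.log(cleanUpSpeech('hello how are you')); // "Hello how are you."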

The app is also multi-lingual because the user can choose their default language when they log in. The language they select will be the base language used both for picking up speech and for receiving messages. For example, if the current user’s language is English but the person they’re chatting with speaks Spanish, the current user will receive the messages in English instead of Spanish. The same is true for the other person.

Here’s what the app will look like:

You can find the source code on this GitHub repo.

What is Cognitive Services?

Before we proceed, let's quickly go over what Cognitive Services is. Cognitive Services is a collection of services that allows developers to easily add machine learning features to their applications. These services are available via APIs that are grouped under the following categories:

  • Vision - for analyzing images and videos.
  • Speech - for converting speech to text and vice versa.
  • Language - for processing natural language.
  • Decision - for content moderation.
  • Search - for implementing search algorithms that are used on Bing.

Today, we’re only concerned with the Language APIs, more specifically the Translator Text API. It is used for translating text from one language to another, identifying the language used in a text, and getting alternate translations and usage for dictionary words.
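
As a quick illustration of the language identification capability, here’s a sketch of a call to the /detect endpoint with axios. This snippet isn’t used in the app we’re building, and the subscription key is a placeholder for the one you’ll create in the next section:

    // A sketch of a language detection request to the Translator Text API (v3.0).
    // Replace the placeholder key with the one you'll create in the next section.
    import axios from 'axios';

    const detectLanguage = async (text) => {
      const res = await axios.post(
        'https://api.cognitive.microsofttranslator.com/detect?api-version=3.0',
        [{ Text: text }], // the API expects an array of objects with a Text property
        {
          headers: {
            'Content-Type': 'application/json',
            'Ocp-Apim-Subscription-Key': 'YOUR TRANSLATOR TEXT API KEY'
          }
        }
      );
      return res.data[0].language; // for example: 'fr'
    };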

Setting up Cognitive Services

In this section, we’ll be setting up Cognitive services in the Azure portal. This section assumes that you already have an Azure account.

First, go to the Azure portal and search for “Cognitive services”. Click on the first result under the Services:

Once you’re there, click on the Add button. This will lead you to the page where you can search for the specific cognitive service you want to use:

Next, search for Translator Text and click the first option that shows up:

On the page that follows, click on the Create button to add the service:

After that, it will ask for the details of the service you want to create. Enter the following details:

  • Name: speak-chat
  • Subscription: Pay-As-You-Go
  • Pricing tier: F0 (this is within the free range so you won’t actually get charged)
  • Resource group: click on Create new

Enter the details of the resource group you want to add the service to. In this case, I simply put in the name then clicked OK:

Once the resource group is created, you can now add the cognitive service.

Here’s what it looks like once created. Click on the Show access keys link to see the API keys that you can use to make requests to the API. At the bottom, you can also see the number of API calls that you have made and the total allotted to the pricing tier you chose:

Bootstrapping the app

So that we can focus on the app’s main features, I’ve created a starter project which already contains the basic chat app functionality. The app already has routing with React Navigation and a basic chat UI built with React Native Gifted Chat. Go ahead and clone it and install the dependencies:

    git clone https://github.com/anchetaWern/RNSpeakChat.git
    cd RNSpeakChat
    yarn
    react-native eject
    react-native link @react-native-community/async-storage
    react-native link react-native-config
    react-native link react-native-gesture-handler
    react-native link react-native-vector-icons 
    react-native link react-native-voice

React Native Config requires some additional configuration. Follow the instructions on its GitHub page.

Building the app

Now we’re ready to build the app. As you go through the tutorial, keep the documentation for the libraries and APIs we use close by as a reference.

Config

Let’s first start by adding the config for both Chatkit and Cognitive Services:

    // .env
    CHATKIT_INSTANCE_LOCATOR_ID="YOUR CHATKIT INSTANCE LOCATOR ID"
    CHATKIT_SECRET_KEY="YOUR CHATKIT SECRET KEY"
    CHATKIT_TOKEN_PROVIDER_ENDPOINT="YOUR CHATKIT TOKEN PROVIDER"
    COGNITIVE_SERVICES_API_KEY="YOUR TRANSLATOR TEXT API KEY"

Do the same for the server as well:

    // server/.env
    CHATKIT_INSTANCE_LOCATOR_ID="YOUR CHATKIT INSTANCE LOCATOR ID"
    CHATKIT_SECRET_KEY="YOUR CHATKIT SECRET KEY"

Next, define the locales that are available in the React Native Voice package. These values came from this GitHub issue:

    // src/config/locales.js
    const locales = ["ar-EG", "de-DE", "en-US", "es-CL", "fr-FR", "hi", "it-IT", "ja", "ko", "pt-BR", "ru", "zh-CN"];
    const short_locales = locales.map(item => item.split('-')[0]);

    export default { 'long': locales, 'short': short_locales };

Next, create a src/config/base_instance_opt.js file and add the following. This contains the base configuration for making requests to the Translator Text API:

    // src/config/base_instance_opt.js
    import Config from 'react-native-config';
    const COGNITIVE_SERVICES_API_KEY = Config.COGNITIVE_SERVICES_API_KEY;

    const base_instance_opt = {
      baseURL: `https://api.cognitive.microsofttranslator.com`,
      timeout: 10000,
      headers: {
        'Content-Type': 'application/json',
        'Ocp-Apim-Subscription-Key': COGNITIVE_SERVICES_API_KEY
      }
    };

    export default base_instance_opt;

Login screen

Open the src/screens/Login.js file and include the packages that we’ll need:

    import { View, Text, TextInput, Button, StyleSheet, Picker } from "react-native"; // add Picker
    // add these:
    import axios from 'axios';
    import AsyncStorage from '@react-native-community/async-storage'; // for storing the languages to choose from

Include the base instance options and locales file as well:

    import base_instance_opt from '../config/base_instance_opt';
    import locales from '../config/locales';

Next, update the initial state to include the new state data that we’ll be working with:

    class Login extends Component {
      // ...

      state = {
        username: "",
        is_loading: false,
        // add these:
        languages: [], // array of languages to choose from (used in the picker)
        language: 'en' // the default language to use for the picker
      };
    }

When the component is mounted, we try to get the languages from local storage. If they’re not available, we make a request to the Translator Text API using the base configuration from the base_instance_opt.js file. We request the languages available for translation from the /languages endpoint. This endpoint requires you to pass api-version as a query parameter. The scope parameter is optional because it’s only used to specify which data to return; the endpoint simply returns everything if it’s not specified:

    async componentDidMount() {
      try {
        const stored_languages = await AsyncStorage.getItem('languages');

        if (!stored_languages) {
          const languages_opt = { ...base_instance_opt };
          const languages_instance = axios.create(languages_opt);
          const languages_res = await languages_instance.get('/languages?api-version=3.0&scope=translation');

          // next: add code for storing the languages locally
        }

        // last: add code for updating the state with the languages data

      } catch (err) {
        console.log("error occured: ", err);
      }
    }

Next, add the code for storing the languages locally. First we extract the keys (for example: fr, en, es) from the list of available languages. We also extract the values (name, nativeName, dir) and use map() to keep only the nativeName. After that, we iterate through the available languages and make sure to only include the ones that are also available in React Native Voice:

    const lang_keys = Object.keys(languages_res.data.translation);
    const lang_values = Object.values(languages_res.data.translation).map((x) => x.nativeName);

    var fetched_languages = [];
    lang_keys.forEach((key, i) => {
      if (locales.short.indexOf(key) !== -1) { // only include the languages that are also available to React Native Voice
        fetched_languages.push({
          key, // for example: en, es, fr
          val: lang_values[i] // native name of the language
        });
      }
    });

    await AsyncStorage.setItem('languages', JSON.stringify(fetched_languages)); // store the data locally

Lastly, add the code for updating the state with the languages data:

    const languages = (stored_languages) ? JSON.parse(stored_languages) : fetched_languages;
    await this.setState({
      languages
    });

In case you’re wondering what the response of the /languages endpoint looks like:

    {
      "translation": {
        "fr": {
          "name": "French",
          "nativeName": "Français",
          "dir": "ltr"
        },
        "en": {
          // ...
        },
        "es": {
          // ...
        }
      }
    }

Next, update the render() method to include the language picker right above the username:

    render() {
      return (
        <View style={styles.wrapper}>
          <View style={styles.container}>
            <View style={styles.main}>
              <View style={styles.fieldContainer}>
                <Text style={styles.label}>Enter your language</Text>
                <Picker
                  selectedValue={this.state.language}
                  style={styles.picker}
                  onValueChange={(itemValue, itemIndex) =>
                    this.setState({language: itemValue})
                  }>
                  {this.renderLanguages()}
                </Picker>
              </View>

              <View style={styles.fieldContainer}>
                <Text style={styles.label}>Enter your username</Text>
                <TextInput
                  style={styles.textInput}
                  onChangeText={username => this.setState({ username })}
                  value={this.state.username}
                />
              </View>

              {!this.state.is_loading && (
                <Button title="Login" color="#0064e1" onPress={this.login} />
              )}

              {this.state.is_loading && (
                <Text style={styles.loadingText}>Loading...</Text>
              )}
            </View>
          </View>
        </View>
      );
    }

Here’s the code for rendering the languages:

    renderLanguages = () => {
      return this.state.languages.map((lang) => {
        return <Picker.Item label={lang.val} value={lang.key} key={lang.key} />
      });
    }

Lastly, update the login() code to include the selected language:

    login = async () => {
      const { language, username } = this.state; // add language

      this.setState({
        is_loading: true
      });

      if (username) {
        this.props.navigation.navigate("Rooms", {
          'language': language, // add this
          'id': username
        });
      }

      await this.setState({
        is_loading: false,
        username: "",
        language: "en" // add this
      });
    }

Rooms screen

The Rooms screen lists all the rooms that the current user is a member of, as well as the public rooms they can join. We don’t really need to do a lot here because all the functionality required for this screen has already been implemented. All we really need to do is pass the selected language to the Chat screen via a navigation param:

    // src/screens/Rooms.js
    constructor(props) {
      super(props);
      const { navigation } = this.props;
      this.user_id = navigation.getParam("id");
      this.language = navigation.getParam("language"); // add this
    }

In the goToChatScreen() function, pass the language as a navigation param:

    goToChatScreen = (room) => {
      this.props.navigation.navigate("Chat", {
        user_id: this.user_id,
        room_id: room.id,
        room_name: room.name,
        language: this.language // add this
      });
    }

Chat screen

We now proceed to the main meat of the app: the Chat screen. As mentioned earlier, all the basic chat features have already been implemented. All we have to do is add the speech to text and language translation features.

Let’s start by importing the packages we’ll need:

    // src/screens/Chat.js
    import { View, TouchableOpacity, StyleSheet } from 'react-native'; // add TouchableOpacity, StyleSheet
    import axios from 'axios'; // for making requests to the Translator Text API
    import Icon from 'react-native-vector-icons/FontAwesome'; // for the microphone icon
    import Voice from 'react-native-voice'; // for the speech to text feature

Again, we’ll need the base instance and locales file:

    import base_instance_opt from '../config/base_instance_opt';
    import locales from '../config/locales';

Next, update the default state to include is_listening. This will be responsible for keeping track of whether the app is still listening for the user’s speech or not:

    class Chat extends Component {

      state = {
        text: '',
        messages: [],

        is_listening: false // add this (used for determining the color of the microphone icon)
      };

      // ...
    }

Next, update the constructor() to extract the current language from the navigation params and attach the listeners for the React Native Voice module:

    constructor(props) {
      super(props);
      const { navigation } = this.props;

      this.user_id = navigation.getParam("user_id");
      this.room_id = navigation.getParam("room_id");

      // add these:
      this.language = navigation.getParam("language"); // example: es
      const locale_index = locales.short.indexOf(this.language);
      this.voice_locale = locales.long[locale_index]; // example: es-CL

      Voice.onSpeechError = this.onSpeechError; // attach listener for when error occurs while trying to listen to speech
      Voice.onSpeechResults = this.onSpeechResults; // listener for when speech to text is done with its job
    }

Next, inside the render() method, add the text, onInputTextChanged, and renderActions props to GiftedChat:

  • text - for specifying the value of the text field for entering messages.
  • onInputTextChanged - for specifying the function to execute when the text in the text field changes. The text prop isn’t specified by default, which means Gifted Chat manages the text field’s state internally. Since we’re now specifying it, we need a way to update its contents when the user types something. We still want the user to be able to use the chat app like usual (by typing), but that won’t be possible if onInputTextChanged isn’t passed as a prop: anything the user types simply won’t show up in the text field.
  • renderActions - for specifying the function that will return the UI to render to the left of the text field for entering messages. In our case, we want to render a button with a microphone icon inside. This will be used for triggering the speech listener.

Here’s the code:

    render() {
      const { text, messages } = this.state;
      return (
        <View style={{flex: 1}}>

          <GiftedChat
            text={text}
            onInputTextChanged={text => this.setCustomText(text)}
            messages={messages}
            onSend={messages => this.onSend(messages)}
            user={{
              _id: this.user_id
            }}
            renderActions={this.renderCustomActions}
          />
        </View>
      );
    }

Here’s the setCustomText() function. As you can see, all it does is update the text in the state with the current value entered in the text field:

    setCustomText = (text) => {
      this.setState({
        text
      });
    }

Here’s the renderCustomActions() function. It renders the button with the microphone icon if the speech recognition service is available on the system. The color changes depending on the value of is_listening. When the button is pressed, the listen() function is executed:

    renderCustomActions = () => {
      const { is_listening } = this.state;
      const color = is_listening ? '#e82020' : '#333';
      if (Voice.isAvailable()) { // check if speech recognition service is available on the system
       return (
          <View style={styles.customActionsContainer}>
            <TouchableOpacity onPress={this.listen}>
              <View style={styles.buttonContainer}>
                <Icon name="microphone" size={23} color={color} />
              </View>
            </TouchableOpacity>
          </View>
        );
      }
      return null;
    }

Here’s the listen() function. To start listening, call the Voice.start() function. It accepts the locale that the listener will be biased toward. Based on my testing, even if I select Spanish as my language, it will still be able to recognize English words:

    listen = async () => {
      this.setState({
        is_listening: true
      });

      try {
        await Voice.start(this.voice_locale); 
      } catch (e) {
        console.error(e);
      }
    }

Next, add the code for the listeners. When the onSpeechResults() function is executed, an argument is automatically passed to it. This argument contains the text representation of the speech. Note that e.value contains an array of possible values. We simply use the first one, as that tends to be the most accurate:

    onSpeechError = e => {
      console.log('onSpeechError: ', e);
    };

    onSpeechResults = e => {
      console.log('results: ', e);
      this.setState({
        is_listening: false,
        text: e.value[0],
      });
    };
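
For reference, the event passed to onSpeechResults has roughly the following shape (the transcriptions below are made up for illustration), which is why we read e.value[0]:

    {
      "value": [
        "send me the report tomorrow",
        "send me the report to morrow",
        "send me the report tomorow"
      ]
    }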

Next, we implement the multi-lingual feature. Update the getMessage() function with the following. First, we construct the data required by the Translator Text API. This is a JSON string containing an array of objects with a Text property, where Text is the text you want to translate. We only want to translate a single piece of text, so we put just one object in the array. We then make a POST request to the /translate endpoint. The api-version and the language to translate to (to) should be passed as query parameters. The response that comes back is an array of results. Since we only passed in a single object, we can simply extract the first index. From there, we extract the first item under translations and get its text property. This contains the translated version of the text, which we then use as the new value for the chat bubble:

    getMessage = async ({ id, sender, parts, createdAt }) => {
      const text = parts.find(part => part.partType === 'inline').payload.content;
      let txt = text; // add this

      // add these
      try {
        const translate_opt = { ...base_instance_opt };
        const translate_instance = axios.create(translate_opt);

        const content = JSON.stringify([{
          'Text': text
        }]);
        const res = await translate_instance.post(`/translate?api-version=3.0&to=${this.language}`, content); // the baseURL is already set in base_instance_opt
        txt = res.data[0].translations[0].text;

      } catch (err) {
        console.log("err: ", err);
      }

      const msg_data = {
        _id: id,
        text: txt, // update this
        createdAt: new Date(createdAt),
        user: {
          _id: sender.id,
          name: sender.name,
          avatar: `https://ui-avatars.com/api/?background=d88413&color=FFF&name=${sender.name}`
        }
      };

      return {
        message: msg_data
      };
    }

Before the component unmounts, clean up by removing all of the listeners we attached earlier in the constructor():

    componentWillUnmount() {
      this.currentUser.disconnect();
      Voice.destroy().then(Voice.removeAllListeners); // add this
    }

Lastly, add the styles:

    const styles = StyleSheet.create({
      customActionsContainer: {
        flexDirection: "row",
        justifyContent: "space-between"
      },
      buttonContainer: {
        padding: 10
      }
    });

Running the app

At this point, you’re ready to run the app. First, run the server, then expose it with ngrok in a separate terminal window:

    cd server
    node server.js
    ./ngrok http 5000

Update the CHAT_SERVER URL in the src/screens/Login.js file:

    const CHAT_SERVER = "YOUR NGROK HTTPS URL";

Go to your Chatkit dashboard and create a few users which you can use for logging in. After that, create a room as well. Make sure it’s a public room, otherwise it won’t be listed in the Rooms screen.
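
If you prefer to create the test users from code instead of the dashboard, you can use the Chatkit server SDK. Here’s a rough sketch, assuming the @pusher/chatkit-server package; the user ID and name are just examples:

    // create_users.js - a sketch for creating test users with the Chatkit server SDK
    const Chatkit = require('@pusher/chatkit-server');

    const chatkit = new Chatkit.default({
      instanceLocator: 'YOUR CHATKIT INSTANCE LOCATOR',
      key: 'YOUR CHATKIT SECRET KEY'
    });

    chatkit
      .createUser({ id: 'alice', name: 'Alice' }) // example user; repeat for each test user
      .then(() => console.log('user created'))
      .catch(err => console.log('error creating user: ', err));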

Finally, run the app:

    react-native run-android
    react-native run-ios

Select the language and enter the user ID of the Chatkit user you created as the username. After that, select the room. In the chat room, you can dictate what you want to say by tapping on the microphone icon. You will hear a sound and the microphone will turn red to indicate that it has started listening. Once you stop speaking, the recording will stop and what you just said will appear in the text field for entering the message.

Conclusion

That’s it! In this tutorial, you learned how to create a chat app that can convert speech to text and then translate it to the user’s preferred language. Specifically, you learned how to use the React Native Voice module for converting human speech into readable text. You also learned how to use the Translator Text API from Microsoft Cognitive Services to translate text from one language to another.

You can find the source code on this GitHub repo.
