How to create a WebRTC video call app with Node.js

Introduction

Pusher is perfect for instantaneously distributing messages amongst people and devices. This is exactly why Pusher is a great choice for signaling in WebRTC, the act of introducing two devices in realtime so they can make their own peer-to-peer connection.

WebRTC (Web Real-Time Communications) is a technology which enables web applications and sites to capture and optionally stream audio and/or video media, and to exchange arbitrary data between browsers without requiring an intermediary. The set of standards that comprises WebRTC makes it possible to share data and perform teleconferencing peer-to-peer, without requiring that the user install plug-ins or any other third-party software.

In this tutorial, we will build a video call app that allows you to make, accept, and reject calls. Making your own video call application using WebRTC is simple thanks to the Pusher API.

webrtc-video-call-preview

Prerequisites

A basic understanding of Node.js and client-side JavaScript is required for this tutorial.

Setting up a Pusher account and app

Pusher Channels is a hosted service that makes it super-easy to add realtime data and functionality to web and mobile applications. Create a free sandbox Pusher account or sign in.

Pusher acts as a realtime layer between your servers and clients. Pusher maintains persistent connections to the clients (over WebSocket if possible, falling back to HTTP-based connectivity) so that as soon as your servers have new data to push to the clients, they can do so via Pusher.

We will register a new app on the Pusher dashboard. The only compulsory options are the app name and cluster. A cluster represents the physical location of the Pusher server that will handle your app’s requests. Also, copy out your App ID, Key, and Secret from the App Keys section, as we will need them later on.

Setting up the project

Let’s create a new node project by running:

    # create directory
    mkdir pusher-webrtc
    # move into the new directory
    cd pusher-webrtc
    # initialize a node project
    npm init -y

Next, let’s move ahead by installing the required libraries:

    npm install body-parser express pusher --save

In the command above, we have installed three libraries which are:

  • Express: fast, unopinionated, minimalist web framework for Node.js.
  • Body-parser: parse incoming request bodies in a middleware before your handlers, available under the req.body property.
  • Pusher: the official Node.js library for Pusher.

Setting up the entry point

Create a file called index.js in the root folder and paste in:

    const express = require('express');
    const bodyParser = require('body-parser');
    const Pusher = require('pusher');
    const app = express();

    // Body parser middleware
    app.use(bodyParser.json());
    app.use(bodyParser.urlencoded({ extended: true }));

    // Create an instance of Pusher
    const pusher = new Pusher({
        appId: 'XXX-API-ID',
        key: 'XXX-API-KEY',
        secret: 'XXX-API-SECRET',
        cluster: 'XXX-API-CLUSTER',
        encrypted: true
    });

    // Serve the client page on the base route
    app.get('/', (req, res) => {
        return res.sendFile(__dirname + '/index.html');
    });

    // Listen on port 3000
    app.listen(3000, () => {
        return console.log('Server is up on 3000');
    });

In the code block above, we have added the required libraries, used the body-parser middleware, and started an instance of Pusher, passing in the app id, key, secret, and cluster.

Next, we defined the base route, in which we serve an index.html file (which we will create later on).

Finally, we set the app to listen on port 3000.
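The credentials in the snippet above are hardcoded placeholders. As a minimal sketch, assuming you would rather keep them out of source control, they could be read from environment variables instead. The loadPusherConfig helper and the PUSHER_* variable names are hypothetical and not part of the Pusher library:

```javascript
// Hypothetical helper: collect Pusher credentials from an environment object.
// The PUSHER_* variable names are assumptions made for this sketch.
function loadPusherConfig(env) {
  const required = {
    appId: "PUSHER_APP_ID",
    key: "PUSHER_APP_KEY",
    secret: "PUSHER_APP_SECRET",
    cluster: "PUSHER_APP_CLUSTER"
  };
  const config = {};
  for (const [option, variable] of Object.entries(required)) {
    if (!env[variable]) {
      // Fail early rather than starting with incomplete credentials
      throw new Error("Missing environment variable: " + variable);
    }
    config[option] = env[variable];
  }
  return config;
}

// Example with an in-memory object standing in for process.env
const config = loadPusherConfig({
  PUSHER_APP_ID: "123",
  PUSHER_APP_KEY: "abc",
  PUSHER_APP_SECRET: "shh",
  PUSHER_APP_CLUSTER: "eu"
});
console.log(config.cluster); // prints "eu"
```

In index.js, the returned object could then be passed to the Pusher constructor in place of the literal values, with process.env as the argument.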

Setting up the authentication route

Since we are building a video call app, it would be nice to know who’s online at the moment. Pusher’s presence channels keep a record of the members who are online, so we will use a presence channel rather than the usual public channel.

Pusher’s presence channel subscriptions must be authenticated. Hence, we will have an authentication route. Add the route below to your index.js file:

    // Authenticate subscriptions to the presence channel
    app.post("/pusher/auth", (req, res) => {
      const socketId = req.body.socket_id;
      const channel = req.body.channel_name;
      // Random unique id, for demonstration purposes only
      const presenceData = {
        user_id:
          Math.random()
            .toString(36)
            .slice(2) + Date.now()
      };
      const auth = pusher.authenticate(socketId, channel, presenceData);
      res.send(auth);
    });

In the code above, we defined a new route at /pusher/auth which uses the usual pusher.authenticate method, but with an additional parameter that holds the details of the user trying to access the channel. This parameter is expected to be an object with two keys: user_id and user_info. The user_info key is optional.

Note: In the example above, I am just passing a random unique id to each user. In a real-world application, you might need to pass in the user id from the database or other authentication methods as used in your app.
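As a rough illustration of the note above, the presence data could be derived from an already authenticated user instead of a random value. This is only a sketch: the buildPresenceData helper and the shape of the user object (id and name) are assumptions, and how you obtain that user depends on your own authentication setup.

```javascript
// Hypothetical helper: build the presence payload from an application user.
// The shape of `user` (id, name) is an assumption for this sketch.
function buildPresenceData(user) {
  if (!user || user.id === undefined || user.id === null) {
    throw new Error("A user with an id is required");
  }
  return {
    // Identifies the member on the presence channel, sent as a string
    user_id: String(user.id),
    // Optional extra details that other members can read
    user_info: { name: user.name || "Anonymous" }
  };
}

const presenceData = buildPresenceData({ id: 42, name: "Ada" });
console.log(presenceData.user_id); // prints "42"
```

The resulting object would take the place of the presenceData variable passed to pusher.authenticate in the route above.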

Creating the index.html file

Remember that while creating the entry point, we served a file called index.html on the base route, which we had yet to create. Next, we will create a new file called index.html in the root folder, and add:

    <!DOCTYPE html>
    <html>

    <head>
        <title>WebRTC Audio/Video-Chat</title>
    </head>

    <body>
        <div id="app">
            <span id="myid"> </span>
            <!-- autoplay starts playback once a stream is attached; the local preview is muted to avoid echo -->
            <video id="selfview" autoplay muted></video>
            <video id="remoteview" autoplay></video>
            <button id="endCall" style="display: none;" onclick="endCurrentCall()">End Call </button>
            <div id="list">
                <ul id="users">

                </ul>
            </div>
        </div>
    </body>

    </html>

In the code block above, we have a basic HTML setup with one span element which holds the ID of the current user, two video elements for the caller and the receiver, a button to end the current call (with an onclick attribute of endCurrentCall(), which we will define soon), and finally a ul element which holds the list of all users.

Displaying online users

To make video calls, we need to be able to see online users, which is the reason we opted for presence channels. Just before the closing body tag, paste in:

    <script src="https://js.pusher.com/4.1/pusher.min.js"></script>
    <script>
    var pusher = new Pusher("XXX-API-KEY", {
      cluster: "XXX-API-CLUSTER",
      encrypted: true,
      authEndpoint: "pusher/auth"
    });
    var usersOnline,
      id,
      users = [],
      sessionDesc,
      currentcaller,
      room,
      caller,
      localUserMedia;
    const channel = pusher.subscribe("presence-videocall");

    channel.bind("pusher:subscription_succeeded", members => {
      // set the member count
      usersOnline = members.count;
      id = channel.members.me.id;
      document.getElementById("myid").innerHTML = ` My caller id is : ` + id;
      members.each(member => {
        if (member.id != channel.members.me.id) {
          users.push(member.id);
        }
      });

      render();
    });

    channel.bind("pusher:member_added", member => {
      users.push(member.id);
      render();
    });

    channel.bind("pusher:member_removed", member => {
      // remove the member from the list, if present
      var index = users.indexOf(member.id);
      if (index !== -1) {
        users.splice(index, 1);
      }
      if (member.id == room) {
        endCall();
      }
      render();
    });

    function render() {
      var list = "";
      users.forEach(function(user) {
        list +=
          `<li>` +
          user +
          ` <input type="button" style="float:right;"  value="Call" onclick="callUser('` +
          user +
          `')" id="makeCall" /></li>`;
      });
      document.getElementById("users").innerHTML = list;
    }
    </script>

Here, we have required the official client library for Pusher. Next, we start a new Pusher instance, passing in our app key, and also the authentication route we had created earlier.

We go on to define initial variables which we will use in the code:

  • usersOnline: the count of users online
  • id: the ID of the current user
  • users: an array that holds the IDs of all other online users
  • sessionDesc: the SDP offer being sent. SDP refers to the session description of the peer connection provided by WebRTC. (You will see more of this as we move on)
  • room: the identifier of the call currently in progress.
  • caller: the peer connection object of the person calling/receiving a call.
  • localUserMedia: a reference to the local audio and video stream being transmitted from the caller.

Next, we subscribe to a presence channel called presence-videocall. Subscribing triggers a request to our authentication route. Once the subscription succeeds, Pusher passes a members object to any handler bound to the pusher:subscription_succeeded event. In that handler, we store the user count and the current user’s id, and append all members apart from the current user to the users array. We then call the render function, defined at the end of the script, which displays the online users.

We also bind to two more events, pusher:member_added and pusher:member_removed, in which we add new members to the array and remove logged-out members from it, respectively.

Finally, we define the render function which loops through all users and then appends them to the ul element as li tags with call buttons which have an onclick attribute of callUser which we will create soon.
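One detail worth keeping in mind when maintaining such an array: indexOf returns -1 for a value that is not present, and splice(-1, 1) removes the last element. The following is a minimal sketch of list helpers that sidestep this by filtering instead. The names addMember and removeMember are hypothetical and are not used elsewhere in this tutorial:

```javascript
// Add a member id unless it is already listed.
function addMember(users, memberId) {
  return users.includes(memberId) ? users : users.concat(memberId);
}

// Remove a member id; an unknown id leaves the list unchanged.
function removeMember(users, memberId) {
  return users.filter(id => id !== memberId);
}

let online = [];
online = addMember(online, "alice");
online = addMember(online, "bob");
online = removeMember(online, "carol"); // not present, nothing is removed
online = removeMember(online, "alice");
console.log(online); // prints [ 'bob' ]
```

Both helpers return an array rather than mutating their input, so the result has to be assigned back before calling render.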

Integrating WebRTC into the app

Now that we are all set, we can use Pusher to take care of signaling within the video call. First, let’s get the video call started. Paste the following after the render function in the index.html file:

    // To iron out browser implementation differences such as vendor prefixes
    GetRTCPeerConnection();
    GetRTCSessionDescription();
    GetRTCIceCandidate();
    // Prepare the caller to use the peer connection
    prepareCaller();

    function GetRTCIceCandidate() {
      window.RTCIceCandidate =
        window.RTCIceCandidate ||
        window.webkitRTCIceCandidate ||
        window.mozRTCIceCandidate ||
        window.msRTCIceCandidate;

      return window.RTCIceCandidate;
    }

    function GetRTCPeerConnection() {
      window.RTCPeerConnection =
        window.RTCPeerConnection ||
        window.webkitRTCPeerConnection ||
        window.mozRTCPeerConnection ||
        window.msRTCPeerConnection;
      return window.RTCPeerConnection;
    }

    function GetRTCSessionDescription() {
      window.RTCSessionDescription =
        window.RTCSessionDescription ||
        window.webkitRTCSessionDescription ||
        window.mozRTCSessionDescription ||
        window.msRTCSessionDescription;
      return window.RTCSessionDescription;
    }

    function prepareCaller() {
      // Initializing a peer connection
      caller = new window.RTCPeerConnection();
      // Listen for ICE candidates and send them to the remote peer
      caller.onicecandidate = function(evt) {
        if (!evt.candidate) return;
        console.log("onicecandidate called");
        onIceCandidate(caller, evt);
      };
      // onaddstream handler to receive the remote feed and show it in the remoteview video element
      caller.onaddstream = function(evt) {
        console.log("onaddstream called");
        var remoteView = document.getElementById("remoteview");
        if ("srcObject" in remoteView) {
          // Current browsers attach the stream directly
          remoteView.srcObject = evt.stream;
        } else {
          // Fallback for older browsers
          remoteView.src = window.URL.createObjectURL(evt.stream);
        }
      };
    }

In the code block above, we call functions that are defined just after the calls (function declarations are hoisted, so this works). The first three functions, GetRTCPeerConnection(), GetRTCSessionDescription() and GetRTCIceCandidate(), are used to iron out browser implementation differences for RTCPeerConnection, RTCSessionDescription and RTCIceCandidate, such as the vendor prefixes used by WebKit and Mozilla Gecko browsers. You may wonder what these interfaces are.

The RTCPeerConnection interface represents a WebRTC connection between the local computer and a remote peer. It provides methods to connect to a remote peer, maintain and monitor the connection, and close the connection once it's no longer needed.

The RTCSessionDescription interface describes one end of a connection or potential connection and how it's configured. Each RTCSessionDescription comprises a description type indicating which part of the offer/answer negotiation process it describes and of the SDP descriptor of the session.

The RTCIceCandidate interface is part of the WebRTC API and represents a candidate Interactive Connectivity Establishment (ICE) configuration which may be used to establish an RTCPeerConnection.
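The three shim functions share one pattern: try the standard name first, then each vendor-prefixed name. As a small sketch of that pattern, the lookup could be written once and reused. The resolvePrefixed helper is hypothetical, and a plain object stands in for window so the snippet can run outside a browser:

```javascript
// Return the first constructor found under the standard or a prefixed name.
// `scope` stands in for `window`; the helper name is hypothetical.
function resolvePrefixed(scope, name) {
  const prefixes = ["", "webkit", "moz", "ms"];
  for (const prefix of prefixes) {
    const candidate = scope[prefix + name];
    if (candidate) return candidate;
  }
  return undefined;
}

// A fake global that only exposes the WebKit-prefixed constructor
const fakeWindow = { webkitRTCPeerConnection: function FakePeerConnection() {} };
const PeerConnection = resolvePrefixed(fakeWindow, "RTCPeerConnection");
console.log(PeerConnection.name); // prints "FakePeerConnection"
```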

Remember that we also called the prepareCaller function. This function assigns a new RTCPeerConnection instance to the predefined caller variable and sets handlers for its onicecandidate and onaddstream events. When an ICE candidate is found, we call the onIceCandidate function, which we will define soon. When a new stream is added, we attach it to our remote video element, i.e. the second party’s video.

Defining the onIceCandidate function and using the candidate

Let’s look at what our onIceCandidate function would look like. Paste the following into the script part of your index.html file:

    // Send the ICE candidate to the remote peer
    function onIceCandidate(peer, evt) {
        if (evt.candidate) {
            channel.trigger("client-candidate", {
                "candidate": evt.candidate,
                "room": room
            });
        }
    }

    // Add candidates received from the remote peer to the peer connection
    channel.bind("client-candidate", function(msg) {
        if (msg.room == room) {
            console.log("candidate received");
            caller.addIceCandidate(new RTCIceCandidate(msg.candidate));
        }
    });

In this function, we trigger an event to the other party, informing them that a new ICE candidate is available. This function will be called whenever the local ICE agent needs to deliver a message to the other peer through the signaling server (in this case, Pusher). This lets the ICE agent negotiate with the remote peer without the browser itself needing to know any specifics about the technology being used for signaling, so you can implement this handler with whatever messaging technology you choose.

On the other end, we bind to the client-candidate event and then add the received candidate to the current RTCPeerConnection.
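To see why the room check matters, here is a small in-memory simulation of the exchange. It is only a sketch: FakeChannel and Peer are hypothetical stand-ins, not Pusher classes. The one Pusher behaviour it imitates is that a client event is delivered to the other subscribers of the channel but not to the client that triggered it.

```javascript
// Minimal in-memory stand-in for a channel that relays client events.
// A message is delivered to every subscriber except the sender.
class FakeChannel {
  constructor() {
    this.peers = [];
  }
  join(peer) {
    this.peers.push(peer);
  }
  trigger(sender, eventName, data) {
    this.peers
      .filter(peer => peer !== sender)
      .forEach(peer => peer.receive(eventName, data));
  }
}

class Peer {
  constructor(name, room) {
    this.name = name;
    this.room = room;
    this.candidates = [];
  }
  receive(eventName, msg) {
    // Ignore candidates that belong to someone else's call
    if (eventName === "client-candidate" && msg.room === this.room) {
      this.candidates.push(msg.candidate);
    }
  }
}

const channel = new FakeChannel();
const alice = new Peer("alice", "bob"); // alice is calling bob, so the room is "bob"
const bob = new Peer("bob", "bob");
const carol = new Peer("carol", undefined); // carol is not in a call
[alice, bob, carol].forEach(peer => channel.join(peer));

channel.trigger(alice, "client-candidate", { candidate: "candidate-1", room: "bob" });
console.log(bob.candidates.length, carol.candidates.length); // prints 1 0
```

Every subscriber of the presence channel receives the event, so without the room comparison carol would try to add a candidate meant for bob.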

Calling a user

Calling a user using WebRTC is simple. First, we need to get the caller’s stream, then create an offer for the peer being called. Here, we use Pusher to signal to the other peer that an incoming call is waiting for them.

In the code below, you will notice that we trigger client events rather than making a POST request to the server, which would then trigger an event that we are bound to.

We do this because we do not need to store this information on the server. Unless you need to, I recommend that you use client events. However, for client events to work, you need to enable them on your app’s Pusher dashboard. Paste the following in the script section of your index.html file:

    function getCam() {
      // Get the local audio/video feed
      return navigator.mediaDevices.getUserMedia({
        video: true,
        audio: true
      });
    }

    // Create and send an offer to the remote peer on button click
    function callUser(user) {
      getCam()
        .then(stream => {
          // Show the local feed in the selfview video element
          var selfView = document.getElementById("selfview");
          if ("srcObject" in selfView) {
            // Current browsers attach the stream directly
            selfView.srcObject = stream;
          } else {
            // Fallback for older browsers
            selfView.src = window.URL.createObjectURL(stream);
          }
          toggleEndCallButton();
          caller.addStream(stream);
          localUserMedia = stream;
          caller.createOffer().then(function(desc) {
            caller.setLocalDescription(new RTCSessionDescription(desc));
            channel.trigger("client-sdp", {
              sdp: desc,
              room: user,
              from: id
            });
            room = user;
          });
        })
        .catch(error => {
          console.log("an error occurred", error);
        });
    }

    function toggleEndCallButton() {
      if (document.getElementById("endCall").style.display == "block") {
        document.getElementById("endCall").style.display = "none";
      } else {
        document.getElementById("endCall").style.display = "block";
      }
    }

The code follows the steps described above. However, notice that we have an extra function called toggleEndCallButton. This is used to toggle the end call button, so you can end an active call.

Also, note that we triggered a client event. This event uses Pusher to notify the recipient that they have a call. Here, instead of generating a unique room ID for the two users, we use the recipient’s ID as the room ID. In your own app, you can use any unique identifier for the room.
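If you prefer a room identifier that is tied to both participants rather than to the recipient alone, one possible approach is to combine the two IDs. This is only a sketch of that idea: makeRoomId is a hypothetical helper, and it assumes that the IDs never contain the ":" separator.

```javascript
// Hypothetical helper: derive one room id from the two participants.
// Sorting makes the result identical no matter who places the call.
function makeRoomId(callerId, recipientId) {
  return [callerId, recipientId].sort().join(":");
}

console.log(makeRoomId("bob", "alice")); // prints "alice:bob"
console.log(makeRoomId("alice", "bob") === makeRoomId("bob", "alice")); // prints true
```

With a scheme like this, the recipient can no longer recognise an incoming call by comparing the room with their own ID, so the message would need to carry the recipient’s ID in a separate field.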

Receiving a call

Receiving a call is easy. First, the recipient needs to be notified that they have a call. Remember that we emitted a client event earlier on while making the call? Now we need to bind to it and listen for it.

    // Listen for an incoming offer from a caller
    channel.bind("client-sdp", function(msg) {
        if (msg.room == id) {
            var answer = confirm("You have a call from: " + msg.from + ". Would you like to answer?");
            if (!answer) {
                return channel.trigger("client-reject", {"room": msg.room, "rejected": id});
            }
            room = msg.room;
            getCam()
            .then(stream => {
                localUserMedia = stream;
                toggleEndCallButton();
                // Show the local feed in the selfview video element
                var selfView = document.getElementById("selfview");
                if ("srcObject" in selfView) {
                    // Current browsers attach the stream directly
                    selfView.srcObject = stream;
                } else {
                    // Fallback for older browsers
                    selfView.src = window.URL.createObjectURL(stream);
                }
                caller.addStream(stream);
                var sessionDesc = new RTCSessionDescription(msg.sdp);
                caller.setRemoteDescription(sessionDesc);
                caller.createAnswer().then(function(sdp) {
                    caller.setLocalDescription(new RTCSessionDescription(sdp));
                    channel.trigger("client-answer", {
                        "sdp": sdp,
                        "room": room
                    });
                });

            })
            .catch(error => {
                console.log('an error occurred', error);
            })
        }
    });

    // Listen for the answer to an offer we sent
    channel.bind("client-answer", function(answer) {
      if (answer.room == room) {
        console.log("answer received");
        caller.setRemoteDescription(new RTCSessionDescription(answer.sdp));
      }
    });

    // Listen for a rejected call
    channel.bind("client-reject", function(answer) {
      if (answer.room == room) {
        console.log("Call declined");
        alert("call to " + answer.rejected + " was politely declined");
        endCall();
      }
    });

    function endCall() {
      room = undefined;
      caller.close();
      for (let track of localUserMedia.getTracks()) {
        track.stop();
      }
      prepareCaller();
      toggleEndCallButton();
    }

In the code above, we bind to the client-sdp event which was emitted when we made the call. Next, we check that the room is equal to the ID of the receiver (remember that we used the receiver’s ID as the room, so the wrong person is not alerted). We then present a confirm box, prompting the user to accept or reject the call. If the user rejects, we trigger a client-reject event, passing in the room of the call that was rejected, and return.

If the call isn’t rejected, we get the recipient’s webcam, then store a reference to the stream (so we can stop the webcam when ending the call). We add the stream to the video output and to the current RTCPeerConnection instance.

We set the remote description as the description of the sdp sent by the caller. Finally, we create an answer and then send the answer to the caller.

If the answer is not received, the call will not be connected. When the answer is received, the caller sets their remote description to the SDP from the receiver.
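The offer and answer exchange can be pictured as a small state machine. The sketch below is one possible way to model it and is not part of the app’s code: the state names (idle, calling, ringing, connected) and the local action names (send-offer, send-answer, send-reject, hang-up) are hypothetical, while the client-* names match the events used in this tutorial.

```javascript
// Hypothetical call states and the signaling events that move between them.
const transitions = {
  idle: { "send-offer": "calling", "client-sdp": "ringing" },
  calling: { "client-answer": "connected", "client-reject": "idle" },
  ringing: { "send-answer": "connected", "send-reject": "idle" },
  connected: { "client-endcall": "idle", "hang-up": "idle" }
};

// Return the next state, or the current one if the event does not apply.
function nextState(state, event) {
  const next = transitions[state] && transitions[state][event];
  return next || state;
}

// The caller's view of an accepted call
let state = "idle";
state = nextState(state, "send-offer"); // offer sent in a client-sdp event
state = nextState(state, "client-answer"); // answer received from the recipient
console.log(state); // prints "connected"
```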

Notice that the listener for the client-reject event calls the endCall function, because when a call is rejected we want to end everything about the call. The endCall function resets the room, closes the peer connection, stops the media streaming, prepares the caller to make or receive new calls, and finally toggles the end call button so that it is hidden again.
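The cleanup step assumes that a local stream exists when the call ends. As a defensive sketch, stopping the tracks could be wrapped in a helper that tolerates a missing stream. The stopTracks name is hypothetical, and a fake object stands in for the MediaStream returned by getUserMedia so that the snippet runs outside a browser:

```javascript
// Stop every track of a stream, tolerating a stream that was never created.
function stopTracks(stream) {
  if (!stream) return 0;
  const tracks = stream.getTracks();
  tracks.forEach(track => track.stop());
  return tracks.length; // number of tracks that were stopped
}

// Fake stream standing in for the MediaStream returned by getUserMedia
const stopped = [];
const fakeStream = {
  getTracks: () => [
    { kind: "audio", stop: () => stopped.push("audio") },
    { kind: "video", stop: () => stopped.push("video") }
  ]
};

console.log(stopTracks(fakeStream)); // prints 2
console.log(stopTracks(undefined)); // prints 0
```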

Wrapping it all up

Putting it all together, here is what our complete JavaScript code looks like:

    var pusher = new Pusher("XXX-API-KEY", {
      cluster: "XXX-API-CLUSTER",
      encrypted: true,
      authEndpoint: "pusher/auth"
    });
    var usersOnline,
      id,
      users = [],
      sessionDesc,
      currentcaller,
      room,
      caller,
      localUserMedia;
    const channel = pusher.subscribe("presence-videocall");

    channel.bind("pusher:subscription_succeeded", members => {
      // set the member count
      usersOnline = members.count;
      id = channel.members.me.id;
      document.getElementById("myid").innerHTML = ` My caller id is : ` + id;
      members.each(member => {
        if (member.id != channel.members.me.id) {
          users.push(member.id);
        }
      });

      render();
    });

    channel.bind("pusher:member_added", member => {
      users.push(member.id);
      render();
    });

    channel.bind("pusher:member_removed", member => {
      // remove the member from the list, if present
      var index = users.indexOf(member.id);
      if (index !== -1) {
        users.splice(index, 1);
      }
      if (member.id == room) {
        endCall();
      }
      render();
    });

    function render() {
      var list = "";
      users.forEach(function(user) {
        list +=
          `<li>` +
          user +
          ` <input type="button" style="float:right;"  value="Call" onclick="callUser('` +
          user +
          `')" id="makeCall" /></li>`;
      });
      document.getElementById("users").innerHTML = list;
    }

    // To iron out browser implementation differences such as vendor prefixes
    GetRTCPeerConnection();
    GetRTCSessionDescription();
    GetRTCIceCandidate();
    prepareCaller();

    function prepareCaller() {
      // Initializing a peer connection
      caller = new window.RTCPeerConnection();
      // Listen for ICE candidates and send them to the remote peer
      caller.onicecandidate = function(evt) {
        if (!evt.candidate) return;
        console.log("onicecandidate called");
        onIceCandidate(caller, evt);
      };
      // onaddstream handler to receive the remote feed and show it in the remoteview video element
      caller.onaddstream = function(evt) {
        console.log("onaddstream called");
        var remoteView = document.getElementById("remoteview");
        if ("srcObject" in remoteView) {
          // Current browsers attach the stream directly
          remoteView.srcObject = evt.stream;
        } else {
          // Fallback for older browsers
          remoteView.src = window.URL.createObjectURL(evt.stream);
        }
      };
    }

    function getCam() {
      // Get the local audio/video feed
      return navigator.mediaDevices.getUserMedia({
        video: true,
        audio: true
      });
    }

    function GetRTCIceCandidate() {
      window.RTCIceCandidate =
        window.RTCIceCandidate ||
        window.webkitRTCIceCandidate ||
        window.mozRTCIceCandidate ||
        window.msRTCIceCandidate;

      return window.RTCIceCandidate;
    }

    function GetRTCPeerConnection() {
      window.RTCPeerConnection =
        window.RTCPeerConnection ||
        window.webkitRTCPeerConnection ||
        window.mozRTCPeerConnection ||
        window.msRTCPeerConnection;
      return window.RTCPeerConnection;
    }

    function GetRTCSessionDescription() {
      window.RTCSessionDescription =
        window.RTCSessionDescription ||
        window.webkitRTCSessionDescription ||
        window.mozRTCSessionDescription ||
        window.msRTCSessionDescription;
      return window.RTCSessionDescription;
    }

    // Create and send an offer to the remote peer on button click
    function callUser(user) {
      getCam()
        .then(stream => {
          // Show the local feed in the selfview video element
          var selfView = document.getElementById("selfview");
          if ("srcObject" in selfView) {
            // Current browsers attach the stream directly
            selfView.srcObject = stream;
          } else {
            // Fallback for older browsers
            selfView.src = window.URL.createObjectURL(stream);
          }
          toggleEndCallButton();
          caller.addStream(stream);
          localUserMedia = stream;
          caller.createOffer().then(function(desc) {
            caller.setLocalDescription(new RTCSessionDescription(desc));
            channel.trigger("client-sdp", {
              sdp: desc,
              room: user,
              from: id
            });
            room = user;
          });
        })
        .catch(error => {
          console.log("an error occurred", error);
        });
    }

    function endCall() {
      room = undefined;
      caller.close();
      for (let track of localUserMedia.getTracks()) {
        track.stop();
      }
      prepareCaller();
      toggleEndCallButton();
    }

    // Tell the other party that the call has ended, then clean up locally
    function endCurrentCall() {
      channel.trigger("client-endcall", {
        room: room
      });

      endCall();
    }

    // Send the ICE candidate to the remote peer
    function onIceCandidate(peer, evt) {
      if (evt.candidate) {
        channel.trigger("client-candidate", {
          candidate: evt.candidate,
          room: room
        });
      }
    }

    function toggleEndCallButton() {
      if (document.getElementById("endCall").style.display == "block") {
        document.getElementById("endCall").style.display = "none";
      } else {
        document.getElementById("endCall").style.display = "block";
      }
    }

    // Listening for the candidate message from a peer, sent from the onicecandidate handler
    channel.bind("client-candidate", function(msg) {
      if (msg.room == room) {
        console.log("candidate received");
        caller.addIceCandidate(new RTCIceCandidate(msg.candidate));
      }
    });

    // Listening for the Session Description Protocol message with session details from the remote peer
    channel.bind("client-sdp", function(msg) {
      if (msg.room == id) {
        console.log("sdp received");
        var answer = confirm(
          "You have a call from: " + msg.from + ". Would you like to answer?"
        );
        if (!answer) {
          return channel.trigger("client-reject", { room: msg.room, rejected: id });
        }
        room = msg.room;
        getCam()
          .then(stream => {
            localUserMedia = stream;
            toggleEndCallButton();
            // Show the local feed in the selfview video element
            var selfView = document.getElementById("selfview");
            if ("srcObject" in selfView) {
              // Current browsers attach the stream directly
              selfView.srcObject = stream;
            } else {
              // Fallback for older browsers
              selfView.src = window.URL.createObjectURL(stream);
            }
            caller.addStream(stream);
            var sessionDesc = new RTCSessionDescription(msg.sdp);
            caller.setRemoteDescription(sessionDesc);
            caller.createAnswer().then(function(sdp) {
              caller.setLocalDescription(new RTCSessionDescription(sdp));
              channel.trigger("client-answer", {
                sdp: sdp,
                room: room
              });
            });
          })
          .catch(error => {
            console.log("an error occurred", error);
          });
      }
    });

    // Listening for the answer to the offer sent to the remote peer
    channel.bind("client-answer", function(answer) {
      if (answer.room == room) {
        console.log("answer received");
        caller.setRemoteDescription(new RTCSessionDescription(answer.sdp));
      }
    });

    // Listening for a rejected call
    channel.bind("client-reject", function(answer) {
      if (answer.room == room) {
        console.log("Call declined");
        alert("call to " + answer.rejected + " was politely declined");
        endCall();
      }
    });

    // Listening for the other party ending the call
    channel.bind("client-endcall", function(answer) {
      if (answer.room == room) {
        console.log("Call Ended");
        endCall();
      }
    });

Next, let’s start our app by running:

    node index.js

Finally, navigate to http://localhost:3000 to try the app out. Below is an image of what we have built:

webrtc-video-call-preview

Conclusion

In this tutorial, you learned how to put together your own WebRTC video call application using Pusher as a signaling server. We covered setting up a WebRTC connection using simple JavaScript. From here you can take things further and explore more complex call applications by adding better video security, notifications that a user is on another call, group video calls, and more!

The code base to this tutorial is hosted in a public GitHub repository. Play around with the code.