WebSockets From Scratch



Introduction

I have been at Pusher for almost 6 months and, because I mainly do customer-facing developer work, parts of our deeper infrastructure have seemed a bit of a black box to me. Pusher, a message service that lets you send realtime data from server to client or from client to client, has the WebSocket protocol at its core. I was aware of how the HTTP protocol worked, but not WebSocket – aside from the fact it lets you do some nifty realtime stuff.

Therefore I decided to dig a little deeper and try to build a WebSocket server from scratch – and by ‘scratch’, I mean using only Ruby’s built-in libraries. This blog post is partly to share what I’ve learnt and partly to act as a tutorial, given that I couldn’t find many that would lead me through the process step-by-step. That said, there were plenty of awesome resources for getting to grips with the matter, such as on Mozilla and this post from Armin Ronacher.

This guide is aimed at people who are new to WebSocket, or just wish to know more about what’s under the hood. What I’ll cover, in around 100 lines of Ruby, is:

  • The HTTP handshake that initiates a WebSocket connection.
  • Listening to messages on the server.
  • Sending messages from the server.

A lot of very important features will be left out for the sake of brevity, such as ping/pong heartbeats, types of messages that aren’t UTF-8 text data, security, proxying, handling different WebSocket protocol versions, and message fragmentation. So let’s get to it.

An Overview of the WebSocket Protocol

WebSocket, like HTTP, is a layer upon the TCP protocol. A high-level difference between the two is that a classic HTTP response closes the TCP socket, whereas in the WebSocket protocol, the connection stays open. This allows bi-directional communication between the server and client, and is great for the realtime functionality you are used to: chat applications, data visualization, activity streams and so on.

A WebSocket connection begins with an HTTP GET request from the client to the server, called the ‘handshake’. This request carries Connection: Upgrade and Upgrade: websocket headers to tell the server that it wants to begin a WebSocket connection, and a Sec-WebSocket-Version header that states which version of the protocol the client speaks. In this guide we’ll only focus on version 13 of the protocol.

The request headers also include a Sec-WebSocket-Key field. With this, the server creates a Sec-WebSocket-Accept header that forms part of its response. How it does this, I will explain later.

Once this handshake is made, each party is free to exchange messages, which are wrapped in ‘frames’. Each frame consists of information about:

  • Whether this frame is or isn’t part of a continuation. In this guide, we’ll only deal with frames that contain a complete message (not fragmented).
  • The content type (the ‘opcode’). In this post, we’ll only deal with UTF-8-encoded text.
  • Whether the frame is encoded, or ‘masked’. Frames from the client always have to be masked; frames from the server must not be.
  • The payload length.
  • The masking ‘key’ with which to decode the message – if the frame is masked.
  • The payload of the frame.

The Guide

What We’ll Build

During this post we’ll build a simple echo server that takes messages from a client and sends them back with a thank you, simply as a basic implementation of a WebSocket server.

server = WebSocketServer.new

loop do
  Thread.new(server.accept) do |connection|
    puts "Connected"
    while (message = connection.recv)
      puts "Received #{message}"
      connection.send("Received #{message}. Thanks!")
    end
  end
end

Getting Started

Let’s start with two classes: our WebSocketServer and our WebSocketConnection. Create them in files called websocket_server.rb and websocket_connection.rb respectively.

The WebSocketServer

The WebSocketServer will be initialized with options, such as the path of the WebSocket endpoint, the port and the host – these will default to '/', 4567 and localhost respectively.

require 'socket'

class WebSocketServer

  # Keyword arguments let a caller override one option and keep the other defaults.
  def initialize(path: '/', port: 4567, host: 'localhost')
    @path = path
    @tcp_server = TCPServer.new(host, port)
  end
  ...
end

Upon initialization, a TCPServer object will be created with our host and port options. It starts listening straight away, though no connection is handed to us until we call ‘accept’ on it. Remember to require the built-in socket library that lets you create TCP connections.

On calling #accept, our WebSocketServer will be responding to any incoming WebSocket requests. It will be responsible for validating incoming HTTP requests, and sending back a handshake. If a handshake can and has been made – that is, if send_handshake returns true – it will return a new WebSocketConnection, as shown in the example below.

class WebSocketServer

  ...

  def accept
    socket = @tcp_server.accept
    send_handshake(socket) && WebSocketConnection.new(socket)
  end

end

The WebSocketConnection

The WebSocketConnection will be our API for sending and receiving messages. We initialize it with the TCP socket returned by TCPServer#accept inside WebSocketServer#accept.

class WebSocketConnection

  attr_reader :socket

  def initialize(socket)
    @socket = socket
  end
end

The connection object will read and write to this socket as it listens for and sends messages.

The Handshake

Going back to our WebSocketServer class, a WebSocketServer#send_handshake method is where everything begins. Firstly, let’s get the request_line (e.g. 'GET / HTTP/1.1') and request header from the socket, using the socket#gets method. This will block if there is nothing yet available, and will also get a line at a time.

private

def send_handshake(socket)
  request_line = socket.gets
  header = get_header(socket)
  ...
end

# this gets the header by recursively reading each line offered by the socket,
# stopping at the blank line ("\r\n") that ends the HTTP header block
def get_header(socket, header = "")
  (line = socket.gets) == "\r\n" ? header : get_header(socket, header + line)
end
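
The recursive reader above assumes the client always finishes its header with a blank line. As a rough sketch, here is an iterative variant that also stops if the stream ends early (when gets returns nil). The name read_header is hypothetical and not part of the server we're building, and a StringIO stands in for the real socket so you can try it in isolation.

```ruby
require 'stringio'

# Hypothetical iterative alternative to get_header: collects header lines
# until the blank line ("\r\n") that ends an HTTP header block, or until
# the stream runs out (gets returns nil).
def read_header(io)
  header = +""
  while (line = io.gets) && line != "\r\n"
    header << line
  end
  header
end

# StringIO stands in for the TCP socket here.
request = StringIO.new(
  "GET / HTTP/1.1\r\n" \
  "Host: localhost:4567\r\n" \
  "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n" \
  "\r\n"
)

request_line = request.gets   # the first line of the request
header = read_header(request) # everything up to the blank line
```

Reading the request line first and the header second mirrors the order used in send_handshake.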

If we have not received a GET request at the specified path, or there is no Sec-WebSocket-Key in the header, let’s write a 400 error to the socket. We can use the << operator, and then close the socket to end the request. By returning false, we make sure a WebSocketConnection is not created and returned to the application.

def send_handshake(socket)
  request_line = socket.gets
  header = get_header(socket)
  if (request_line =~ /GET #{@path} HTTP\/1.1/) && (header =~ /Sec-WebSocket-Key: (.*)\r\n/)
    ... # complete the handshake
  end
  send_400(socket)
  false # reject the handshake
end

def send_400(socket)
  socket << "HTTP/1.1 400 Bad Request\r\n" +
            "Content-Type: text/plain\r\n" +
            "Connection: close\r\n" +
            "\r\n" +
            "Incorrect request"
  socket.close
end
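
One caveat: HTTP header names are case-insensitive, so the regular expression above relies on the client sending Sec-WebSocket-Key with exactly that capitalisation. If you wanted to be more lenient, one possible approach (a sketch only, with a hypothetical parse_headers helper that isn't used elsewhere in this guide) is to parse the header text into a Hash keyed by lower-cased name:

```ruby
# Hypothetical helper: turn the raw header text into a Hash keyed by
# lower-cased field name, so lookups do not depend on capitalisation.
def parse_headers(raw_header)
  raw_header.each_line("\r\n").each_with_object({}) do |line, headers|
    name, value = line.chomp("\r\n").split(":", 2)
    next if value.nil? # skip anything that is not "Name: value"
    headers[name.strip.downcase] = value.strip
  end
end

raw = "Host: localhost:4567\r\n" \
      "upgrade: websocket\r\n" \
      "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==\r\n"

headers = parse_headers(raw)
key = headers["sec-websocket-key"] # found regardless of how the client capitalised it
```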

If the regular expression above finds a value for Sec-WebSocket-Key, we can take that value and create the Sec-WebSocket-Accept header for our response. The server does so by concatenating the value of Sec-WebSocket-Key with "258EAFA5-E914-47DA-95CA-C5AB0DC85B11", a ‘magic string’ defined in the protocol specification. It then creates a SHA1 digest of this concatenation and encodes the digest in Base64. We can do this using the built-in digest/sha1 and base64 libraries.

def send_handshake(socket)
  request_line = socket.gets
  header = get_header(socket)
  if (request_line =~ /GET #{@path} HTTP\/1.1/) && (header =~ /Sec-WebSocket-Key: (.*)\r\n/)
    ws_accept = create_websocket_accept($1)
    ...
  end
  send_400(socket)
  false
end

WS_MAGIC_STRING = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

require 'digest/sha1'
require 'base64'

def create_websocket_accept(key)
  digest = Digest::SHA1.digest(key + WS_MAGIC_STRING)
  # strict_encode64 does not append a newline, unlike encode64
  Base64.strict_encode64(digest)
end
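
To check the calculation by hand, the protocol specification (RFC 6455) includes a worked example: the key "dGhlIHNhbXBsZSBub25jZQ==" should produce the accept value "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=". Here is a small stand-alone sketch of the same steps; the names MAGIC and accept_for are made up for this snippet.

```ruby
require 'digest/sha1'
require 'base64'

MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

# Same steps as described above: concatenate, SHA1, then Base64.
# strict_encode64 is used so that no newline is appended to the result.
def accept_for(key)
  Base64.strict_encode64(Digest::SHA1.digest(key + MAGIC))
end

sample_key = "dGhlIHNhbXBsZSBub25jZQ==" # sample key from RFC 6455
accept = accept_for(sample_key)
puts accept # the RFC's example gives s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```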

Now we can take this key, and write our expected response to the socket. This response includes the status code "101 Switching Protocols", to indicate that the server and client will now be speaking via a WebSocket. This also includes the same Upgrade and Connection headers sent to us by the client, and also the appropriate Sec-WebSocket-Accept key and value.

def send_handshake(socket)
  request_line = socket.gets
  header = get_header(socket)
  if (request_line =~ /GET #{@path} HTTP\/1.1/) && (header =~ /Sec-WebSocket-Key: (.*)\r\n/)
    ws_accept = create_websocket_accept($1)
    send_handshake_response(socket, ws_accept)
    return true
  end
  send_400(socket)
  false
end

def send_handshake_response(socket, ws_accept)
  # strip guards against a stray newline in the Base64 value;
  # the final "\r\n" is the blank line that ends the HTTP headers
  socket << "HTTP/1.1 101 Switching Protocols\r\n" +
            "Upgrade: websocket\r\n" +
            "Connection: Upgrade\r\n" +
            "Sec-WebSocket-Accept: #{ws_accept.strip}\r\n" +
            "\r\n"
end
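
An HTTP header block ends with an empty line, so the bytes we write must finish with "\r\n\r\n" before any WebSocket frames follow. If you want to inspect the exact response text before writing it to a socket, one option is to build it as a string first. This is only a sketch; handshake_response is a hypothetical helper, not part of the server above.

```ruby
# Hypothetical helper that only builds the response text, which makes the
# exact bytes easy to inspect before they are written to a socket.
def handshake_response(ws_accept)
  [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    "Sec-WebSocket-Accept: #{ws_accept}",
    "", # joining these two empty strings yields the closing blank line
    ""
  ].join("\r\n")
end

response = handshake_response("s3pPLMBiTxaQ9kYGzzhZRbK+xOo=")
puts response.inspect
```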

Now that we’ve sent the handshake and returned true, a new WebSocketConnection will be returned to our application. So, test it out! Let’s create a loop that constantly listens for new requests. Using Thread.new and yielding the result of server.accept – which should be our new connection – we can handle concurrent requests.

In your Ruby app, write this:

server = WebSocketServer.new

loop do
  Thread.new(server.accept) do |connection|
    puts "Connected"
  end
end

Run this app and while this code is running, open up your browser console (on a page not served via HTTPS) and create a WebSocket connection to your server:

var socket = new WebSocket("ws://localhost:4567");

You should see that a connection has been made upon handshake, and printed "Connected" to your terminal window. If not, you can check out the source code here.

Listening For Messages

Now that clients can connect to us and the user has access to the WebSocketConnection object, we can start listening to messages from the client.

By the end of this section, here is what we want to have:

loop do
  Thread.new(server.accept) do |connection|
    puts "Connected"
    while (message = connection.recv)
      puts message
    end
  end
end

And if from our browser console, we type –

var socket = new WebSocket("ws://localhost:4567");
socket.send("hello");

- we should hope to see "hello" in our terminal window. Of course if you try this out now you’ll get an error.

So let’s create a WebSocketConnection#recv method. In it we will read from the socket, blocking until bytes are available.

class WebSocketConnection

  ...

  def recv
  end

end

As mentioned in the overview above, WebSocket messages are wrapped in frames, which are a sequence of bytes carrying information about the message. Our #recv method will parse the bytes of a frame and return the message’s content to the application thread.

Let’s have a look at what we’ll receive if, as in the example above, we send "hello" over the socket.

Byte value  Binary representation  Meaning
129         10000001               Fin + opcode
133         10000101               Mask indicator + length indicator
32          00100000               Key
25          00011001               Key
208         11010000               Key
9           00001001               Key
72          01001000               Content
124         01111100               Content
188         10111100               Content
101         01100101               Content
79          01001111               Content

The first byte tells us two things. Its first bit indicates whether this frame completes the message: 1 (as it is here) means yes, 0 means more frames follow. The next 3 bits are reserved. The remaining four bits (0001) are the opcode, which here indicates that the content type is text.

Using the TCPSocket#read method, we can read n bytes at a time:

def recv
  fin_and_opcode = socket.read(1).bytes[0] # get the 0th item of [129]
  ...
end

The second byte contains two pieces of information. The first is whether the message is encoded with a ‘mask’: if it is, the first bit will be 1. A frame from a client will always be masked; a frame from a server never is.

The remainder of the byte indicates the content’s length. Firstly, we need to remove the first bit out of the equation by subtracting 128 (or calling mask_and_length_indicator & 0x7f, if you are comfortable with bitwise operators – which I’m not).

def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128
  ...
end
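
If you'd like to see the bitwise version spelled out, here is a small sketch that picks apart the first two bytes of the example frame (129 and 133). The variable names are just for illustration.

```ruby
# The first two bytes of the example frame for "hello".
first_byte  = 129 # 0b10000001
second_byte = 133 # 0b10000101

fin    = (first_byte & 0x80) != 0     # top bit: is this the final fragment?
opcode = first_byte & 0x0f            # low four bits: 1 means text
masked = (second_byte & 0x80) != 0    # top bit: is the payload masked?
length_indicator = second_byte & 0x7f # low seven bits

puts "fin=#{fin} opcode=#{opcode} masked=#{masked} length=#{length_indicator}"
# prints "fin=true opcode=1 masked=true length=5"
```

Masking with & 0x7f gives the same result as subtracting 128 whenever the mask bit is set, which is always the case for frames sent by a client.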

If the result is less than or equal to 125, that is the content length.

def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length =  if length_indicator <= 125
              length_indicator
              ...
            end
end

If the length_indicator is equal to 126, the next two bytes need to be parsed into a 16-bit unsigned integer to get the numeric value of the length. We do this by using Ruby’s String#unpack method, passing in "n" to show we want a 16-bit unsigned integer, as per Ruby’s documentation here.

def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length =  if length_indicator <= 125
              length_indicator
            elsif length_indicator == 126
              socket.read(2).unpack("n")[0]
            ...
            end
end

If the length_indicator is equal to 127, the next eight bytes will need to be parsed into a 64-bit unsigned integer to get the length. "Q>" is passed to unpack to indicate this.

def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length =  if length_indicator <= 125
              length_indicator
            elsif length_indicator == 126
              socket.read(2).unpack("n")[0]
            else
              socket.read(8).unpack("Q>")[0]
            end
  ...
end
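
To try the three length cases without a live connection, here is a sketch of the same logic as a stand-alone function. The name read_payload_length is hypothetical, and a StringIO stands in for the socket since all we need is read(n).

```ruby
require 'stringio'

# Hypothetical helper mirroring the three length cases described above.
# `io` is anything that responds to read(n); StringIO stands in for a socket.
def read_payload_length(io, length_indicator)
  case length_indicator
  when 0..125 then length_indicator
  when 126    then io.read(2).unpack1("n")  # 16-bit unsigned, big-endian
  else             io.read(8).unpack1("Q>") # 64-bit unsigned, big-endian
  end
end

short  = read_payload_length(StringIO.new(""), 5)
medium = read_payload_length(StringIO.new([300].pack("n")), 126)
long   = read_payload_length(StringIO.new([70_000].pack("Q>")), 127)
```

String#unpack1 returns the first unpacked value directly, which saves the [0] used in the method above.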

The mask key itself – what we use to decode the content – will be the next 4 bytes. The encoded content will then be the next n bytes, where n is the content length we extracted.

def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length =  if length_indicator <= 125
              length_indicator
            elsif length_indicator == 126
              socket.read(2).unpack("n")[0]
            else
              socket.read(8).unpack("Q>")[0]
            end

  keys = socket.read(4).bytes
  encoded = socket.read(length).bytes
  ...
end

Now let’s use the mask key to decode the content, using this magic function that loops through the bytes and XORs each octet with the (i % 4)th octet of the mask. This is defined in the specification here.

def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length =  if length_indicator <= 125
              length_indicator
            elsif length_indicator == 126
              socket.read(2).unpack("n")[0]
            else
              socket.read(8).unpack("Q>")[0]
            end

  keys = socket.read(4).bytes
  encoded = socket.read(length).bytes

  decoded = encoded.each_with_index.map do |byte, index|
    byte ^ keys[index % 4]
  end
  ...
end
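
Because XOR is its own inverse, the same operation both masks and unmasks a payload. Here is a quick sketch using the key and content bytes from the "hello" example; apply_mask is a hypothetical name for the loop above.

```ruby
# Hypothetical helper: XOR each byte with the matching byte of the 4-byte key.
def apply_mask(bytes, key)
  bytes.each_with_index.map { |byte, index| byte ^ key[index % 4] }
end

key    = [32, 25, 208, 9]         # mask key from the example frame
masked = [72, 124, 188, 101, 79]  # masked payload from the example frame

unmasked = apply_mask(masked, key)
text     = unmasked.pack("C*")       # the original text, as a binary string
remasked = apply_mask(unmasked, key) # applying the mask again restores the input
```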

Now that we have the decoded content of the message, let’s turn it into a string and return it:

def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length =  if length_indicator <= 125
              length_indicator
            elsif length_indicator == 126
              socket.read(2).unpack("n")[0]
            else
              socket.read(8).unpack("Q>")[0]
            end

  keys = socket.read(4).bytes
  encoded = socket.read(length).bytes

  decoded = encoded.each_with_index.map do |byte, index|
    byte ^ keys[index % 4]
  end

  # "C*" packs each value as an unsigned byte; the payload is UTF-8 text
  decoded.pack("C*").force_encoding("UTF-8")
end
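
If you don't have a browser handy, you can also feed the eleven example bytes through the same steps by hand. The sketch below is a stand-alone copy of the parsing logic (parse_text_frame is a hypothetical name) that reads from a StringIO instead of a socket. Like the method above, it assumes a single, masked, unfragmented text frame.

```ruby
require 'stringio'

# Stand-alone sketch of the parsing steps above. `io` only needs read(n),
# so a StringIO can stand in for the TCP socket.
def parse_text_frame(io)
  _fin_and_opcode = io.read(1).unpack1("C")
  length_indicator = io.read(1).unpack1("C") & 0x7f

  length = case length_indicator
           when 126 then io.read(2).unpack1("n")
           when 127 then io.read(8).unpack1("Q>")
           else length_indicator
           end

  keys = io.read(4).bytes
  decoded = io.read(length).bytes.each_with_index.map do |byte, index|
    byte ^ keys[index % 4]
  end
  decoded.pack("C*").force_encoding("UTF-8")
end

# The eleven bytes from the example frame for socket.send("hello").
frame = [129, 133, 32, 25, 208, 9, 72, 124, 188, 101, 79].pack("C*")
message = parse_text_frame(StringIO.new(frame))
puts message # prints "hello"
```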

Test it out on the example at the top of this section. If you’ve gotten stuck, you can refer to the code here.

Sending Messages

To complete our echo server and show the bidirectional power of WebSockets, let’s add a message-sending method to our WebSocketConnection object. This should be a little more straightforward, as messages from a server must not be masked.

def send(message)
  ...
end

We’ll create the initial state of our byte array to send over the socket. This is straightforward as we’re sending a complete message and our content is text, so the first value in the array will be 129, i.e. 10000001. The first bit is 1, representing that this is a full message, and the last four bits, 0001, show that the payload is UTF-8 text.

Then we’ll get the size of the message and set the length indicator accordingly. Because our frame is not masked, we do not need to add 128 to the length (in other words, set the first bit to 1), as the client did for the messages it sent to us.

If the size is less than or equal to 125, we concatenate this to the byte array.

def send(message)
  bytes = [129]
  size = message.bytesize

  bytes +=  if size <= 125
              [size]
              ...
            end
end

If the size is greater than 125 but smaller than 2^16 (65,536), so that it fits in two bytes, we append 126 followed by the length packed as an unsigned 16-bit integer.

def send(message)
  bytes = [129]
  size = message.bytesize

  bytes +=  if size <= 125
              [size]
            elsif size < 2**16
              [126] + [size].pack("n").bytes
            ...
            end
end

If the size is 2^16 or greater, we append 127 to the frame, followed by the length packed as an unsigned 64-bit integer.

def send(message)
  bytes = [129]
  size = message.bytesize

  bytes +=  if size <= 125
              [size]
            elsif size < 2**16
              [126] + [size].pack("n").bytes
            else
              [127] + [size].pack("Q>").bytes
            end
  ...
end

Now we can simply append our message as bytes. Then we turn this byte array into a binary string (using Array#pack with the argument "C*"), which we can write to the socket!

def send(message)
  bytes = [129]
  size = message.bytesize

  bytes +=  if size <= 125
              [size]
            elsif size < 2**16
              [126] + [size].pack("n").bytes
            else
              [127] + [size].pack("Q>").bytes
            end

  bytes += message.bytes
  data = bytes.pack("C*")
  socket << data
end
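
To see exactly which bytes end up on the wire, it can help to separate building the frame from writing it. The sketch below does the same work as send but returns the frame as a binary string; build_text_frame is a hypothetical name, not something the server above defines.

```ruby
# Hypothetical pure-function version of the steps above: it returns the
# frame as a binary string instead of writing it to a socket.
def build_text_frame(message)
  size = message.bytesize
  header = if size <= 125
             [129, size].pack("C*")
           elsif size < 2**16
             [129, 126, size].pack("CCn")
           else
             [129, 127, size].pack("CCQ>")
           end
  header + message.b
end

small = build_text_frame("hello")
large = build_text_frame("a" * 300)

puts small.bytes.inspect          # prints [129, 5, 104, 101, 108, 108, 111]
puts large.bytes.first(4).inspect # prints [129, 126, 1, 44]
```

The 300-byte message needs the two-byte length form, and 300 is 0x012C, hence the 1 and 44 after the 126 indicator.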

The Echo Server

Now that we can begin connections, send messages and receive messages, we can write our tiny echo-server application.

server = WebSocketServer.new

loop do
  Thread.new(server.accept) do |connection|
    puts "Connected"
    while (message = connection.recv)
      puts "Received #{message} from the browser"
      connection.send("Received #{message}. Thanks!")
    end
  end
end

Run this server, and then go into your browser console. Then type:

var socket = new WebSocket("ws://localhost:4567");

socket.onmessage = function(event){console.log(event.data);};

This will set up your WebSocket connection by sending a handshake to your server. Then, if a message is received, it will log it to the console.

Let’s send a message and see what we get back:

socket.send("hello world!");

Immediately after sending the message, your browser should have logged out an event whose data is "Received hello world!. Thanks!". Meanwhile, your terminal running the server should have logged out "Received hello world! from the browser".

That’s it! I hope you enjoyed this post and that it was informative for those who were new to WebSocket.

What’s Missing?

As I mentioned earlier, there’s a lot more one can improve and add to make it a fully-functional WebSocket server – not to mention making it able to handle thousands of concurrent connections. From experience, we’ve seen that developers who implement their own scalable WebSocket solutions find them tricky to maintain and debug. In addition, developers have to be mindful that WebSocket connections, unlike HTTP, are stateful; managing that state within a properly distributed and load-balanced system is a non-trivial problem.

Hence Pusher’s appeal to those for whom realtime is core to their application; we essentially host, maintain and scale these servers for you, and provide an easy-to-use API to interact with them so you can focus on the rest of your application. Hopefully this post has shown you a bit about what goes on underneath.

Further Reading