This guide is aimed at people who are new to WebSocket, or just wish to know more about what’s under the hood.
I have been at Pusher for almost 6 months and, because I mainly do customer-facing developer work, parts of our deeper infrastructure have seemed a bit of a black box to me. Pusher, a message service that lets you send realtime data from server to client or from client to client, has the WebSocket protocol at its core. I was aware of how the HTTP protocol worked, but not WebSocket – aside from the fact it lets you do some nifty realtime stuff.
Therefore I decided to dig a little deeper and try to build a WebSocket server from scratch – and by ‘scratch’, I mean using only Ruby’s built-in libraries. This blog post is partly to share what I’ve learnt and partly to act as a tutorial, given that I couldn’t find many that would lead me through the process step-by-step. That said, there were plenty of awesome resources for getting to grips with the subject, such as on Mozilla and this post from Armin Ronacher.
What I’ll cover, in around 100 lines of Ruby, is the opening handshake, receiving messages from a client, and sending messages back.
A lot of very important features will be left out for the sake of brevity, such as ping/pong heartbeats, types of messages that aren’t UTF-8 text data, security, proxying, handling different WebSocket protocol versions, and message fragmentation. So let’s get to it.
WebSocket, like HTTP, is a layer upon the TCP protocol. A high-level difference between the two is that a classic HTTP response closes the TCP socket, whereas in the WebSocket protocol, the connection stays open. This allows bi-directional communication between the server and client, and is great for the realtime functionality you are used to: chat applications, data visualization, activity streams and so on.
A WebSocket connection begins with an HTTP GET request from the client to the server, called the ‘handshake’. This request carries `Connection: Upgrade` and `Upgrade: websocket` headers to tell the server that it wants to begin a WebSocket connection, and a `Sec-WebSocket-Version` header that states which version of the protocol the client wants to speak. In this guide we’ll only focus on version 13 of the protocol.
The request headers also include a `Sec-WebSocket-Key` field. With this, the server creates a `Sec-WebSocket-Accept` header that forms part of its response. How it does this, I will explain later.
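To make the shape of the request concrete, here is a rough sketch of a handshake as raw text, along with a small parser for it. The key is the example nonce from the WebSocket specification; the `Host` value and the `parse_request` helper are assumptions for illustration, not part of the server built in this guide.

```ruby
# A sample opening handshake, as a client might send it.
# The Host value is an assumption; the key is the spec's example nonce.
SAMPLE_REQUEST = [
  "GET / HTTP/1.1",
  "Host: localhost:4567",
  "Upgrade: websocket",
  "Connection: Upgrade",
  "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==",
  "Sec-WebSocket-Version: 13",
  "", ""
].join("\r\n")

# Splits a raw request into its request line and a Hash of headers.
# parse_request is a hypothetical helper used only in this sketch.
def parse_request(raw)
  head = raw.split("\r\n\r\n", 2).first
  request_line, *header_lines = head.split("\r\n")
  headers = header_lines.to_h do |line|
    name, value = line.split(":", 2)
    [name.strip, value.strip]
  end
  [request_line, headers]
end

request_line, headers = parse_request(SAMPLE_REQUEST)
puts request_line
puts headers["Sec-WebSocket-Key"]
```

Every line ends in `\r\n`, and an empty line marks the end of the headers; both details matter when we read the request from the socket later on.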
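Once this handshake is made, each party is free to exchange messages, which are wrapped in ‘frames’. Each frame carries information about whether it completes a message, the type of content, whether the content is masked, the content’s length, and the content itself.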
During this post we’ll build a simple echo server that takes messages from a client and sends them back with a thank you, simply as a basic implementation of a WebSocket server.
```ruby
server = WebSocketServer.new

loop do
  Thread.new(server.accept) do |connection|
    puts "Connected"
    while (message = connection.recv)
      puts "Received #{message}"
      connection.send("Received #{message}. Thanks!")
    end
  end
end
```
Let’s start with two classes: our `WebSocketServer` and our `WebSocketConnection`. Create them in files called `websocket_server.rb` and `websocket_connection.rb` respectively.
The `WebSocketServer` will be initialized with options, such as the path of the WebSocket endpoint, the port and the host – these will default to `'/'`, `4567` and `localhost` respectively.
```ruby
require 'socket'

class WebSocketServer

  def initialize(options = {})
    # merge over the defaults so that a partial options hash still works
    options = { path: '/', port: 4567, host: 'localhost' }.merge(options)
    @path, port, host = options[:path], options[:port], options[:host]
    @tcp_server = TCPServer.new(host, port)
  end
  # ...
end
```
Upon initialization, a `TCPServer` object will be created with our host and port options. It starts listening straight away, but it will not hand us a connection until we call `accept` on it. Remember to require the built-in `socket` library that lets you create TCP connections.
On calling `#accept`, our `WebSocketServer` will wait for the next incoming request. It is responsible for validating the incoming HTTP request and sending back a handshake. If the handshake succeeds – that is, if `send_handshake` returns `true` – it will return a new `WebSocketConnection`, as shown in the example below.
```ruby
class WebSocketServer

  # ...

  def accept
    socket = @tcp_server.accept
    send_handshake(socket) && WebSocketConnection.new(socket)
  end

end
```
The `WebSocketConnection` will be our API for sending and receiving messages. We initialize it with the TCP socket returned by the `TCPServer` in `WebSocketServer#accept`.
```ruby
class WebSocketConnection

  attr_reader :socket

  def initialize(socket)
    @socket = socket
  end
end
```
The connection object will read and write to this socket as it listens for and sends messages.
Going back to our `WebSocketServer` class, a `WebSocketServer#send_handshake` method is where everything begins. Firstly, let’s get the `request_line` (e.g. `'GET / HTTP/1.1'`) and request `header` from the socket, using the `socket#gets` method. This reads one line at a time, and will block if there is nothing yet available.
```ruby
private

def send_handshake(socket)
  request_line = socket.gets
  header = get_header(socket)
  # ...
end

# this gets the header by recursively reading each line offered by the socket,
# stopping at the blank line that ends the header (or if the client hangs up)
def get_header(socket, header = "")
  line = socket.gets
  (line.nil? || line == "\r\n") ? header : get_header(socket, header + line)
end
```
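If you would rather not recurse once per header line, the same reading can be sketched as a loop. This is only an alternative under the same assumptions as above; `read_header` is a hypothetical name, and a `StringIO` stands in for the TCP socket so the snippet runs on its own.

```ruby
require 'stringio'

# Reads header lines until the blank line that ends the header block.
# Also stops if the peer closes the connection (gets returns nil).
def read_header(io)
  header = +""
  while (line = io.gets) && line != "\r\n"
    header << line
  end
  header
end

# StringIO stands in for the TCP socket here.
fake_socket = StringIO.new(
  "Host: localhost:4567\r\nSec-WebSocket-Key: abc123==\r\n\r\nleftover"
)
header = read_header(fake_socket)
puts header
```

Anything after the blank line is left unread on the socket, which is what we want: once the handshake is done, those bytes will be WebSocket frames.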
If we have not received a GET request at the specified path, or there is no `Sec-WebSocket-Key` in the header, let’s write a 400 error to the socket. We can use the `<<` operator, and then close the socket to end the request. By returning `false`, we make sure a `WebSocketConnection` is not created and returned to the application.
```ruby
def send_handshake(socket)
  request_line = socket.gets
  header = get_header(socket)
  if (request_line =~ /GET #{@path} HTTP\/1\.1/) && (header =~ /Sec-WebSocket-Key: (.*)\r\n/)
    # ... complete the handshake
  end
  send_400(socket)
  false # reject the handshake
end

def send_400(socket)
  socket << "HTTP/1.1 400 Bad Request\r\n" +
            "Content-Type: text/plain\r\n" +
            "Connection: close\r\n" +
            "\r\n" +
            "Incorrect request"
  socket.close
end
```
If there is a value for `Sec-WebSocket-Key`, according to the regular expression above, we can take that value and create the `Sec-WebSocket-Accept` header in our response. The server does so by taking the value of the `Sec-WebSocket-Key` and concatenating it with `"258EAFA5-E914-47DA-95CA-C5AB0DC85B11"`, a ‘magic string’ defined in the protocol specification. It takes this concatenation, creates a SHA1 digest of it, then encodes this digest in Base64. We can do this using the built-in `digest/sha1` and `base64` libraries.
```ruby
require 'digest/sha1'
require 'base64'

WS_MAGIC_STRING = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def send_handshake(socket)
  request_line = socket.gets
  header = get_header(socket)
  if (request_line =~ /GET #{@path} HTTP\/1\.1/) && (header =~ /Sec-WebSocket-Key: (.*)\r\n/)
    ws_accept = create_websocket_accept($1)
    # ...
  end
  send_400(socket)
  false
end

def create_websocket_accept(key)
  digest = Digest::SHA1.digest(key + WS_MAGIC_STRING)
  # strict_encode64 avoids the trailing newline that encode64 appends
  Base64.strict_encode64(digest)
end
```
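A quick way to check this calculation is against the sample values given in the WebSocket specification (RFC 6455). The sketch below is standalone; `websocket_accept` and `WS_GUID` are names chosen for this example only.

```ruby
require 'digest/sha1'
require 'base64'

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

# Concatenate key and magic string, SHA1 it, then Base64-encode the raw digest.
def websocket_accept(key)
  Base64.strict_encode64(Digest::SHA1.digest(key + WS_GUID))
end

# Sample key from RFC 6455; the spec gives the accept value shown below.
puts websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
# prints s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Note that the digest is taken in its raw binary form (`digest`, not `hexdigest`) before being encoded.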
Now we can take this value and write our expected response to the socket. This response includes the status code `"101 Switching Protocols"`, to indicate that the server and client will now be speaking via a WebSocket. It also includes the same `Upgrade` and `Connection` headers sent to us by the client, and the `Sec-WebSocket-Accept` header with the value we just created.
```ruby
def send_handshake(socket)
  request_line = socket.gets
  header = get_header(socket)
  if (request_line =~ /GET #{@path} HTTP\/1\.1/) && (header =~ /Sec-WebSocket-Key: (.*)\r\n/)
    ws_accept = create_websocket_accept($1)
    send_handshake_response(socket, ws_accept)
    return true
  end
  send_400(socket)
  false
end

def send_handshake_response(socket, ws_accept)
  # strip guards against a trailing newline on the encoded value;
  # the final "\r\n" is the blank line that ends the headers
  socket << "HTTP/1.1 101 Switching Protocols\r\n" +
            "Upgrade: websocket\r\n" +
            "Connection: Upgrade\r\n" +
            "Sec-WebSocket-Accept: #{ws_accept.strip}\r\n" +
            "\r\n"
end
```
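It can help to look at the response as a plain string before it goes anywhere near a socket. The following sketch builds the same response with a hypothetical `handshake_response` helper, so the line endings and the terminating blank line are easy to inspect.

```ruby
# Builds the full 101 response as a string. The two empty strings at the end
# of the array produce the blank line that terminates the headers.
def handshake_response(ws_accept)
  [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    "Sec-WebSocket-Accept: #{ws_accept}",
    "", ""
  ].join("\r\n")
end

response = handshake_response("s3pPLMBiTxaQ9kYGzzhZRbK+xOo=")
puts response.inspect
```

If the blank line is missing, the client will keep waiting for more headers and the connection will never open.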
Now that we’ve sent the handshake and returned `true`, a new `WebSocketConnection` will be returned to our application. So, test it out! Let’s create a `loop` that constantly listens for new requests. Using `Thread.new` and passing it the result of `server.accept` – which should be our new connection – we can handle concurrent requests.
In your Ruby app, write this:
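```ruby
# load the two classes we created earlier
require_relative 'websocket_server'
require_relative 'websocket_connection'

server = WebSocketServer.new

loop do
  Thread.new(server.accept) do |connection|
    puts "Connected"
  end
end
```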
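One thing to watch: the loop passes whatever `accept` returns straight to the thread, so a rejected handshake reaches the block as `false`. A possible guard is sketched below. `serve_accepted` is a hypothetical helper, and a stub object stands in for the real server so that the snippet runs on its own.

```ruby
# Accepts `attempts` connections from `server`, skipping rejected handshakes,
# and yields each real connection inside its own thread.
def serve_accepted(server, attempts)
  threads = []
  attempts.times do
    connection = server.accept
    next unless connection # handshake was rejected; nothing to serve
    threads << Thread.new(connection) { |conn| yield conn }
  end
  threads.each(&:join)
end

# Stub standing in for WebSocketServer: one rejection, then two connections.
outcomes = [false, :first, :second]
stub_server = Object.new
stub_server.define_singleton_method(:accept) { outcomes.shift }

handled = Queue.new
serve_accepted(stub_server, 3) { |conn| handled << conn }
puts handled.size
```

In the real app the same `next unless connection` check would sit inside the endless `loop`, before the call to `Thread.new`.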
Run this app and while this code is running, open up your browser console (on a page not served via HTTPS) and create a WebSocket connection to your server:
```javascript
var socket = new WebSocket("ws://localhost:4567");
```
You should see that a `connection` has been made upon handshake, and that `"Connected"` has been printed to your terminal window. If not, you can check out the source code here.
Now that clients can connect to us and the user has access to the `WebSocketConnection` object, we can start listening to messages from the client.
By the end of this section, here is what we want to have:
```ruby
loop do
  Thread.new(server.accept) do |connection|
    puts "Connected"
    while (message = connection.recv)
      puts message
    end
  end
end
```
And if, from our browser console, we type:

```javascript
var socket = new WebSocket("ws://localhost:4567");
socket.send("hello");
```

we should hope to see `"hello"` in our terminal window. Of course, if you try this out now you’ll get an error.
So let’s create a `WebSocketConnection#recv` method. In it we will read from the socket if there are bytes available.
```ruby
class WebSocketConnection

  # ...

  def recv
  end

end
```
As mentioned in the overview above, WebSocket messages are wrapped in frames, which are a sequence of bytes carrying information about the message. Our `#recv` method will parse the bytes of a frame and return the message’s content to the application thread.
Let’s have a look at what we’ll receive if, as in the example above, we send `"hello"` over the socket.
Byte value | 129 | 133 | 32 | 25 | 208 | 9 | 72 | 124 | 188 | 101 | 79 |
---|---|---|---|---|---|---|---|---|---|---|---|
Binary representation | 10000001 | 10000101 | 00100000 | 00011001 | 11010000 | 00001001 | 01001000 | 01111100 | 10111100 | 01100101 | 01001111 |
Meaning | Fin + opcode | Mask indicator + Length indicator | Key | Key | Key | Key | Content | Content | Content | Content | Content |
The first byte indicates whether this is the complete message and what kind of content it carries. If the first bit is `1` (as it is here), this frame completes the message; if it is `0`, more frames will follow. The next 3 bits are reserved. And the remainder of the byte (`0001`) indicates that the content type is text.
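For anyone who wants to see those bits pulled apart, here is a small sketch using bitwise AND on the first byte from the table, `129`. The variable names are only for illustration; our `recv` method does not need the values in this guide.

```ruby
first_byte = 129 # 0b10000001, the first byte in the table above

fin      = (first_byte & 0b1000_0000) != 0 # top bit: does this frame complete the message?
reserved = (first_byte & 0b0111_0000) >> 4 # next three bits, reserved
opcode   = first_byte & 0b0000_1111        # low four bits: 1 means text

puts "fin=#{fin} reserved=#{reserved} opcode=#{opcode}"
# prints fin=true reserved=0 opcode=1
```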
Using the `TCPSocket#read` method, we can read `n` bytes at a time:
```ruby
def recv
  fin_and_opcode = socket.read(1).bytes[0] # get the 0th item of [129]
  # ...
end
```
The second byte contains two pieces of information. Firstly, it tells us whether the message is encoded with a ‘mask’. If it’s from a client, it always will be. It cannot be if it’s from a server. If it is masked, the first bit will be `1`.
The remainder of the byte indicates the content’s length. Firstly, we need to take the first bit out of the equation by subtracting 128 (or calling `mask_and_length_indicator & 0x7f`, if you are comfortable with bitwise operators – which I’m not).
```ruby
def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128
  # ...
end
```
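The bitwise version mentioned above is worth a quick look, because it gives the right length whether or not the mask bit is set, whereas subtracting 128 only works for masked frames. A minimal sketch, with `split_second_byte` as a hypothetical helper:

```ruby
# Splits the second byte of a frame into the mask flag and length indicator.
def split_second_byte(byte)
  masked = (byte & 0x80) != 0      # top bit: is the payload masked?
  length_indicator = byte & 0x7f   # low seven bits: the length indicator
  [masked, length_indicator]
end

p split_second_byte(133) # the masked client frame from the table
p split_second_byte(5)   # the same length on an unmasked frame
```

The first call gives `[true, 5]` and the second `[false, 5]`; with subtraction, the unmasked byte would have produced `-123`.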
If the result is less than or equal to 125, that is the content length.
```ruby
def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length = if length_indicator <= 125
    length_indicator
  # ...
  end
end
```
If the `length_indicator` is equal to 126, the next two bytes need to be parsed into a 16-bit unsigned integer to get the numeric value of the length. We do this by using Ruby’s `String#unpack` method, passing in `"n"` to show we want a 16-bit unsigned integer in network (big-endian) byte order, as per Ruby’s documentation here.
```ruby
def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length = if length_indicator <= 125
    length_indicator
  elsif length_indicator == 126
    socket.read(2).unpack("n")[0]
  # ...
  end
end
```
If the `length_indicator` is equal to 127, the next eight bytes will need to be parsed into a 64-bit unsigned integer to get the length. `"Q>"` is passed to `unpack` to indicate this.
```ruby
def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length = if length_indicator <= 125
    length_indicator
  elsif length_indicator == 126
    socket.read(2).unpack("n")[0]
  else
    socket.read(8).unpack("Q>")[0]
  end
  # ...
end
```
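The three length cases can be tried in isolation. In this sketch `read_payload_length` is a hypothetical helper, and a `StringIO` plays the part of the socket; `unpack1` is simply `unpack(...)[0]` in one call.

```ruby
require 'stringio'

# Returns the payload length, reading extra bytes from io when the
# length indicator is 126 (two more bytes) or 127 (eight more bytes).
def read_payload_length(io, length_indicator)
  case length_indicator
  when 0..125 then length_indicator
  when 126    then io.read(2).unpack1("n")  # 16-bit unsigned, big-endian
  else             io.read(8).unpack1("Q>") # 64-bit unsigned, big-endian
  end
end

puts read_payload_length(StringIO.new(""), 5)
puts read_payload_length(StringIO.new([300].pack("n")), 126)
puts read_payload_length(StringIO.new([70_000].pack("Q>")), 127)
```

The three calls print `5`, `300` and `70000` respectively.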
The mask-key itself – what we use to decode the content – will be the next 4 bytes. Then, the encoded content will be the next `n` bytes, where `n` is the content length we extracted.
```ruby
def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length = if length_indicator <= 125
    length_indicator
  elsif length_indicator == 126
    socket.read(2).unpack("n")[0]
  else
    socket.read(8).unpack("Q>")[0]
  end

  keys = socket.read(4).bytes
  encoded = socket.read(length).bytes
  # ...
end
```
Let’s use the mask-key to decode the content with this magic function, which loops through the bytes and XORs each octet with the `(i % 4)`th octet of the mask. This is defined in the specification here.
```ruby
def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length = if length_indicator <= 125
    length_indicator
  elsif length_indicator == 126
    socket.read(2).unpack("n")[0]
  else
    socket.read(8).unpack("Q>")[0]
  end

  keys = socket.read(4).bytes
  encoded = socket.read(length).bytes

  decoded = encoded.each_with_index.map do |byte, index|
    byte ^ keys[index % 4]
  end
  # ...
end
```
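To see the XOR step on real numbers, here it is applied to the key and content bytes from the table earlier in this section. `apply_mask` is a name chosen for this sketch. Because XOR undoes itself, the same function both masks and unmasks.

```ruby
# XORs each byte with the key byte at position (index % 4).
def apply_mask(bytes, key)
  bytes.each_with_index.map { |byte, index| byte ^ key[index % 4] }
end

key    = [32, 25, 208, 9]         # the four key bytes from the table
masked = [72, 124, 188, 101, 79]  # the five content bytes from the table

plain = apply_mask(masked, key)
puts plain.pack("C*")                 # prints hello
puts apply_mask(plain, key) == masked # prints true
```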
Now that we have the decoded content of the message, let’s turn it into a string and return it:
```ruby
def recv
  fin_and_opcode = socket.read(1).bytes[0]
  mask_and_length_indicator = socket.read(1).bytes[0]
  length_indicator = mask_and_length_indicator - 128

  length = if length_indicator <= 125
    length_indicator
  elsif length_indicator == 126
    socket.read(2).unpack("n")[0]
  else
    socket.read(8).unpack("Q>")[0]
  end

  keys = socket.read(4).bytes
  encoded = socket.read(length).bytes

  decoded = encoded.each_with_index.map do |byte, index|
    byte ^ keys[index % 4]
  end

  # "C*" packs each value as an unsigned byte
  decoded.pack("C*")
end
```
Test it out on the example at the top of this section. If you’ve gotten stuck, you can refer to the code here.
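If you want to check the parsing without a browser, one option is to feed the bytes from the table into a connection whose socket is a `StringIO`. The class below is a condensed copy of the one from this section so that the snippet runs on its own; using `& 0x7f` and tagging the result as UTF-8 are my own small variations, not requirements.

```ruby
require 'stringio'

# Condensed copy of the connection class, for a standalone check.
class WebSocketConnection
  attr_reader :socket

  def initialize(socket)
    @socket = socket
  end

  def recv
    socket.read(1) # fin + opcode, not used in this sketch
    length_indicator = socket.read(1).bytes[0] & 0x7f
    length = case length_indicator
             when 126 then socket.read(2).unpack("n")[0]
             when 127 then socket.read(8).unpack("Q>")[0]
             else length_indicator
             end
    keys = socket.read(4).bytes
    encoded = socket.read(length).bytes
    decoded = encoded.each_with_index.map { |byte, i| byte ^ keys[i % 4] }
    decoded.pack("C*").force_encoding("UTF-8")
  end
end

# The masked "hello" frame from the table above.
frame = [129, 133, 32, 25, 208, 9, 72, 124, 188, 101, 79].pack("C*")
connection = WebSocketConnection.new(StringIO.new(frame))
puts connection.recv # prints hello
```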
To complete our echo server and show the bidirectional power of WebSockets, let’s add a message-sending method to our `WebSocketConnection` object. This should be a little more straightforward, as messages from a server do not have to be masked.
```ruby
def send(message)
  # ...
end
```
We’ll create the initial state of our byte array to send over the socket. This is straightforward as we’re sending a complete message and our content is text, so the first value in the array will be `129`, i.e. `10000001`. The first bit, `1`, represents that this is a full message, and the last four bits, `0001`, show that the payload is UTF-8 text.
Then we’ll get the size of the message and set the length indicator accordingly. Because our frame is not masked, we do not need to add 128 (in other words, set the first bit to `1`), which the client had done to the messages it sent to us.
If the size is less than or equal to 125, we append it to the byte array.
```ruby
def send(message)
  bytes = [129]
  size = message.bytesize

  bytes += if size <= 125
    [size]
  # ...
  end
end
```
If the size is greater than 125 but smaller than 2^16 (65,536), so that it fits in two bytes, then we append 126 followed by the bytes of the length packed as an unsigned 16-bit integer.
```ruby
def send(message)
  bytes = [129]
  size = message.bytesize

  bytes += if size <= 125
    [size]
  elsif size < 2**16
    [126] + [size].pack("n").bytes
  # ...
  end
end
```
If the size is 2^16 or greater, we append 127 to the frame and then the bytes of the length packed as an unsigned 64-bit integer.
```ruby
def send(message)
  bytes = [129]
  size = message.bytesize

  bytes += if size <= 125
    [size]
  elsif size < 2**16
    [126] + [size].pack("n").bytes
  else
    [127] + [size].pack("Q>").bytes
  end
  # ...
end
```
Now we can simply append our `message` as bytes. Then we turn this byte array into a binary string (using `Array#pack` with the argument `"C*"`). Now we can write this to the socket!
```ruby
def send(message)
  bytes = [129]
  size = message.bytesize

  bytes += if size <= 125
    [size]
  elsif size < 2**16
    [126] + [size].pack("n").bytes
  else
    [127] + [size].pack("Q>").bytes
  end

  bytes += message.bytes
  data = bytes.pack("C*")
  socket << data
end
```
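To inspect the frames without writing to a socket, the same logic can be sketched as a function that just returns the bytes. `build_text_frame` is a hypothetical name for this example; the three calls exercise each of the length cases.

```ruby
# Builds an unmasked text frame, as a server would send it.
def build_text_frame(message)
  size = message.bytesize
  header = if size <= 125
    [129, size]
  elsif size < 2**16
    [129, 126] + [size].pack("n").bytes
  else
    [129, 127] + [size].pack("Q>").bytes
  end
  (header + message.bytes).pack("C*")
end

p build_text_frame("hello").bytes              # whole frame for a short message
p build_text_frame("a" * 200).bytes.first(4)   # header for a 200-byte message
p build_text_frame("a" * 70_000).bytes.first(10) # header for a 70,000-byte message
```

For `"hello"` the frame is `[129, 5, 104, 101, 108, 108, 111]`: the two header bytes followed directly by the unmasked text.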
Now that we can begin connections, send messages and receive messages, we can write our tiny echo-server application.
```ruby
server = WebSocketServer.new

loop do
  Thread.new(server.accept) do |connection|
    puts "Connected"
    while (message = connection.recv)
      puts "Received #{message} from the browser"
      connection.send("Received #{message}. Thanks!")
    end
  end
end
```
Run this server, and then go into your browser console. Then type:
```javascript
var socket = new WebSocket("ws://localhost:4567");

socket.onmessage = function(event){console.log(event.data);};
```
This will set up your WebSocket connection by sending a handshake to your server. Then, if a message is received, it will log it to the console.
Let’s send a message and see what we get back:
```javascript
socket.send("hello world!");
```
Immediately after sending the message, your browser should have logged out an event whose data is `"Received hello world!. Thanks!"`. Meanwhile, your terminal running the server should have logged out `"Received hello world! from the browser"`.
That’s it! I hope you enjoyed this post and that it was informative for those who were new to WebSocket.
As I mentioned earlier, there’s a lot more one can improve and add to make it a fully-functional WebSocket server – not to mention making it able to handle thousands of concurrent connections. In our experience, developers who implement their own scalable WebSocket solutions find them tricky to maintain and debug. In addition, developers have to be mindful that WebSocket connections, unlike HTTP, are stateful; managing that state within a properly distributed and load-balanced system is a non-trivial problem.
This is Pusher’s appeal to those for whom realtime is core to their application: we essentially host, maintain and scale these servers for you, and provide an easy-to-use API to interact with them so you can focus on the rest of your application. Hopefully this post has shown you a bit about what goes on underneath.