
The ultimate guide to websockets | Everything you need to know

Engati Team | May 6 | 5-6 mins

What are websockets

Trying to figure out whether you should use websocket or HTTP? This article will show you what a websocket is, how it works, where it’s used, and how it’s different from HTTP.

What is a websocket?

A websocket is a persistent connection between a client and a server. It offers a bidirectional, full-duplex communications channel over a single TCP connection, which is established through an HTTP upgrade handshake. The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011.

Essentially, the WebSocket API is an advanced technology that makes it possible to open up a two-way interactive communication session between your user’s browser and your server. This API enables you to send messages to a server and receive event-driven responses without needing to poll the server for a reply.

Although a websocket connection is functionally similar to standard Unix-style sockets, the two are not related.

Websocket connection

A websocket is basically a framed protocol, which means that a piece of data (a message) gets sliced down into a number of discrete pieces, with the size of each piece encoded in the frame. The frame is made up of a frame type, a payload length, and a data portion.

The most important pieces of the websocket protocol:

Fin Bit

The Fin bit is the first bit of the WebSocket header. It is set if this frame is the last data to complete the message.

RSV1, RSV2, RSV3 Bits

These three bits are reserved for extensions. They must be 0 unless an extension that defines their meaning was negotiated during the handshake.

Opcode

There is an opcode for every frame. It determines the way in which you interpret the frame’s payload data. Here’s a list of some opcode values with their description:

  • 0x00 - the frame continues the payload from the previous frame.
  • 0x01 - this opcode denotes a text frame. The payload of a text frame is UTF-8 encoded text.
  • 0x02 - this opcode denotes a binary frame. Binary payloads are delivered to the application without any changes.
  • 0x03-0x07 - these opcodes are reserved for future non-control frames.
  • 0x08 - this opcode denotes a close frame: the sender wants to close the connection.
  • 0x09 - this is a ping frame. It functions as a heartbeat mechanism to ensure that the connection is still alive. The receiver needs to respond with a pong.
  • 0x0a - this is a pong frame. It is sent in response to a ping and confirms that the connection is still alive. The receiver does not need to reply to it.
  • 0x0b-0x0f - these opcodes are reserved for future control frames.
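As a rough illustration of how the Fin bit, the RSV bits, and the opcode share the first byte of a frame header, here is a minimal Python sketch. The names Opcode, parse_first_byte, and is_control are hypothetical helpers chosen for this example and are not part of any library.

```python
from enum import IntEnum


class Opcode(IntEnum):
    """Opcodes that are currently defined; all other values are reserved."""
    CONTINUATION = 0x0
    TEXT = 0x1
    BINARY = 0x2
    CLOSE = 0x8
    PING = 0x9
    PONG = 0xA


def parse_first_byte(byte: int) -> dict:
    """Split the first header byte into FIN, RSV1-3, and the 4-bit opcode."""
    return {
        "fin": bool(byte & 0x80),   # highest bit
        "rsv1": bool(byte & 0x40),
        "rsv2": bool(byte & 0x20),
        "rsv3": bool(byte & 0x10),
        "opcode": byte & 0x0F,      # lowest four bits
    }


def is_control(opcode: int) -> bool:
    """Control frames (close, ping, pong) have the top bit of the opcode set."""
    return bool(opcode & 0x8)


header = parse_first_byte(0x81)  # binary 1000 0001: FIN set, opcode 0x1
print(header["fin"], Opcode(header["opcode"]).name)  # True TEXT
```

The byte 0x81 is what a single, unfragmented text message typically starts with.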

Mask

Setting this bit to 1 indicates that the payload is masked. The WebSocket protocol requires every frame sent from the client to the server to be obfuscated with a random key (the mask) selected by the client; frames sent from the server to the client are not masked. The masking key is combined with the payload data using an XOR operation before the data is sent to the server. Masking stops caches from misinterpreting WebSocket frames as cacheable data.

When the websocket protocol was being developed, it was seen that if a compromised server gets deployed, and clients connect to that server, there is a possibility of having intermediate proxies or infrastructure caching the responses of the compromised server so that future clients requesting that data receive the incorrect response. Such an attack is known as cache poisoning and arises from the fact that you cannot control the manner in which misbehaving proxies act. This can be quite an issue when you introduce a new protocol like WebSocket that needs to interact with the existing infrastructure of the internet.

Payload len

The Payload len field and the Extended payload length fields encode the total length of the payload data for the frame. If the payload data is smaller than 126 bytes, the length is encoded directly in the 7-bit Payload len field. For larger payloads, Payload len is set to 126 and the length is stored in the following 16 bits, or it is set to 127 and the length is stored in the following 64 bits.
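The length encoding can be sketched in a few lines of Python. This is a minimal example that follows the layout in RFC 6455 (the marker values 126 and 127 select the 16-bit and 64-bit extended fields); the function name encode_payload_length is a hypothetical helper chosen for this example.

```python
import struct


def encode_payload_length(length: int, masked: bool = False) -> bytes:
    """Return the second header byte plus any extended length bytes."""
    if length < 0:
        raise ValueError("length must not be negative")
    mask_bit = 0x80 if masked else 0x00  # the mask bit shares this byte
    if length < 126:
        # Small payloads fit directly in the 7-bit Payload len field.
        return bytes([mask_bit | length])
    if length < 1 << 16:
        # 126 means: the real length follows as a 16-bit big-endian integer.
        return bytes([mask_bit | 126]) + struct.pack("!H", length)
    if length < 1 << 63:
        # 127 means: the real length follows as a 64-bit big-endian integer.
        return bytes([mask_bit | 127]) + struct.pack("!Q", length)
    raise ValueError("payload too large for a single frame")


print(encode_payload_length(5).hex())    # 05
print(encode_payload_length(300).hex())  # 7e012c
```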

Masking-key

This field is closely tied to the mask bit. All frames sent from the client to the server are masked by a 32-bit value that is contained within the frame. The field is present if the mask bit is set to 1 and absent if the mask bit is set to 0.
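To make the XOR step concrete, here is a small Python sketch of masking. Each payload byte is XORed with one of the four key bytes in rotation, so applying the same function a second time restores the original data. The name apply_mask is a hypothetical helper chosen for this example.

```python
import os


def apply_mask(payload: bytes, masking_key: bytes) -> bytes:
    """XOR every payload byte with the key byte at position i mod 4."""
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(payload))


key = os.urandom(4)                 # a client picks a fresh random key per frame
masked = apply_mask(b"Hello", key)  # what would travel over the wire
print(apply_mask(masked, key))      # b'Hello' (masking is its own inverse)
```

Because XOR is symmetric, the server can reuse the same routine to unmask what the client sent.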

Payload data

The payload data is made up of arbitrary application data and any extension data that has been negotiated between the client and the server. Extensions get negotiated during the initial handshake and make it possible for you to extend the WebSocket protocol for further uses.
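Putting the header fields together, a decoder for a single frame might look like the Python sketch below. It assumes the whole frame is already available in one bytes object and ignores extensions and fragmentation; parse_frame is a hypothetical name chosen for this example, not a library function.

```python
import struct


def parse_frame(data: bytes) -> dict:
    """Decode one complete WebSocket frame held in `data`."""
    if len(data) < 2:
        raise ValueError("incomplete frame header")
    first, second = data[0], data[1]
    fin = bool(first & 0x80)
    opcode = first & 0x0F
    masked = bool(second & 0x80)
    length = second & 0x7F
    offset = 2

    # 126 and 127 signal that the real length follows in extended fields.
    if length == 126:
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8

    key = b""
    if masked:
        # The 4-byte masking key is only present when the mask bit is set.
        key = data[offset:offset + 4]
        offset += 4

    payload = data[offset:offset + length]
    if len(payload) != length or (masked and len(key) != 4):
        raise ValueError("incomplete frame")
    if masked:
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))

    return {"fin": fin, "opcode": opcode, "masked": masked, "payload": payload}


# An unmasked single-frame text message containing "Hello".
frame = parse_frame(bytes([0x81, 0x05]) + b"Hello")
print(frame["opcode"], frame["payload"])  # 1 b'Hello'
```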

What is a websocket used for?

A websocket is used for the purpose of opening a two-way interactive communication session between the user's browser and a server. It enables you to shoot out messages to a server and receive event-driven responses without any need for polling the server for a reply.

How do websockets work?

A websocket connection starts with a WebSocket handshake, which uses the URI scheme ws or wss. These can be thought of as the equivalents of HTTP and HTTPS respectively.

When this scheme is used, clients and servers have to follow the standard WebSocket connection protocol. The connection is established with an HTTP upgrade request that carries a few headers such as Connection: Upgrade, Upgrade: websocket, Sec-WebSocket-Key, etc.

Here’s how the connection gets established:

The request

The ‘Connection: Upgrade’ header, together with ‘Upgrade: websocket’, denotes the WebSocket handshake. The ‘Sec-WebSocket-Key’ header carries a Base64-encoded random value, which the client generates afresh for every WebSocket handshake.
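For illustration, the opening request could be assembled as in the Python sketch below. It assumes a 16-byte random nonce for the key and protocol version 13, both as described in RFC 6455. The host example.com, the path /chat, and the function name build_handshake_request are placeholders invented for this example.

```python
import base64
import os


def build_handshake_request(host: str, path: str = "/") -> tuple[str, str]:
    """Return the raw HTTP upgrade request and the key that was generated."""
    # A fresh 16-byte random value, Base64-encoded, for every handshake.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
    return request, key


request, key = build_handshake_request("example.com", "/chat")
print(request)
```

The client keeps the key so that it can check the server's response against it.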

The response

The response header, ‘Sec-WebSocket-Accept’, carries a digest of the value that was submitted in the ‘Sec-WebSocket-Key’ request header. The server derives it in the way the protocol specification prescribes (a Base64-encoded SHA-1 hash of the key concatenated with a fixed GUID), which proves to the client that the server actually understood the WebSocket handshake. This prevents servers that do not speak WebSocket, or that are misconfigured, from being mistaken for WebSocket endpoints and causing errors in the application.
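The derivation of the accept value is short enough to sketch in Python. RFC 6455 defines it as the Base64-encoded SHA-1 hash of the client's key concatenated with a fixed GUID. The function name compute_accept is a hypothetical helper chosen for this example, and the sample key is the one used in the RFC's own example.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the header it receives.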

Websocket vs HTTP

Websockets and HTTP are both used for application communication, which might confuse you when you’re trying to figure out which one you should opt for. Let’s see how they are different from each other. 

WebSocket is a framed and bidirectional protocol. In contrast, HTTP is a request-response protocol in which the client initiates every exchange. Both typically function above the TCP protocol.

The websocket protocol has the ability to support continual data transmission. It is widely used in real-time application development. HTTP, on the other hand, is stateless and is generally used for developing RESTful applications. 

With a websocket, either end can send a message at any time once the connection is open, without repeating a request and its headers for each message. This generally makes it faster than HTTP for frequent, small messages, because with HTTP the client has to initiate every exchange.

A websocket uses a single TCP connection and requires one party to terminate it. Until one party terminates the connection, it stays active. With HTTP, each request-response exchange is independent; traditionally a separate connection was built for each request and closed as soon as the response was delivered, although HTTP/1.1 keep-alive allows a connection to be reused.
