What is an API Rate Limiter?
The core principle of API rate limiting is simple: it controls how often, and in what capacity, each client is allowed to use an API. It works as an access window, limiting use of the API, curbing overuse, and keeping the service available for other potential users.
API rate limiting is integral for any API product’s scalability and development and works both as a form of protection and a form of quality control.
API rate limits are typically expressed in Requests Per Second (RPS) or Transactions Per Second (TPS). API rate limiting means enforcing a cap on the number of requests, or on the amount of data, a user can consume in a given period.
For example, suppose a developer wants to allow a consumer to call the API at most 5 times per minute. The developer would apply a rate limit defined as “5 requests per 60 seconds”: the client can successfully call the API up to 5 times within any 60-second window, and a 6th call within that window returns an error stating that the rate limit has been exceeded.
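The “5 requests per 60 seconds” rule above can be sketched as a small limiter that remembers the timestamps of recent calls and rejects a request once 5 calls have landed inside the rolling window. This is a minimal illustration, not a production implementation; all names are made up for the example.

```python
import time
from collections import deque

class SimpleRateLimiter:
    """Allows at most `limit` calls within any rolling `window`-second span."""

    def __init__(self, limit=5, window=60.0):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of accepted calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the rolling window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False  # the caller would receive a "rate limit exceeded" error
```

Passing `now` explicitly makes the limiter easy to test; in real use you would call `allow()` with no argument and map `False` to an HTTP 429 response.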
What is an API?
When you use an application on your mobile phone, the application connects to the Internet and sends data to a server. The server receives that data, interprets it, performs the necessary actions, and sends a response back to your phone. The application then interprets that response and presents the information you wanted in a readable way. All of this happens via an API (Application Programming Interface): the contract through which software components exchange requests and responses.
How does API Rate Limiting work?
While APIs are hidden from most users, they are necessary for an application to perform optimally. For example, when we order food from a food delivery app, an API call is what returns the total amount for the dish we selected plus delivery and other charges, along with the estimated delivery time.
We don’t interact with this API directly but through the food delivery app’s interface. Every time an API responds to a request, the owner of the API has to pay for resources, depending on the business model. In the example above, the food delivery app’s API integration triggers requests for delivery-time estimates and for product availability at restaurants, and the app pays for each of those information requests.
Therefore, any service that offers API for developers will implement a rate limit on how many API calls/requests can be made. The limiting can be performed in different ways, like limiting the number of API calls per hour, each day, by each user, or limiting the amount of data requested per call, among others.
What is the importance of API Rate Limiting? Or, why is API rate limiting necessary?
Rate limiting is typically put in place as a defensive measure to protect services and data. Services shared by many users need to guard against excessive use to maintain their availability and functionality. Highly scalable systems also impose limits on consumption to sustain performance and reduce the chance of cascading failure.
Rate limiting on both the server side and the client side is important for maximizing reliability and minimizing latency, and the larger the system or API, the more crucial rate limiting becomes.
The following points explain why API rate limiting is important and necessary.
1. Protect resource usage
It helps keep the API available for as many users as possible by preventing excessive resource usage, and it reduces friendly-fire denial-of-service (DoS) incidents, where legitimate clients overwhelm the service unintentionally.
2. Maximizing cost-efficiency
As discussed above, businesses and platforms pay for API transactions that retrieve information and data. Rate limiting prevents runaway resource consumption that could otherwise rack up large costs.
3. Regulating data flow
Rate limiting can also be used to manage data flow. Developers can distribute data more evenly between components of an API by limiting the flow into each one.
4. Controlling allocations between users
When an API’s capacity is shared among many users, per-user rate limits ensure that one user’s heavy usage does not affect other users’ access.
Are there different types of API Rate Limiter?
Yes, there are two types of rate limiters based on the use case, which are as follows.
Static Rate Limiter
The static rate limiter caps the total number of connections/requests on a virtual service. For instance, if the virtual service’s rate limit is set to 100 connections per second, the 101st connection/request made within that period will be denied.
There are a few sub-types of static rate limiter, which are as follows:
- Virtual Service Connection Rate Limiter
- Network Security Rate Limiter
- DNS policy Rate Limiter
Dynamic Rate Limiter
The dynamic rate limiter caps the number of connections/requests per user of a virtual service. For example, if the dynamic rate limiter is set to 500 connections/requests per second, it will allow up to 500 requests per second from user X, 500 from user Y, and so on.
It has one sub-type:
- Application Profile Rate Limiter
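The per-user behavior described above can be sketched by keying an independent rolling window to each user, so one user exhausting their quota never blocks another. This is an illustrative sketch with made-up names, not any vendor’s actual implementation.

```python
import time
from collections import defaultdict, deque

class PerUserRateLimiter:
    """Keeps an independent rolling window per user key, so a heavy
    user hitting their own quota does not affect anyone else."""

    def __init__(self, limit=500, window=1.0):
        self.limit = limit
        self.window = window
        self.calls = defaultdict(deque)  # user key -> accepted timestamps

    def allow(self, user, now=None):
        now = time.monotonic() if now is None else now
        q = self.calls[user]
        # Evict timestamps older than this user's window.
        while q and now - q[0] >= self.window:
            q.popleft()
        if len(q) < self.limit:
            q.append(now)
            return True
        return False
```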
What are the different methods for API Rate Limiting?
There are multiple ways you can rate-limit your API. Here are three of the most common approaches.
1. Request Queues
Request queuing means setting the rate limit to a specific number per second or minute and placing excess requests in a queue, so they are processed in order rather than failing outright. Several libraries and services are available for setting up request queues while developing APIs, such as Android Volley and Amazon Simple Queue Service (SQS).
2. Throttling
Throttling is another way to implement rate limiting: the API provider sets up a temporary state in which each request is assessed before being served. When a throttle is triggered, the service might disconnect the user or simply reduce their bandwidth to prevent over-usage.
3. Rate-limiting Algorithms
Apart from request queue libraries and throttling services, there are many rate-limiting algorithms that developers can use to set API rate limits.
A. Leaky Bucket
The leaky bucket algorithm places incoming requests into a First In, First Out (FIFO) queue and processes them from the queue at a constant rate, smoothing out bursts.
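A minimal sketch of the leaky bucket: a bounded FIFO queue accepts requests until it overflows, and a "leak" step (which a real system would run on a timer) processes a fixed number of requests per tick. Class and method names here are illustrative.

```python
from collections import deque

class LeakyBucket:
    """Bounded FIFO queue drained at a constant rate, regardless of
    how bursty the incoming traffic is."""

    def __init__(self, capacity=10, leak_rate=2):
        self.capacity = capacity    # max requests the bucket can hold
        self.leak_rate = leak_rate  # requests processed per tick
        self.queue = deque()

    def add(self, request):
        if len(self.queue) < self.capacity:
            self.queue.append(request)
            return True
        return False  # bucket overflow: request is rejected

    def leak(self):
        """Process up to `leak_rate` queued requests (one time tick)."""
        processed = []
        for _ in range(min(self.leak_rate, len(self.queue))):
            processed.append(self.queue.popleft())
        return processed
```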
B. Fixed Window
Fixed window algorithms track the rate of requests or queries over a fixed interval using a simple cumulative counter. The developer sets a limit, such as 2,500 requests per hour; if the count exceeds that limit within the window, the extra requests are discarded.
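The fixed window counter can be sketched as follows: each timestamp maps to a window (e.g., the current hour), and the counter resets whenever a new window begins. Names and the `time.time()`-based clock are illustrative assumptions.

```python
import time

class FixedWindowLimiter:
    """One cumulative counter per fixed-size window; the counter resets
    when a new window starts."""

    def __init__(self, limit=2500, window=3600):
        self.limit = limit
        self.window = window
        self.window_start = None
        self.count = 0

    def allow(self, now=None):
        now = time.time() if now is None else now
        start = now - (now % self.window)  # beginning of the current window
        if start != self.window_start:
            self.window_start = start      # new window: reset the counter
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False                       # over the limit: discard request
```

One known drawback of this scheme is the window boundary: a client can burst up to twice the limit by clustering requests at the end of one window and the start of the next, which is what the sliding variants below address.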
C. Sliding Log
A sliding log algorithm tracks each request in a time-stamped log, discarding entries whose timestamps fall outside the current window and rejecting new requests once the remaining log reaches the rate limit.
D. Sliding Window
Sliding window algorithms combine the low overhead of the fixed window approach with the boundary accuracy of the sliding log, typically by weighting the previous window’s count against the current one.