Technical Glossary

Rate Limiting

Definition: Technique controlling the number of requests a client can make to an API within a time period, protecting services against abuse and overload.

— Source: NERVICO, Product Development Consultancy

What Is Rate Limiting

Rate limiting is a control mechanism that restricts the number of requests a client or user can make to a service within a defined time interval. When the limit is exceeded, additional requests are rejected with an HTTP 429 (Too Many Requests) response. It is an essential defense against abuse, denial-of-service attacks, and disproportionate resource consumption.

How It Works

The most common algorithms are token bucket, which replenishes tokens at a constant rate and allows controlled bursts, and sliding window, which counts requests within a sliding time window. Limits are configured per API key, IP address, authenticated user, or combinations thereof. In AWS, API Gateway offers native rate limiting configurable per stage and method. For custom implementations, Redis is frequently used as a shared counter store across multiple service instances.
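As a concrete illustration of the token bucket algorithm described above, here is a minimal in-memory sketch in Python. The class name, capacity, and refill rate are illustrative choices, not part of any standard library; a production deployment would typically keep the counter in a shared store such as Redis instead of process memory.

```python
import time

class TokenBucket:
    """Token bucket rate limiter: tokens refill at a constant rate,
    and the capacity bounds the size of allowed bursts."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity          # maximum burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = capacity            # bucket starts full
        self.last_refill = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Replenish tokens proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller should respond with HTTP 429
```

With `TokenBucket(capacity=5, refill_rate=1.0)`, a client may burst five requests at once, then proceed at one request per second on average.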

Key Use Cases

  • Protecting public APIs against abuse and brute force or automated scraping attacks
  • Differentiating service tiers by subscription plan (free: 100 req/min, pro: 1000 req/min)
  • Preventing cascading failures when a defective client floods a service with requests
  • Cost control in services consuming third-party APIs with per-call billing

Advantages and Considerations

Rate limiting protects service stability and ensures fair usage among all clients. It is especially critical in public APIs and multi-tenant services. The main consideration is choosing limits that do not penalize legitimate users. Response headers should communicate applicable limits, remaining requests, and time until renewal so clients can adapt their behavior.
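The informational headers mentioned above are conventionally named `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset`, though exact names vary by API (an IETF draft standardizes unprefixed `RateLimit-*` forms). A minimal sketch of building them, with `Retry-After` added on rejected requests:

```python
def rate_limit_headers(limit, remaining, reset_epoch, retry_after=None):
    """Build conventional rate-limit response headers.
    Header names follow the common X-RateLimit-* convention."""
    headers = {
        "X-RateLimit-Limit": str(limit),               # quota for the window
        "X-RateLimit-Remaining": str(max(0, remaining)),  # requests left
        "X-RateLimit-Reset": str(reset_epoch),         # Unix time the window resets
    }
    if retry_after is not None:
        # Standard HTTP header; on a 429, tells the client how long to wait.
        headers["Retry-After"] = str(retry_after)
    return headers
```

A well-behaved client reads these headers and backs off before hitting the limit, rather than retrying blindly after each 429.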
