How Netflix Handles 500M Daily API Requests

Netflix serves over 220 million subscribers across 190 countries. On any given day, their API gateway processes upwards of 500 million requests.

Not 500 million database queries.
Not 500 million page loads.

500 million discrete API calls — device registrations, content metadata fetches, playback license requests, recommendation refreshes, billing checks.

At that scale, the question is never:

"How do we handle this request?"

The question becomes:

"How do we handle this request when three downstream services are degraded, one region is partially unavailable, and 40,000 devices wake up simultaneously because a popular show dropped at midnight?"

That is a very different engineering problem.

The Problem With a Single Gateway

Before we talk about what Netflix built, understand what they were trying to escape.

In 2008, Netflix ran a monolithic Java application. Every feature — streaming, billing, recommendations, device management — lived in one codebase, deployed together, scaled together, and failed together.

That created several problems:

A bad recommendation deployment could take down billing.
A memory leak in streaming metadata could crash device registration.
You couldn't scale one component independently.

The August 2008 database corruption incident accelerated Netflix's migration toward microservices. The corruption hit Netflix's own data center, not AWS; it is what pushed the company toward AWS in the first place.

After spending three days unable to ship DVDs, Netflix began moving away from single points of failure and decomposing its monolith.

By 2012, Netflix had split the system into hundreds of independent services.

They solved the coupling problem.

They created a routing problem.

What Zuul Actually Is

Zuul is Netflix’s edge service.

Every external request — from every Netflix app on every device — enters Netflix infrastructure through Zuul before touching anything else.

Think of it like the front door of an enormous building.

That front door has to:

Authenticate users
Decide routing destinations
Balance traffic
Enforce rate limits
Record metrics
Handle failures gracefully

All in milliseconds.

At 500 million requests a day, that averages roughly 5,800 requests per second, with peaks far above the average.

Zuul 2 is built on top of Netty, a non-blocking I/O framework for Java. (Zuul 1 ran on a blocking servlet stack.)

Why does that matter?

Blocking I/O scales poorly because every in-flight request pins a thread that sits idle while it waits for a response.

Non-blocking I/O lets one thread manage thousands of open connections at once.
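To make that concrete, here is a minimal sketch in plain Java. It is not Netty and not Zuul's code: a single scheduler thread stands in for an event loop, and timers stand in for slow backend responses. The names EventLoopSketch and completeAll are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: one thread tracks many slow calls at once.
final class EventLoopSketch {

    // Starts `calls` simulated backend calls that each take `delayMs`,
    // all driven by ONE thread, and returns the total elapsed milliseconds.
    static long completeAll(int calls, long delayMs) {
        ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();
        long start = System.nanoTime();
        try {
            List<CompletableFuture<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                CompletableFuture<Integer> response = new CompletableFuture<>();
                final int id = i;
                // No thread sleeps here: a timer fires later and completes the future.
                loop.schedule(() -> { response.complete(id); }, delayMs, TimeUnit.MILLISECONDS);
                pending.add(response);
            }
            // Wait until every simulated response has arrived.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            loop.shutdown();
        }
    }
}
```

Calling EventLoopSketch.completeAll(2000, 100) finishes in a small fraction of the 200 seconds that 2,000 sequential 100 ms calls would take, using one thread instead of 2,000.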

High-Level Architecture

Client Device
│
▼
┌─────────────┐
│  Zuul Edge  │ ← authentication, rate limiting, routing
└─────────────┘
│
▼
┌─────────────┐
│   Ribbon    │ ← client-side load balancing
└─────────────┘
│
▼
┌─────────────────────────────────────────────┐
│ Microservices (hundreds of them)            │
│ Recommendations │ Streaming │ Billing ...   │
└─────────────────────────────────────────────┘

Zuul's Filter Chain

The core idea behind Zuul is its filter system.

Every request moves through a chain of filters.

Each filter owns one responsibility.

Filter Types

PRE Filters

Run before routing.

Used for:

Authentication
Rate limiting
Request metadata injection

ROUTING Filters

Handle forwarding requests to backend services.

This is where Ribbon participates.

POST Filters

Run after backend responses.

Used for:

Logging
Metrics
Response headers

ERROR Filters

Run whenever failures happen.
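The four filter types can be pictured as a small pipeline. The sketch below is a simplified model in plain Java, not Zuul's implementation: FilterChainSketch, Action, and the map-based context are all invented for illustration, and the real lifecycle has more detail than this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a typed, ordered filter chain.
final class FilterChainSketch {

    enum Type { PRE, ROUTING, POST, ERROR }

    interface Action {
        void apply(Map<String, Object> ctx) throws Exception;
    }

    interface Filter {
        Type type();
        int order();
        void run(Map<String, Object> ctx) throws Exception;
    }

    // Convenience factory so filters can be written as lambdas.
    static Filter filter(Type type, int order, Action action) {
        return new Filter() {
            public Type type() { return type; }
            public int order() { return order; }
            public void run(Map<String, Object> ctx) throws Exception { action.apply(ctx); }
        };
    }

    private final List<Filter> filters = new ArrayList<>();

    void register(Filter f) {
        filters.add(f);
    }

    // Runs PRE, then ROUTING, then POST; any exception diverts to ERROR filters.
    Map<String, Object> handle(Map<String, Object> ctx) {
        try {
            runStage(Type.PRE, ctx);
            runStage(Type.ROUTING, ctx);
            runStage(Type.POST, ctx);
        } catch (Exception e) {
            ctx.put("error", e.getMessage());
            try {
                runStage(Type.ERROR, ctx);
            } catch (Exception ignored) {
                // An error filter failed too; nothing more to do in this sketch.
            }
        }
        return ctx;
    }

    // Within one stage, filters run in ascending order.
    private void runStage(Type stage, Map<String, Object> ctx) throws Exception {
        List<Filter> stageFilters = new ArrayList<>();
        for (Filter f : filters) {
            if (f.type() == stage) {
                stageFilters.add(f);
            }
        }
        stageFilters.sort(Comparator.comparingInt(Filter::order));
        for (Filter f : stageFilters) {
            f.run(ctx);
        }
    }
}
```

Registration order does not matter here: the type and order of each filter decide when it runs, which is what lets teams add a concern without touching routing code.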

Authentication Pre-Filter Example

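// Zuul 1 style filter (servlet-based API).
import javax.servlet.http.HttpServletRequest;

import com.netflix.zuul.ZuulFilter;
import com.netflix.zuul.context.RequestContext;

public class AuthenticationFilter extends ZuulFilter {

    // TokenService is this article's placeholder for whatever validates tokens.
    private final TokenService tokenService;

    public AuthenticationFilter(TokenService tokenService) {
        this.tokenService = tokenService;
    }

    @Override
    public String filterType() {
        return "pre"; // run before the request is routed
    }

    @Override
    public int filterOrder() {
        return 1; // lower numbers run earlier within the same type
    }

    @Override
    public boolean shouldFilter() {
        return true; // apply to every request
    }

    @Override
    public Object run() {
        RequestContext ctx = RequestContext.getCurrentContext();
        HttpServletRequest request = ctx.getRequest();

        String token = request.getHeader("Authorization");

        // Reject missing or invalid tokens without calling any backend.
        if (token == null || !tokenService.isValid(token)) {
            ctx.setSendZuulResponse(false);
            ctx.setResponseStatusCode(401);
            return null;
        }

        // Pass the authenticated identity downstream as a request header.
        ctx.addZuulRequestHeader(
            "X-User-Id",
            tokenService.getUserId(token)
        );

        return null;
    }
}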

✦

Netflix open-sourced Zuul in 2013.

Zuul 2 (the Netty rewrite) followed in 2018.

The important takeaway isn't the library itself — it's the filter model. Separating cross-cutting concerns from routing logic scales beautifully.

Service Discovery With Eureka

Zuul knows it must call the Recommendations Service.

But which instance?

At Netflix scale, every service runs across dozens or hundreds of instances.

This becomes a service discovery problem.

Netflix solved it using Eureka.

Eureka is a service registry.

Each service registers itself at startup.

Example:

POST /eureka/apps/RECOMMENDATIONS
 
{
  "instance": {
    "hostName": "ec2-54-234-12-88.compute-1.amazonaws.com",
    "app": "RECOMMENDATIONS",
    "ipAddr": "54.234.12.88",
    "port": 8080,
    "status": "UP",
    "dataCenterInfo": {
      "name": "Amazon",
      "metadata": {
        "availability-zone": "us-east-1a"
      }
    }
  }
}

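Each service instance sends a heartbeat every 30 seconds by default.

If heartbeats stop:

The instance's lease expires (after 90 seconds by default) and it is evicted from the registry
Traffic stops routing to it once clients refresh their cached copy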

Clients keep a local registry cache.

That means:

Even if Eureka fails, traffic continues routing using previously known service state.

Netflix intentionally optimized for availability over consistency.

A stale registry is survivable.

No routing is catastrophic.
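Both halves of that design, heartbeat leases on the server and a cache on the client, fit in a short sketch. This is not Eureka's code: RegistrySketch, CachingClient, and the 90-second lease constant are assumptions chosen for illustration, and timestamps are passed in explicitly so the behavior is easy to follow.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of heartbeat leases plus a client-side cache.
final class RegistrySketch {

    // Assumed eviction window: three missed 30-second heartbeats.
    static final long LEASE_MS = 90_000;

    // instanceId -> timestamp of the most recent heartbeat
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    // Registering and renewing are the same operation in this sketch.
    void heartbeat(String instanceId, long nowMs) {
        lastHeartbeat.put(instanceId, nowMs);
    }

    // Instances whose lease has not expired as of nowMs, sorted by id.
    List<String> liveInstances(long nowMs) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMs - e.getValue() <= LEASE_MS) {
                live.add(e.getKey());
            }
        }
        Collections.sort(live);
        return live;
    }

    // Client that keeps serving its last good snapshot when the registry is unreachable.
    static final class CachingClient {
        private List<String> cached = new ArrayList<>();

        // A null registry models "the registry is down".
        List<String> refresh(RegistrySketch registry, long nowMs) {
            if (registry != null) {
                cached = registry.liveInstances(nowMs);
            }
            return cached; // possibly stale, but never empty just because the registry died
        }
    }
}
```

The client may route to an instance that has since gone away, which is exactly the trade described above: a stale answer beats no answer, and the circuit breaker downstream absorbs the occasional bad pick.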

Client-Side Load Balancing — Ribbon

Traditional load balancing works like this:

Client → Load Balancer → Backend

Netflix instead uses client-side load balancing.

Meaning:

The client itself chooses which backend instance to call.

Ribbon uses:

Eureka service registry
Response time statistics
Zone awareness

Typical strategy:

Zone-aware selection, weighted by response time (the exact rule is configurable)

Goals:

Prefer same AWS availability zone
Reduce latency
Reduce cross-AZ cost
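The zone preference itself is simple to sketch. The code below is a deliberately reduced model, not Ribbon's actual rule (which also weighs zone health): ZoneAwareChooser and Instance are hypothetical names, and selection inside the preferred pool is plain round robin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: prefer same-zone instances, fall back to every zone.
final class ZoneAwareChooser {

    static final class Instance {
        final String id;
        final String zone;

        Instance(String id, String zone) {
            this.id = id;
            this.zone = zone;
        }
    }

    private final String localZone;
    private final AtomicInteger counter = new AtomicInteger();

    ZoneAwareChooser(String localZone) {
        this.localZone = localZone;
    }

    // Same-zone instances if any exist; otherwise the full list.
    List<Instance> candidates(List<Instance> all) {
        List<Instance> local = new ArrayList<>();
        for (Instance i : all) {
            if (i.zone.equals(localZone)) {
                local.add(i);
            }
        }
        return local.isEmpty() ? all : local;
    }

    // Round robin over the preferred pool; null when there is nothing to choose.
    Instance choose(List<Instance> all) {
        List<Instance> pool = candidates(all);
        if (pool.isEmpty()) {
            return null;
        }
        return pool.get(Math.floorMod(counter.getAndIncrement(), pool.size()));
    }
}
```

A caller in us-east-1a only ever sees the us-east-1a instances while at least one is registered, and silently widens to other zones when none are.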

Ribbon tracks:

Active requests
Average response time
Failure count

Slow instances receive less traffic instead of being immediately removed.

Simplified Ribbon Logic

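// Simplified sketch: pick a server at random, weighted by inverse average response time.
// Server and ServerStats are Ribbon types (com.netflix.loadbalancer); getReachableServers()
// and serverStatsMap stand in for the enclosing load balancer's state.
public Server choose(Object key) {
    List<Server> servers = getReachableServers();
    if (servers.isEmpty()) {
        return null; // nothing to route to
    }

    Map<Server, Double> weights = new HashMap<>();
    double totalWeight = 0;

    for (Server server : servers) {
        ServerStats stats = serverStatsMap.get(server);
        // Guard against missing stats and a zero average (no samples yet).
        double avgMs = (stats == null) ? 0 : stats.getResponseTimeAvg();
        double weight = 1.0 / Math.max(avgMs, 1.0);

        totalWeight += weight;
        weights.put(server, weight);
    }

    // Walk the list, subtracting weights until the random point is reached.
    double rand = Math.random() * totalWeight;

    for (Server server : servers) {
        rand -= weights.get(server);

        if (rand <= 0) {
            return server;
        }
    }

    return servers.get(0); // floating-point safety net
}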

ℹ

Ribbon is now in maintenance mode; Spring Cloud announced maintenance mode for its Ribbon integration in late 2018.

Today, similar ideas live in Spring Cloud LoadBalancer and modern service mesh systems.

Handling Failures — Hystrix

At 500M requests/day:

Failures are guaranteed.

The real problem:

Preventing cascading failures.

Without protection:

Recommendations fail
Requests wait 30s timeout
Threads pile up
Thread pool exhausts
Entire system degrades

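Netflix solved this using the Circuit Breaker Pattern through Hystrix.

Circuit Breaker States

CLOSED

Requests flow to the backend normally. Failures are counted.

OPEN

Too many failures. Requests fail fast and go straight to the fallback without touching the backend.

HALF-OPEN

After a sleep window, a trial request is allowed through.

If healthy → close.

If unhealthy → reopen.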
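As a state machine, a circuit breaker fits in a few dozen lines. The sketch below is a minimal illustration, not Hystrix's implementation: it trips on consecutive failures rather than on an error percentage over a rolling window, and CircuitBreakerSketch and its parameters are hypothetical names. The clock is passed in so the transitions are deterministic.

```java
import java.util.function.Supplier;

// Hypothetical minimal circuit breaker with CLOSED, OPEN, and HALF_OPEN states.
final class CircuitBreakerSketch {

    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;
    private final long sleepWindowMs;
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAtMs = 0;

    CircuitBreakerSketch(int failureThreshold, long sleepWindowMs) {
        this.failureThreshold = failureThreshold;
        this.sleepWindowMs = sleepWindowMs;
    }

    State state() {
        return state;
    }

    // Runs primary unless the circuit is open; returns the fallback value on failure.
    synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback, long nowMs) {
        if (state == State.OPEN) {
            if (nowMs - openedAtMs < sleepWindowMs) {
                return fallback.get(); // fail fast: the backend is not called at all
            }
            state = State.HALF_OPEN; // sleep window elapsed: allow one trial request
        }
        try {
            T result = primary.get();
            state = State.CLOSED; // success closes the circuit
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN; // trip, or re-trip after a failed trial
                openedAtMs = nowMs;
            }
            return fallback.get();
        }
    }
}
```

The important property is the fail-fast branch: while the circuit is open, callers get an answer immediately instead of queueing behind a timeout, which is what stops the thread pile-up described above.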

Hystrix Example

@HystrixCommand(
    fallbackMethod = "getDefaultRecommendations",
    commandProperties = {
        @HystrixProperty(
            name = "circuitBreaker.errorThresholdPercentage",
            value = "50"
        ),
        @HystrixProperty(
            name = "circuitBreaker.sleepWindowInMilliseconds",
            value = "5000"
        ),
        @HystrixProperty(
            name = "execution.isolation.thread.timeoutInMilliseconds",
            value = "1000"
        )
    }
)
public List<Movie> getRecommendations(String userId) {
    return recommendationsClient.fetch(userId);
}
 
public List<Movie> getDefaultRecommendations(String userId) {
    return cacheService.getLastKnownRecommendations(userId);
}

Fallbacks matter.

Netflix rarely shows failures.

Instead:

Cached recommendations
Popular titles
Graceful degradation

This principle is called:

Best available response
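One way to picture "best available response" is an ordered list of sources, from most personal to most generic. The sketch below assumes that shape; BestAvailableSketch and firstAvailable are hypothetical names, not anything from Netflix's codebase.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: try sources in order, return the first usable result.
final class BestAvailableSketch {

    static <T> List<T> firstAvailable(List<Supplier<List<T>>> sources) {
        for (Supplier<List<T>> source : sources) {
            try {
                List<T> result = source.get();
                if (result != null && !result.isEmpty()) {
                    return result;
                }
            } catch (RuntimeException e) {
                // This source failed; fall through to the next, more generic one.
            }
        }
        // Nothing worked: return an empty list so the caller can hide the row.
        return Collections.emptyList();
    }
}
```

With the sources ordered as live recommendations, cached recommendations, then popular titles, a user whose live call times out and whose cache is empty still sees a populated row.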

Rate Limiting at the Edge

Not all requests are legitimate.

Some come from:

Scrapers
Credential stuffing attacks
Misconfigured clients

Netflix handles this in Zuul.

Using:

Token Bucket Algorithm

How it works:

Clients receive tokens
Requests consume tokens
Tokens refill over time
Empty bucket → HTTP 429
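Those four steps map directly onto code. Here is a minimal single-bucket sketch, assuming one bucket per client and a caller-supplied clock; TokenBucketSketch and tryRequest are hypothetical names, and a real edge would keep one bucket per client key in shared or per-node state.

```java
// Hypothetical token bucket: starts full, refills continuously, rejects when empty.
final class TokenBucketSketch {

    private final double capacity;
    private final double refillPerMs;
    private double tokens;
    private long lastRefillMs;

    TokenBucketSketch(int capacity, double refillPerSecond, long nowMs) {
        this.capacity = capacity;
        this.refillPerMs = refillPerSecond / 1000.0;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillMs = nowMs;
    }

    // Returns the HTTP status the edge would send: 200 if a token was available, 429 otherwise.
    synchronized int tryRequest(long nowMs) {
        if (nowMs > lastRefillMs) {
            // Add tokens for the elapsed time, never exceeding capacity.
            tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * refillPerMs);
            lastRefillMs = nowMs;
        }
        if (tokens >= 1.0) {
            tokens -= 1.0; // this request consumes one token
            return 200;
        }
        return 429; // bucket is empty
    }
}
```

With a capacity of 4 and a refill rate of 2 tokens per second, four back-to-back requests succeed, the fifth gets a 429, and a second later the client can send again.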

Benefit:

A token bucket tolerates short bursts, and real users naturally produce bursty traffic.

Example:

Opening Netflix app:

Device registration
User fetch
Homepage fetch
Recommendation refresh

Then traffic stabilizes.

Different clients get different limits:

Mobile Apps → burst-friendly
Smart TVs → predictable limits
Developer APIs → strict quotas
Internal Services → separate rules

Regional Failover

Netflix runs across multiple AWS regions.

Example:

us-east-1
us-west-2
eu-west-1

Traffic normally routes geographically.

But if failures rise:

Traffic shifts using AWS Route 53 weighted routing.

Netflix practices failure regularly using:

Chaos Monkey

Randomly kills production instances.

Chaos Kong

Simulates regional outages.

Yes.

In production.

During business hours.

Why?

Because production is the only environment that truly matters.

⚠

Chaos engineering is dangerous without observability.

Before chaos testing, build metrics, tracing, dashboards, and alerting systems.

End-to-End Request Flow

Client sends request to netflix.com
│
├── AWS Route 53 → nearest healthy region
│
├── AWS ELB
│
├── Zuul Edge Node
│   ├── PRE: authenticate token
│   ├── PRE: rate limiting
│   ├── PRE: attach user context
│   └── PRE: canary/stable routing
│
├── Ribbon chooses backend
│
├── API Service
│   ├── Recommendations (Hystrix)
│   ├── Metadata (Hystrix)
│   ├── Personalization (Hystrix)
│   └── fallback if failures occur
│
├── Zuul POST Filters
│   ├── Logging
│   ├── Metrics
│   └── Response headers
│
└── Response returned

Typical latency targets:

P50 → under 100ms
P99 → under 500ms
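If P50 and P99 are unfamiliar: they are percentiles over observed latencies. A small sketch using the nearest-rank definition shows how they are computed; LatencyPercentiles is a hypothetical helper, and production systems usually rely on histograms from their metrics library rather than sorting raw samples.

```java
import java.util.Arrays;

// Hypothetical helper: nearest-rank percentile over raw latency samples.
final class LatencyPercentiles {

    // Returns the smallest sample such that at least p percent of samples are at or below it.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = latenciesMs.clone(); // do not reorder the caller's array
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

For the ten samples 120, 80, 95, 400, 60, 70, 90, 85, 75, 100 (in ms), P50 is 85 and P99 is 400: the median looks healthy while a single slow request defines the tail, which is why both numbers are tracked.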

What You Can Actually Use

Eureka

Service registry.

Good for:

Microservices
Availability-first systems

Alternatives:

Consul
etcd

Ribbon

Client-side load balancing.

Modern alternative:

Spring Cloud LoadBalancer

Hystrix

Circuit breaker.

Modern replacement:

Resilience4j

Zuul

API gateway.

Alternative:

Spring Cloud Gateway

The patterns matter more than the tools.

Borrow these ideas:

Circuit breakers
Graceful degradation
Timeouts
Fallbacks
Service discovery
Edge authentication
Smart routing

The Real Lesson

Netflix architecture is not impressive because it handles 500 million requests.

It is impressive because it handles them while things are failing.

Services crash.

Deployments happen.

Regions degrade.

Traffic spikes.

Chaos is intentional.

The system survives because it is designed for failure first.

Every service assumes failure.

Every timeout has a fallback.

Every fallback is tested.

That is the real engineering philosophy:

Build for failure first.

The happy path takes care of itself.
