Sale-Day Scaling: How We Handle 100x Traffic Spikes

Sale days separate eCommerce platforms that survive from those that show up on Twitter as a meme. The good news: handling 100x normal traffic is solved engineering. It just needs to be done early enough, with the right patterns. Here's the playbook.

Key takeaways

Cache aggressively. Most reads should never touch your origin.
Queue everything that can be async: orders, emails, recommendations.
Pre-warm. Auto-scaling reacts too slowly for instant spikes.
Degrade gracefully. When something breaks, switch that feature off instead of failing catastrophically.
Rehearse. Run load tests at 2x your expected peak before sale day.

Why this matters

A 4-hour outage during a sale day costs not just revenue but trust. Customers who couldn't check out during BBD often don't come back once traffic returns to normal.

The architecture patterns

Cache everything readable

Product catalogs, category pages, and search results should all be cached aggressively at the CDN level. Stale-while-revalidate patterns serve the cached response immediately while a fresher copy is fetched in the background.
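As a minimal sketch of the stale-while-revalidate idea, the helper below builds a Cache-Control header per page type. The TTL values and the names cache_control, TTL_BY_PAGE, and headers_for are illustrative assumptions, not settings from our stack, and support for the stale-while-revalidate directive varies by CDN, so check your provider before relying on it.

```python
def cache_control(max_age_s: int, swr_s: int) -> str:
    """Build a Cache-Control value for a publicly cacheable page.

    max_age_s: seconds the response counts as fresh.
    swr_s: extra seconds a stale copy may be served while revalidating.
    """
    if max_age_s < 0 or swr_s < 0:
        raise ValueError("durations must be non-negative")
    return f"public, max-age={max_age_s}, stale-while-revalidate={swr_s}"


# Hypothetical TTLs per page type: (fresh seconds, stale-but-servable seconds).
TTL_BY_PAGE = {
    "product": (60, 300),
    "category": (120, 600),
    "search": (30, 120),
}


def headers_for(page_type: str) -> dict:
    """Return the caching headers to attach to a response of this page type."""
    max_age_s, swr_s = TTL_BY_PAGE[page_type]
    return {"Cache-Control": cache_control(max_age_s, swr_s)}
```

Short fresh windows with longer stale windows keep prices reasonably current while still shielding the origin from most reads.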

Queue order processing

Don't process the order synchronously. Write the order to a fast queue, return success, and process it asynchronously. The customer's checkout completes in 200ms instead of 2s.
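The accept-then-process flow can be sketched as follows. This is an illustration only: the in-memory queue.Queue stands in for a durable queue (a real system would use a persistent broker so accepted orders survive a crash), and accept_order, process_order, and the processed list are hypothetical names.

```python
import queue
import threading
import uuid

# In-memory stand-in for a durable queue; bounded so overload is visible.
order_queue: queue.Queue = queue.Queue(maxsize=10_000)
processed: list = []  # records finished order ids, for demonstration only


def accept_order(cart: dict) -> dict:
    """Fast path: enqueue the order and return immediately."""
    order = {"order_id": str(uuid.uuid4()), "cart": cart}
    order_queue.put_nowait(order)  # raises queue.Full if the queue is saturated
    return {"status": "accepted", "order_id": order["order_id"]}


def process_order(order: dict) -> None:
    """Slow path placeholder: payment capture, stock update, emails."""
    processed.append(order["order_id"])


def worker() -> None:
    """Drain the queue forever, one order at a time."""
    while True:
        order = order_queue.get()
        try:
            process_order(order)
        finally:
            order_queue.task_done()


threading.Thread(target=worker, daemon=True).start()
```

Calling accept_order({"sku": "A1", "qty": 1}) returns as soon as the order is queued; the worker picks it up afterwards. The bounded queue is a deliberate choice: a full queue is a signal to shed load rather than to accept orders that cannot be processed.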

Database read replicas

Read traffic is 50-100× write traffic. Multiple read replicas handle it; the primary handles writes only.
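A minimal sketch of read/write splitting, assuming a hypothetical ConnectionRouter that hands out connection names. It treats any statement starting with SELECT as a read, which is a simplification: locking reads and reads that must see a just-committed write still belong on the primary because replicas lag.

```python
import itertools


class ConnectionRouter:
    """Send reads to replicas in round-robin order and writes to the primary."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        # With no replicas configured, everything falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str) -> str:
        """Return the name of the connection that should run this statement."""
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

For example, with ConnectionRouter("primary", ["replica-1", "replica-2"]), consecutive SELECT statements alternate between the two replicas, and an UPDATE goes to the primary.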

Inventory as a separate fast store

Inventory checks happen on every product view. Use Redis for sub-millisecond reads, or DynamoDB for single-digit-millisecond reads. Reconcile with the source of truth asynchronously.
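The shape of a fast inventory store with asynchronous reconciliation can be sketched like this. The FastInventory class is hypothetical, and the in-process dict and lock stand in for Redis or DynamoDB; in a real deployment the check-and-decrement would need to be atomic inside the store itself.

```python
import threading


class FastInventory:
    """In-memory stand-in for a fast inventory store."""

    def __init__(self, stock: dict):
        self._stock = dict(stock)
        self._lock = threading.Lock()
        self._pending: list = []  # changes not yet written to the source of truth

    def available(self, sku: str) -> int:
        """Cheap read used on every product view."""
        return self._stock.get(sku, 0)

    def reserve(self, sku: str, qty: int) -> bool:
        """Check and decrement in one step; never oversell."""
        with self._lock:
            if self._stock.get(sku, 0) < qty:
                return False
            self._stock[sku] -= qty
            self._pending.append((sku, -qty))
            return True

    def drain_pending(self) -> list:
        """Hand the accumulated changes to a background reconciler."""
        with self._lock:
            batch, self._pending = self._pending, []
            return batch
```

A background job would call drain_pending() on a schedule and apply the batch to the primary database, so the hot SKU is written there once per batch instead of once per order.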

Pre-warm at scale

Auto-scaling reacts to load with a 30-60s delay. For instant spikes, you need to be pre-warmed at sale-start scale. Schedule the warm-up.
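To decide how far to pre-warm, a rough capacity calculation helps. The sketch below is an assumption-laden estimate, not a sizing rule: prewarm_instances is a hypothetical helper, the per-instance throughput must come from your own load tests, and the 25% headroom is an arbitrary illustrative default.

```python
import math


def prewarm_instances(normal_rps: float, spike_multiplier: float,
                      rps_per_instance: float, headroom: float = 0.25) -> int:
    """Estimate how many instances to have running before the sale starts.

    normal_rps: typical requests per second on a normal day.
    spike_multiplier: expected peak as a multiple of normal (e.g. 100).
    rps_per_instance: measured sustainable throughput of one instance.
    headroom: extra fraction of capacity kept in reserve.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    peak_rps = normal_rps * spike_multiplier
    return math.ceil(peak_rps * (1 + headroom) / rps_per_instance)
```

With 200 requests per second on a normal day, a 100x spike, and 500 requests per second per instance, this gives 50 instances to schedule ahead of sale start.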

CDN cache poisoning prevention

Vary cache keys carefully: segment the cache by region, by logged-in vs not, and by A/B group. Bad cache keys lead to the wrong content being served to the wrong users at scale.
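One way to picture a segmented cache key is a function that folds every dimension that changes the response into the key. This is a sketch under assumptions: cache_key and its parameter names are hypothetical, and real CDNs configure key composition through their own rules rather than application code.

```python
from urllib.parse import urlencode


def cache_key(path: str, region: str, logged_in: bool, ab_group: str) -> str:
    """Build a cache key that includes every dimension that changes the page."""
    dims = {
        "region": region.lower(),                  # normalise so "IN" and "in" share an entry
        "auth": "user" if logged_in else "anon",   # never mix logged-in and anonymous variants
        "ab": ab_group,
    }
    # Sort the dimensions so the same inputs always produce the same key.
    return f"{path}?{urlencode(sorted(dims.items()))}"
```

For example, cache_key("/p/123", "IN", False, "b") yields "/p/123?ab=b&auth=anon&region=in". Normalising and sorting matters as much as segmenting: keys that differ only in case or ordering fragment the cache and lower the hit rate.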

The operational patterns

Feature flags for graceful degradation

Recommendations down? Disable the recommendation widget. Search slow? Cache more aggressively. Reviews offline? Hide the section. Each feature should have an off switch.
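A minimal sketch of an off switch around an optional widget. FeatureFlags, render_product_page, and fetch_recs are hypothetical names; a production flag system would read flags from a shared store so they can be flipped without a deploy.

```python
class FeatureFlags:
    """Tiny in-process flag store; unknown flags default to off."""

    def __init__(self, defaults: dict):
        self._flags = dict(defaults)

    def is_on(self, name: str) -> bool:
        return self._flags.get(name, False)

    def turn_off(self, name: str) -> None:
        self._flags[name] = False


def render_product_page(product: dict, flags: FeatureFlags, fetch_recs) -> dict:
    """Render the page; the recommendation widget is optional."""
    page = {"product": product, "recommendations": []}
    if flags.is_on("recommendations"):
        try:
            page["recommendations"] = fetch_recs(product["sku"])
        except Exception:
            # The widget is non-essential: render the page without it.
            pass
    return page
```

The flag covers the planned case (an operator switches the widget off) and the try/except covers the unplanned one (the service fails before anyone reacts). In both cases the product page still renders.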

Real-time observability

Dashboards showing the top 5 metrics that matter: orders/min, checkout completion %, error rate, latency, queue depth. Alerts on anomalies.
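As an illustration of alerting on anomalies, the sketch below flags a metric when it deviates from its recent average by more than a tolerance. The is_anomalous helper and the 50% default are assumptions for the example; real alerting usually accounts for seasonality and uses the monitoring system's own rules.

```python
from statistics import mean


def is_anomalous(history: list, current: float, tolerance: float = 0.5) -> bool:
    """Return True when current deviates from the recent mean by more than tolerance.

    history: recent samples of the metric (must be non-empty).
    tolerance: allowed relative deviation, e.g. 0.5 means 50%.
    """
    baseline = mean(history)
    if baseline == 0:
        # No meaningful ratio against zero; any non-zero value is a change.
        return current != 0
    return abs(current - baseline) / baseline > tolerance
```

With recent orders/min of 100, 110, and 90, a reading of 40 is flagged while a reading of 95 is not. The same check applies to error rate, latency, and queue depth, each with its own tolerance.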

War room

On sale day, have engineering present, dashboards on big screens, and a fast communication channel. Most fires get put out in minutes if caught early.

Post-mortem the next day

What worked, what didn't, what to fix before the next sale. Every sale day is the rehearsal for the next.

Common pitfalls

Trusting auto-scaling alone. It scales up too late for instant peaks.

Single-AZ or single-region. An outage in your only availability zone during a sale is catastrophic. Multi-AZ is the minimum; go multi-region for big businesses.

No graceful degradation. When a service fails, the whole checkout fails. Build feature flags.

Database hot rows. Inventory of the most-purchased product becomes a hot row. Cache and async-update.

What we recommend

Run a load test at 2x your expected peak two weeks before sale day. Find what breaks. Fix it. Run again. The teams who don't rehearse are the ones who melt.

FAQs

What about CDN choice? Cloudflare, CloudFront, and Fastly all handle 100x. The differences are pricing and feature set.

How much does sale-day infrastructure cost? Typically 3-5x normal infrastructure spend for the 24-48h of peak.

Can we just hire AWS to scale us? No. The architecture has to support scale; AWS just provides capacity.

Talk to Techpuvi about scaling for sale days.
