Global CDN Media Serving

A distributed CDN architecture for global media delivery.

Problem Statement & Constraints

Develop a global media serving system that efficiently delivers static and dynamic assets worldwide using a content delivery network, while handling background processing for user uploads. The architecture must optimize for low latency, high availability, and cost-effectiveness, ensuring secure and reliable access to media content across diverse geographic regions.

Functional Requirements

  • Serve static and media assets globally.
  • Support background processing and transcoding for uploads.
  • Provide signed URLs and access control for private content.

Non-Functional Requirements

  • Scale: 1M requests/sec, global distribution; multi-region deployment.
  • Availability: 99.99% uptime with multi-CDN failover.
  • Consistency: Eventual consistency for media updates.
  • Latency: P99 < 100 ms for content served from the edge; P99 < 500 ms for cache misses served through the origin.
  • Workload Profile:
    • Read:Write ratio: ~98:2
    • Peak throughput: 1M requests/sec
    • Retention: indefinite; hot storage for the first year, then archived to cold storage
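
The targets above imply a few back-of-envelope figures. The sketch below derives them, assuming the ~98:2 ratio applies to request counts at peak and that uptime is measured over calendar time; the function names are illustrative only.

```python
# Back-of-envelope figures derived from the stated requirements.
PEAK_RPS = 1_000_000      # peak throughput from the workload profile
READ_SHARE = 0.98         # ~98:2 read:write ratio (assumed to apply to request counts)
AVAILABILITY = 0.9999     # 99.99% uptime target


def split_traffic(peak_rps: int, read_share: float) -> tuple:
    """Return (reads_per_sec, writes_per_sec) at peak."""
    reads = round(peak_rps * read_share)
    return reads, peak_rps - reads


def downtime_budget_minutes(availability: float, days: int = 365) -> float:
    """Allowed downtime in minutes over `days` for a given availability."""
    return (1 - availability) * days * 24 * 60


reads, writes = split_traffic(PEAK_RPS, READ_SHARE)
print(f"reads/sec={reads}, writes/sec={writes}")
print(f"yearly downtime budget: {downtime_budget_minutes(AVAILABILITY):.2f} min")
print(f"30-day downtime budget: {downtime_budget_minutes(AVAILABILITY, 30):.2f} min")
```

At peak this works out to roughly 980,000 reads and 20,000 writes per second, with a downtime budget of about 52.6 minutes per year.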

High-Level Architecture

graph LR
    Users --> CDN
    CDN -->|miss| Shield
    Shield -->|miss| Origin
    Origin --> Storage
    Uploader --> Transcoder
    Transcoder --> Storage
    Transcoder --> CDN

Users access media via a global CDN edge. Cache misses route through a regional Origin Shield to collapse concurrent requests before hitting the Origin storage. Concurrently, an upload pipeline transcodes raw files into optimized renditions and pre-warms the CDN edges.
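
Request collapsing at the shield can be illustrated with a small single-flight helper. This is a minimal in-process sketch, not how any particular CDN implements it: the `SingleFlight` class, the `fetch_from_origin` stand-in, and the example path are all hypothetical.

```python
import threading
import time
from concurrent.futures import Future
from typing import Callable, Dict


class SingleFlight:
    """Collapse concurrent fetches for the same key into one origin call."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, Future] = {}

    def do(self, key: str, fetch: Callable[[], bytes]) -> bytes:
        with self._lock:
            fut = self._inflight.get(key)
            leader = fut is None
            if leader:
                fut = Future()
                self._inflight[key] = fut
        if not leader:
            # Followers wait for the leader's result instead of hitting the origin.
            return fut.result()
        try:
            value = fetch()
            fut.set_result(value)
            return value
        except BaseException as exc:
            fut.set_exception(exc)  # followers see the same failure
            raise
        finally:
            with self._lock:
                self._inflight.pop(key, None)


origin_calls = 0
results = []


def fetch_from_origin() -> bytes:
    """Hypothetical slow origin fetch."""
    global origin_calls
    origin_calls += 1
    time.sleep(0.2)
    return b"playlist-bytes"


shield = SingleFlight()


def worker() -> None:
    results.append(shield.do("/vid/a1/playlist.m3u8", fetch_from_origin))


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"edge misses={len(results)}, origin calls={origin_calls}")
```

With 20 concurrent misses for the same key, the origin should see a single fetch as long as all requests arrive while the first one is still in flight.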

Data Design

Object storage holds raw uploads, transcoded media, and access logs. The CDN layer defines cache key patterns and default TTLs, and tags cached objects with surrogate keys for bulk invalidation.

Object Storage Layout (S3)

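| Bucket       | Prefix / Path        | Retention  | Description                       |
|--------------|----------------------|------------|-----------------------------------|
| raw-uploads  | user_id/YYYY-MM-DD/  | 30 days    | Original untouched files.         |
| media-assets | asset_id/rendition/  | Indefinite | Post-transcoding optimal variants.|
| static-logs  | cdn/pop_id/HH_MM/    | 90 days    | Aggregated edge access logs.      |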
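
As an illustration of the layout above, the helpers below build object keys for each bucket. The trailing file name and the function names are assumptions made for the example; the source only specifies the prefixes and retention periods.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

# Retention in days per bucket, from the layout above (None = indefinite).
RETENTION_DAYS: Dict[str, Optional[int]] = {
    "raw-uploads": 30,
    "media-assets": None,
    "static-logs": 90,
}


def raw_upload_key(user_id: str, uploaded_at: datetime, filename: str) -> str:
    """raw-uploads: user_id/YYYY-MM-DD/<filename>"""
    return f"{user_id}/{uploaded_at:%Y-%m-%d}/{filename}"


def media_asset_key(asset_id: str, rendition: str, filename: str) -> str:
    """media-assets: asset_id/rendition/<filename>"""
    return f"{asset_id}/{rendition}/{filename}"


def edge_log_key(pop_id: str, window_start: datetime, filename: str) -> str:
    """static-logs: cdn/pop_id/HH_MM/<filename>"""
    return f"cdn/{pop_id}/{window_start:%H_%M}/{filename}"


ts = datetime(2024, 3, 5, 14, 30, tzinfo=timezone.utc)
print(raw_upload_key("u42", ts, "clip.mov"))
print(media_asset_key("a1", "720p", "video.mp4"))
print(edge_log_key("fra1", ts, "access.log.gz"))
```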

Cache Key & Logic (CDN)

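| Item      | Cache Key Pattern       | TTL (Default) | Invalidation Tag |
|-----------|-------------------------|---------------|------------------|
| Images    | host/path?w=100&q=80    | 30 days       | img:<asset_id>   |
| Videos    | host/path/playlist.m3u8 | 1 year        | vid:<asset_id>   |
| Manifests | host/config.json        | 60 seconds    | config:global    |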
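
A rough sketch of how the cache key patterns and invalidation tags above could fit together. It assumes only the `w` and `q` query parameters take part in the key and that each object carries one tag; `cache_key` and `TaggedCache` are illustrative names, not a CDN API.

```python
from collections import defaultdict
from typing import Dict, Optional, Set
from urllib.parse import parse_qsl, urlencode, urlsplit

ALLOWED_PARAMS = {"w", "q"}  # assumption: only resize parameters affect the variant


def cache_key(url: str) -> str:
    """Normalize a URL to host/path?sorted-allowed-params."""
    parts = urlsplit(url)
    params = sorted((k, v) for k, v in parse_qsl(parts.query) if k in ALLOWED_PARAMS)
    base = f"{parts.netloc.lower()}{parts.path}"
    query = urlencode(params)
    return f"{base}?{query}" if query else base


class TaggedCache:
    """In-memory cache that can purge every object sharing a surrogate key."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, body: bytes, tag: str) -> None:
        self._objects[key] = body
        self._by_tag[tag].add(key)

    def get(self, key: str) -> Optional[bytes]:
        return self._objects.get(key)

    def purge_tag(self, tag: str) -> int:
        """Drop all objects tagged with `tag`; return how many were removed."""
        keys = self._by_tag.pop(tag, set())
        for key in keys:
            self._objects.pop(key, None)
        return len(keys)


cache = TaggedCache()
small = cache_key("https://CDN.example.com/img/a1.jpg?q=80&utm_source=mail&w=100")
large = cache_key("https://cdn.example.com/img/a1.jpg?w=800&q=80")
cache.put(small, b"small", tag="img:a1")
cache.put(large, b"large", tag="img:a1")
print(small)
print(cache.purge_tag("img:a1"))
```

Normalizing the key keeps tracking parameters and parameter order from fragmenting the cache, and a single purge of `img:a1` removes every rendition of that asset.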

Deep Dive & Trade-offs

Deep Dive

  • Multi-tier caching: A regional Origin Shield (L2 cache) collapses concurrent edge misses, protecting the origin from viral traffic.

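  • Invalidation strategy: Surrogate keys allow all variants of an asset to be purged in one operation, while short-TTL metadata (such as manifests) relies on stale-while-revalidate so edges keep serving the cached copy while refreshing it in the background.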

  • Security & Access: Time-limited signed URLs are verified at the edge. Signing keys are rotatable, and accepting both the old and new key during a rotation window avoids downtime.

  • Content Optimization: On-the-fly resizing and Accept header negotiation serve modern formats (WebP/AVIF) to clients that advertise support for them.

  • Multi-CDN Failover: DNS-based traffic steering driven by health probes shifts traffic automatically to a healthy provider during provider degradation.
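
The signed-URL idea from the list above can be sketched with an HMAC over the path and expiry time. This is a minimal illustration under assumed conventions: the query parameter names (`exp`, `kid`, `sig`), the key IDs, and the secrets are all hypothetical, and real CDNs each define their own signing format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical key ring: keeping the previous key accepted while the new key
# signs allows rotation without rejecting URLs that are still in flight.
KEYS = {"k1": b"old-secret", "k2": b"new-secret"}
ACTIVE_KEY_ID = "k2"


def _signature(secret: bytes, path: str, expires: int) -> str:
    message = f"{path}\n{expires}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[int] = None,
             key_id: str = ACTIVE_KEY_ID) -> str:
    """Return `path` with expiry, key ID, and signature appended."""
    now = int(time.time()) if now is None else now
    expires = now + ttl_seconds
    sig = _signature(KEYS[key_id], path, expires)
    return f"{path}?{urlencode({'exp': expires, 'kid': key_id, 'sig': sig})}"


def verify_url(url: str, now: Optional[int] = None) -> bool:
    """Check expiry and signature as an edge node might."""
    now = int(time.time()) if now is None else now
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["exp"][0])
        key_id = query["kid"][0]
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    secret = KEYS.get(key_id)
    if secret is None or now > expires:
        return False
    expected = _signature(secret, parts.path, expires)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode(), expected.encode())


url = sign_url("/media/a1/720p.mp4", ttl_seconds=300, now=1_000)
print(verify_url(url, now=1_200))   # inside the validity window
print(verify_url(url, now=1_301))   # after expiry
```

Removing a key ID from `KEYS` once the longest URL lifetime has passed completes a rotation in this sketch.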

Trade-offs

  • Origin Shield vs. Direct: Shield reduces load significantly but adds a latency hop for cold-cache requests; direct is faster for misses but risks origin collapse.

  • On-the-fly vs. Pre-generation: On-the-fly transforms save storage but increase edge CPU usage and first-request latency; pre-generation is faster to serve but increases storage costs.

  • Multi-CDN vs. Single Provider: Multi-CDN increases resilience, but every provider needs its own configuration (roughly double the work with two providers), and cache invalidations must be kept in sync across them.
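
For the on-the-fly path, format selection by Accept header might look like the sketch below. It assumes a server preference of AVIF, then WebP, then a JPEG fallback, and it deliberately ignores wildcards such as `*/*`, since a wildcard alone does not show that the client can decode a modern format. The function names are illustrative.

```python
from typing import Dict

PREFERENCE = ("image/avif", "image/webp")  # assumed server-side preference order
FALLBACK = "image/jpeg"                    # assumed baseline format


def parse_accept(header: str) -> Dict[str, float]:
    """Parse an Accept header into {media_type: q-value}."""
    accepted: Dict[str, float] = {}
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        accepted[media_type] = q
    return accepted


def choose_format(accept_header: str) -> str:
    """Pick the first preferred format the client explicitly accepts."""
    accepted = parse_accept(accept_header)
    for fmt in PREFERENCE:
        if accepted.get(fmt, 0.0) > 0:
            return fmt
    return FALLBACK


print(choose_format("image/avif,image/webp,image/apng,*/*;q=0.8"))
print(choose_format("image/webp,*/*"))
print(choose_format("*/*"))
```

When responses differ by Accept header, the negotiated format also has to be part of the cache key (for example through a `Vary: Accept` response header), otherwise one client's variant can be served to another.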

Operational Excellence

SLIs / SLOs

  • SLO: 99.99% of media requests served successfully (2xx/3xx) from edge or origin.
  • SLO: P99 latency < 100 ms for cached content, < 500 ms for cache misses through origin shield.
  • SLIs: cache_hit_ratio, origin_request_rate, edge_latency_p99, upload_success_rate, transcoding_duration_p95.
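
Two of the SLIs above, together with the remaining error budget for the 99.99% SLO, can be computed from plain request counters. The counter fields and the sample numbers below are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class WindowCounters:
    """Request counters for one measurement window (hypothetical schema)."""
    total_requests: int
    successful_requests: int  # responses with 2xx/3xx status
    edge_hits: int            # requests answered from edge cache


def cache_hit_ratio(c: WindowCounters) -> float:
    return c.edge_hits / c.total_requests if c.total_requests else 0.0


def availability(c: WindowCounters) -> float:
    return c.successful_requests / c.total_requests if c.total_requests else 1.0


def error_budget_remaining(c: WindowCounters, slo: float = 0.9999) -> float:
    """Fraction of the error budget left; negative means the SLO is breached."""
    allowed_failures = (1 - slo) * c.total_requests
    if allowed_failures == 0:
        return 1.0
    failed = c.total_requests - c.successful_requests
    return 1 - failed / allowed_failures


window = WindowCounters(total_requests=10_000_000,
                        successful_requests=9_999_400,
                        edge_hits=9_500_000)
print(f"cache_hit_ratio={cache_hit_ratio(window):.3f}")
print(f"availability={availability(window):.5f}")
print(f"error_budget_remaining={error_budget_remaining(window):.2f}")
```

In this sample window, 600 failures against roughly 1,000 allowed leaves about 40% of the error budget.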

Reliability & Resiliency

  • Synthetic: Global probes to measure edge latency and multi-region availability.
  • Failover: Regular multi-CDN failover drills to validate DNS switchover.
  • Load: Test upload-to-delivery pipeline at 10x normal traffic.
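
A failover drill exercises logic along these lines: synthetic probe results feed a per-provider health window, and traffic goes to the first healthy provider in priority order. This is a simplified sketch; the provider names, window size, and threshold are assumptions, and publishing the decision to DNS is out of scope here.

```python
from collections import deque
from typing import Deque, List


class ProviderHealth:
    """Sliding window of recent probe results for one CDN provider."""

    def __init__(self, name: str, window: int = 5,
                 min_success_ratio: float = 0.6) -> None:
        self.name = name
        self.min_success_ratio = min_success_ratio
        self._results: Deque[bool] = deque(maxlen=window)

    def record(self, ok: bool) -> None:
        self._results.append(ok)

    @property
    def healthy(self) -> bool:
        if not self._results:
            return True  # no probe data yet: assume healthy
        return sum(self._results) / len(self._results) >= self.min_success_ratio


def pick_provider(providers: List[ProviderHealth]) -> ProviderHealth:
    """Return the first healthy provider in priority order."""
    for provider in providers:
        if provider.healthy:
            return provider
    return providers[0]  # all unhealthy: fail open to the primary


primary = ProviderHealth("cdn-a")     # hypothetical provider names
secondary = ProviderHealth("cdn-b")

for _ in range(5):                    # simulated outage at the primary
    primary.record(False)
    secondary.record(True)
print(pick_provider([primary, secondary]).name)

for _ in range(5):                    # primary recovers
    primary.record(True)
print(pick_provider([primary, secondary]).name)
```

Requiring several failed probes before switching, and several good ones before switching back, limits flapping when a provider is only briefly degraded.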