Ad Click Aggregator
A real-time aggregation system for ad click analytics data.
Last modified on March 12, 2026
Problem Statement & Constraints
Design a system to aggregate millions of ad click events in real-time to provide up-to-the-minute reporting for advertisers. The system must handle high-volume streams, filter out fraudulent or duplicate clicks, and ensure that click counts are accurate for billing purposes.
Functional Requirements
- Aggregate clicks by ad ID and time window (e.g., 1 minute).
- Detect and filter duplicate clicks.
- Provide an API for real-time query results.
Non-Functional Requirements
- Scale: 10 billion click events per day (peak 200k events/sec).
- Latency: End-to-end delay (event time to availability in reports) < 1 minute.
- Consistency: Exactly-once semantics for billing-grade counts; probabilistic structures are acceptable for pre-aggregation where approximate results suffice.
- Availability: Robustness against regional outages or stream spikes.
- Workload Profile:
- Read:Write ratio: ~20:80
- Peak throughput: 200k events/sec
- Retention: 90 days hot, 1y archive
High-Level Architecture
Stateless HTTP/gRPC APIs ingest events into Kafka. A two-stage deduplication layer (Bloom filter + Redis) filters duplicates before stream engines (Flink/Spark) perform 1-minute tumbling-window aggregations. Final counts are written to an OLAP store (ClickHouse/Druid), while a parallel fraud-detection sidecar feeds corrections back to the dedup stage.
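The two-stage dedup step can be sketched as follows. This is a minimal in-process illustration: the Bloom filter is hand-rolled, and a Python `set` stands in for Redis (in production the exact store would be a Redis key per `click_id` with a TTL); class and method names are assumptions, not part of the design above.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: memory-cheap, no false negatives,
    but may report false positives."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class TwoStageDeduper:
    """Stage 1: Bloom filter screens definitely-new clicks cheaply.
    Stage 2: exact store (stand-in for Redis) confirms real duplicates."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.exact = set()  # in production: Redis with a TTL per click_id

    def is_duplicate(self, click_id: str) -> bool:
        if not self.bloom.might_contain(click_id):
            # Definitely unseen: record it without touching the exact store's hot path.
            self.bloom.add(click_id)
            self.exact.add(click_id)
            return False
        # Possible duplicate: the Bloom hit may be a false positive, so confirm.
        if click_id in self.exact:
            return True
        self.exact.add(click_id)
        return False
```

The point of the split is that at 200k events/sec most clicks are new, so the Bloom filter absorbs the common case in local memory and only Bloom hits pay for an exact lookup.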
Data Design
Kafka topics buffer the high-throughput event streams, partitioned by key (click ID for raw ingestion, ad ID for aggregation) so that events for a given ad are processed in order. A columnar OLAP database tailored for real-time aggregation provides long-term reporting and sub-second roll-ups.
Message Stream (Kafka Topics)
| Topic | Partition Key | Description | Retention |
|---|---|---|---|
| raw_clicks | click_id | Original event stream from SDK. | 7 days |
| ad_events | ad_id | Deduplicated events for aggregation. | 24 hours |
| fraud_verdicts | ad_id | ML flags for retroactive subtraction. | 24 hours |
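Keyed partitioning is what makes the ordering guarantee work: a keyed producer hashes the record key to pick a partition, so every event for the same `ad_id` lands on the same partition. A simplified sketch (Kafka's default partitioner actually uses murmur2; sha256 here is only for illustration):

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the record key, Kafka-producer style.
    All events sharing a key map to one partition, preserving per-key
    ordering for the downstream windowed aggregation."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions
```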
Reporting Schema (OLAP - ClickHouse)
| Table | Column | Type | Description |
|---|---|---|---|
| clicks_agg | ad_id | UInt32 (PK) | Unique advertisement ID. |
| clicks_agg | window_ts | DateTime (PK) | 1-min window start timestamp. |
| clicks_agg | click_count | AggregateFunction | Rolling count for the window. |
| clicks_agg | revenue | Decimal | Sum of bid prices for clicked ads. |
Deep Dive & Trade-offs
Deep Dive
Exactly-once semantics: Kafka transactional producers and stream-engine checkpointing make each read-process-write cycle atomic, while idempotent OLAP upserts keyed by (ad_id, window_ts) guarantee end-to-end consistency under replay.
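Why idempotent upserts matter can be shown with a toy sink. If a checkpoint replays a finalized window after a crash, keying the write by (ad_id, window_ts) overwrites the row with the same value instead of double-counting. The `AggStore` class is a hypothetical in-memory stand-in for the OLAP table:

```python
from typing import Dict, Tuple


class AggStore:
    """Stand-in for the OLAP sink: upserts keyed by (ad_id, window_ts).
    Re-applying the same finalized window result is a no-op, so
    checkpoint replays cannot inflate billing counts."""

    def __init__(self):
        self.rows: Dict[Tuple[int, int], dict] = {}

    def upsert(self, ad_id: int, window_ts: int,
               click_count: int, revenue: float) -> None:
        # Overwrite semantics, not increment: replays are safe.
        self.rows[(ad_id, window_ts)] = {
            "click_count": click_count,
            "revenue": revenue,
        }
```

Contrast this with an increment-style write (`count += delta`), which would double-count on every replay and break the exactly-once guarantee.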
Backpressure & flow control: Rate limits per advertiser throttle ingestion. Consumer lag thresholds trigger autoscaling to maintain a 1-minute freshness SLO.
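The lag-driven scaling decision might look like the sketch below. The function name and all defaults (per-consumer drain rate, cap of 64 consumers) are illustrative assumptions, not tuned production values:

```python
import math


def desired_consumers(lag_events: int, events_per_consumer_sec: int,
                      freshness_slo_sec: int = 60,
                      max_consumers: int = 64) -> int:
    """Size the consumer group so the current backlog can drain
    within the freshness SLO. Returns at least 1, capped at
    max_consumers to bound cost during spikes."""
    if lag_events <= 0:
        return 1
    needed = math.ceil(lag_events / (events_per_consumer_sec * freshness_slo_sec))
    return min(max(needed, 1), max_consumers)
```

An autoscaler polling `kafka_consumer_lag` would feed that metric in and adjust replicas toward the returned value.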
Data reconciliation: Nightly batch jobs re-read raw events to validate real-time aggregates, generating adjustment records for discrepancies to ensure billing integrity.
Trade-offs
Bloom filter + Redis vs. Pure Redis: Two-stage is memory-efficient but allows rare duplicates; Pure Redis is exact but expensive in memory and I/O at 200k events/sec.
Tumbling vs. Sliding Windows: Tumbling is simpler and cheaper; Sliding provides smoother curves at the cost of higher CPU and state overhead.
Real-time vs. Batch Fraud Scoring: Real-time catches fraud early but adds latency/infra; Batch is simpler but results in temporary report inflation until correction runs.
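The tumbling-window choice above can be made concrete: each event is assigned to exactly one fixed bucket by flooring its timestamp to the window start, which is also how `window_ts` in the reporting schema is derived. A minimal sketch (function name assumed):

```python
from collections import defaultdict


def tumbling_window_counts(events, window_sec: int = 60):
    """Aggregate (ad_id, event_ts) pairs into fixed, non-overlapping
    1-minute windows keyed by window start timestamp. A sliding window
    would instead assign each event to several overlapping windows,
    multiplying state and CPU."""
    counts = defaultdict(int)
    for ad_id, event_ts in events:
        window_ts = (event_ts // window_sec) * window_sec  # floor to window start
        counts[(ad_id, window_ts)] += 1
    return dict(counts)
```

Because each event touches exactly one bucket, state per ad is O(active windows), which is what makes tumbling the cheaper option at 200k events/sec.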
Operational Excellence
SLIs / SLOs
- SLO: 99% of click events are reflected in query results within 1 minute of event time.
- SLO: 99.9% of query API requests return in < 500 ms.
- SLIs: kafka_consumer_lag, aggregation_window_latency_p95, query_latency_p99, dedup_false_positive_rate, fraud_flag_rate.
Reliability & Resiliency
- Verification: Replay historical Kafka topics to validate aggregation and late-event handling.
- Chaos: Kill task managers mid-checkpoint to verify exactly-once recovery.
- Load: Test ingestion at 2x peak traffic to validate backpressure and scaling.