Ad Click Aggregator

A real-time aggregation system for ad click analytics data.

Problem Statement & Constraints

Design a system to aggregate millions of ad click events in real-time to provide up-to-the-minute reporting for advertisers. The system must handle high-volume streams, filter out fraudulent or duplicate clicks, and ensure that click counts are accurate for billing purposes.

Functional Requirements

  • Aggregate clicks by ad ID and time window (e.g., 1 minute).
  • Detect and filter duplicate clicks.
  • Provide an API for querying aggregated results in near real time.

Non-Functional Requirements

  • Scale: 10 billion click events per day (peak 200k events/sec).
  • Latency: End-to-end delay (event time to appearance in reports) < 1 minute.
  • Consistency: Exactly-once semantics for billing; exact final counts (probabilistic structures acceptable for pre-aggregation).
  • Availability: Robustness against regional outages or stream spikes.
  • Workload Profile:
    • Read:Write ratio: ~20:80
    • Peak throughput: 200k events/sec
    • Retention: 90 days hot, 1y archive
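
A quick back-of-envelope check of the stated scale numbers (the 200-byte event size is an assumed figure for illustration; all other inputs come from the requirements above):

```python
EVENTS_PER_DAY = 10_000_000_000
SECONDS_PER_DAY = 86_400

avg_eps = EVENTS_PER_DAY / SECONDS_PER_DAY        # average events/sec (~116k)
peak_eps = 200_000                                # stated peak
burst_factor = peak_eps / avg_eps                 # headroom the system must absorb

EVENT_BYTES = 200                                 # assumed raw payload size
daily_raw_gb = EVENTS_PER_DAY * EVENT_BYTES / 1e9 # raw ingest per day, before compression

print(f"avg ~{avg_eps:,.0f} events/sec, peak/avg ~{burst_factor:.2f}x")
print(f"~{daily_raw_gb:,.0f} GB/day raw")
```

The ~1.7x peak-to-average ratio is what the autoscaling and backpressure mechanisms discussed later need to absorb.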

High-Level Architecture

graph LR
  SDK --> Ingest
  Ingest --> Kafka
  Kafka --> Dedup
  Dedup --> Agg
  Agg --> OLAP[(OLAP)]
  OLAP --> Query
  Kafka --> Fraud
  Fraud -.->|flag| Dedup

Stateless HTTP/gRPC APIs ingest events into Kafka. A two-stage deduplication layer (Bloom filter + Redis) filters duplicates before stream engines (Flink/Spark) perform 1-minute tumbling-window aggregations. Final counts are written to an OLAP store (ClickHouse/Druid), while a parallel fraud-scoring sidecar feeds corrections back to the dedup stage.
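
The two-stage deduplication can be sketched as below. The in-memory `BloomFilter` and the plain set standing in for Redis are illustrative stand-ins, not the production components:

```python
import hashlib


class BloomFilter:
    """Tiny in-process Bloom filter (stand-in for a shared production bitset)."""

    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class TwoStageDedup:
    """Stage 1: the Bloom filter cheaply passes definitely-new click_ids.
    Stage 2: only Bloom 'maybe' hits pay for an exact lookup (Redis with a
    TTL in production; a plain set here)."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.exact = set()  # stand-in for Redis

    def is_duplicate(self, click_id: str) -> bool:
        if not self.bloom.might_contain(click_id):
            # Definitely new: record in both stages without an exact-store read.
            self.bloom.add(click_id)
            self.exact.add(click_id)
            return False
        # Possible Bloom false positive: confirm against the exact store.
        if click_id in self.exact:
            return True
        self.exact.add(click_id)
        return False
```

The Bloom stage exists to avoid a Redis round trip for the common case of a brand-new click ID; only the rare "maybe seen" hits pay for the exact check.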

Data Design

Kafka topics buffer the high-throughput event streams, partitioned by ad ID to guarantee per-ad ordering. A columnar OLAP database tailored for real-time aggregation provides long-term reporting and sub-second roll-ups.
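
Key-based partitioning can be illustrated with a minimal sketch; Kafka's default partitioner hashes the key with murmur2, so the SHA-1 mapping and the 12-partition count here are stand-in assumptions:

```python
import hashlib

NUM_PARTITIONS = 12  # assumed partition count, for illustration only


def partition_for(ad_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Deterministic key -> partition mapping (Kafka's default partitioner
    # uses murmur2; a stable SHA-1 hash stands in for the same idea here).
    digest = hashlib.sha1(ad_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# Every event keyed by the same ad ID maps to the same partition, so
# Kafka's per-partition ordering yields per-ad ordering for free.
partitions = {partition_for("ad-42") for _ in range(1000)}
assert len(partitions) == 1
```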

Message Stream (Kafka Topics)

| Topic          | Partition Key | Description                            | Retention |
| -------------- | ------------- | -------------------------------------- | --------- |
| raw_clicks     | click_id      | Original event stream from SDK.        | 7 days    |
| ad_events      | ad_id         | Deduplicated events for aggregation.   | 24 hours  |
| fraud_verdicts | ad_id         | ML flags for retroactive subtraction.  | 24 hours  |

Reporting Schema (OLAP - ClickHouse)

| Table      | Column      | Type              | Description                      |
| ---------- | ----------- | ----------------- | -------------------------------- |
| clicks_agg | ad_id       | UInt32 (PK)       | Unique advertisement ID.         |
| clicks_agg | window_ts   | DateTime (PK)     | 1-min window start timestamp.    |
| clicks_agg | click_count | AggregateFunction | Rolling count for the window.    |
| clicks_agg | revenue     | Decimal           | Sum of bid price for clicked ads.|
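
The AggregateFunction column holds mergeable partial states rather than final values (in ClickHouse terms, an AggregatingMergeTree merges partials at read time). A minimal sketch of that merge semantics, with hypothetical per-worker partials:

```python
from collections import defaultdict


def merge_partials(partials):
    """Combine per-worker partial aggregates keyed by (ad_id, window_ts).
    Each input is ((ad_id, window_ts), clicks, revenue); merging is just
    addition because counts and sums are associative and commutative."""
    totals = defaultdict(lambda: {"clicks": 0, "revenue": 0.0})
    for (ad_id, window_ts), clicks, revenue in partials:
        row = totals[(ad_id, window_ts)]
        row["clicks"] += clicks
        row["revenue"] += revenue
    return dict(totals)


partials = [
    ((42, "2024-01-01T00:00"), 3, 1.50),  # partial from worker A
    ((42, "2024-01-01T00:00"), 2, 0.75),  # partial from worker B
    ((7,  "2024-01-01T00:00"), 1, 0.10),
]
merged = merge_partials(partials)
```

Because merging is order-independent, late-arriving partials simply fold into the existing row instead of forcing a full recompute.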

Deep Dive & Trade-offs

Deep Dive

  • Exactly-once semantics: Kafka transactional producers and stream-engine checkpointing make each read-process-write cycle atomic; idempotent upserts into the OLAP store extend the guarantee end to end.

  • Backpressure & flow control: Rate limits per advertiser throttle ingestion. Consumer lag thresholds trigger autoscaling to maintain a 1-minute freshness SLO.

  • Data reconciliation: Nightly batch jobs re-read raw events to validate real-time aggregates, generating adjustment records for discrepancies to ensure billing integrity.
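
The reconciliation step can be sketched as a recount-and-diff; the record shapes below are assumptions for illustration:

```python
from collections import Counter


def reconcile(raw_events, realtime_counts):
    """Recount raw events per (ad_id, window_ts) and emit signed adjustment
    records wherever the real-time aggregate disagrees with the batch truth."""
    truth = Counter((e["ad_id"], e["window_ts"]) for e in raw_events)
    adjustments = []
    for key in truth.keys() | realtime_counts.keys():
        delta = truth[key] - realtime_counts.get(key, 0)
        if delta != 0:
            adjustments.append({"key": key, "delta": delta})
    return adjustments


# Example: the real-time path over-counted by one for ad 42 in window "w0",
# so the nightly job emits a -1 adjustment rather than mutating history.
raw = [{"ad_id": 42, "window_ts": "w0"}] * 5
adjustments = reconcile(raw, {(42, "w0"): 6})
```

Emitting signed deltas instead of overwriting rows keeps the billing trail auditable.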

Trade-offs

  • Bloom filter + Redis vs. Pure Redis: Two-stage is memory-efficient but allows rare duplicates; Pure Redis is exact but expensive in memory and I/O at 200k events/sec.

  • Tumbling vs. Sliding Windows: Tumbling is simpler and cheaper; Sliding provides smoother curves at the cost of higher CPU and state overhead.

  • Real-time vs. Batch Fraud Scoring: Real-time catches fraud early but adds latency/infra; Batch is simpler but results in temporary report inflation until correction runs.
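
The window-assignment difference behind the tumbling-vs-sliding trade-off, sketched with an assumed 60 s window and 15 s slide (the sliding formula mirrors the usual stream-engine window-start calculation):

```python
WINDOW = 60  # assumed 1-minute window, in seconds
SLIDE = 15   # assumed slide for the sliding variant


def tumbling_window(ts: int) -> int:
    # Each event belongs to exactly one window: one state entry per key.
    return ts - ts % WINDOW


def sliding_windows(ts: int, size: int = WINDOW, slide: int = SLIDE) -> list:
    # Each event belongs to size/slide overlapping windows: smoother
    # curves, but proportionally more state and CPU per event.
    last_start = ts - ts % slide
    starts = []
    start = last_start
    while start > ts - size:
        starts.append(start)
        start -= slide
    return sorted(starts)
```

An event at t=130 s lands in one tumbling window (start 120) but four sliding windows (75, 90, 105, 120), which is exactly where the extra state cost comes from.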

Operational Excellence

SLIs / SLOs

  • SLO: 99% of click events are reflected in query results within 1 minute of event time.
  • SLO: 99.9% of query API requests return in < 500 ms.
  • SLIs: kafka_consumer_lag, aggregation_window_latency_p95, query_latency_p99, dedup_false_positive_rate, fraud_flag_rate.
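
The 1-minute freshness SLO can be measured as the fraction of events visible within the deadline; a minimal sketch with hypothetical timestamps:

```python
def freshness_sli(event_ts, visible_ts, slo_seconds=60):
    """Fraction of events reflected in query results within slo_seconds of
    event time; the SLO above targets >= 0.99 over the measurement window."""
    delays = [v - e for e, v in zip(event_ts, visible_ts)]
    within = sum(1 for d in delays if d <= slo_seconds)
    return within / len(delays)
```

For example, events with visibility delays of 30 s, 35 s, 75 s, and 50 s score 0.75, breaching the 99% target.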

Reliability & Resiliency

  • Verification: Replay historical Kafka topics to validate aggregation and late-event handling.
  • Chaos: Kill task managers mid-checkpoint to verify exactly-once recovery.
  • Load: Test ingestion at 2x peak traffic to validate backpressure and scaling.