Engineering • 12 min read
Building a SERP Monitoring Pipeline That Actually Scales
Yuki Tanaka
Dec 5, 2024
The Naive Approach Breaks at 10K
A simple cron + worker pipeline works fine until you cross ~10K keywords/day. Past that, you need request batching, regional partitioning, and an exit-node scheduler that paces traffic to stay under Google's per-IP limits.
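One way to picture the exit-node scheduler is a min-heap keyed by the time each IP is next allowed to send. The sketch below is illustrative only, not the pipeline's actual scheduler: the class name `ExitNodeScheduler`, the `min_gap_s` parameter, and the 192.0.2.x addresses are all hypothetical, and because the real per-IP limit is not stated here, the spacing is left as a parameter.

```python
import heapq
from typing import Tuple


class ExitNodeScheduler:
    """Picks the exit IP that becomes free soonest, enforcing a per-IP gap.

    min_gap_s is an assumed minimum spacing between two requests on the
    same IP; the real limit has to be measured, it is not known here.
    """

    def __init__(self, min_gap_s: float) -> None:
        self.min_gap_s = min_gap_s
        self._heap = []  # entries are (next_free_time, ip)

    def add_ip(self, ip: str, now: float = 0.0) -> None:
        heapq.heappush(self._heap, (now, ip))

    def acquire(self, now: float) -> Tuple[str, float]:
        """Return (ip, wait_seconds); wait_seconds is 0.0 if an IP is free."""
        if not self._heap:
            raise RuntimeError("no exit nodes registered")
        next_free, ip = heapq.heappop(self._heap)
        start = max(now, next_free)
        # Reserve the IP until its gap has elapsed after this request.
        heapq.heappush(self._heap, (start + self.min_gap_s, ip))
        return ip, start - now


scheduler = ExitNodeScheduler(min_gap_s=2.0)
scheduler.add_ip("192.0.2.1")
scheduler.add_ip("192.0.2.2")
ip, wait_s = scheduler.acquire(now=0.0)
```

A worker would sleep for `wait_s` before sending through `ip`. Regional partitioning could be layered on by keeping one scheduler per region, which is again an assumption rather than something the design above spells out.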
The Architecture
Producer (keyword scheduler) → Kafka → Worker pool with proxy-pool affinity → Parser → BigQuery sink. Retries on HTTP 429/503 are idempotent, permanent failures go to a dead-letter queue, and a metrics layer helps spot regressions early.
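The retry and dead-letter behavior can be sketched without any Kafka or BigQuery client. Everything here is a stand-in under stated assumptions: `handle_task`, `idempotency_key`, the task fields, the dict used as a sink, and the list used as a dead-letter queue are hypothetical names, and exponential backoff is one reasonable choice, not necessarily the one used in production.

```python
import hashlib
import time
from typing import Callable

RETRYABLE = {429, 503}  # transient statuses worth retrying


def idempotency_key(keyword: str, region: str, day: str) -> str:
    """Stable key so a replayed task overwrites nothing and is skipped."""
    raw = f"{keyword}|{region}|{day}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def handle_task(
    task: dict,
    fetch: Callable[[dict], int],  # hypothetical: returns an HTTP status
    sink: dict,                    # stand-in for the warehouse sink
    dead_letters: list,            # stand-in for the dead-letter queue
    max_attempts: int = 4,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    key = idempotency_key(task["keyword"], task["region"], task["day"])
    if key in sink:
        return "duplicate"  # already stored, so a redelivery is a no-op
    status = None
    for attempt in range(max_attempts):
        status = fetch(task)
        if status == 200:
            sink[key] = task
            return "ok"
        if status not in RETRYABLE:
            break  # permanent failure, retrying will not help
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # exponential backoff
    dead_letters.append({"key": key, "task": task, "last_status": status})
    return "dead"
```

Keying on keyword, region, and day means a message redelivered after a crash is detected as a duplicate instead of producing a second row, which is what makes the retries safe to repeat.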
Cost Reality
At 100M queries/month, proxy cost dominates. Hybrid routing (datacenter first, residential on retry) cut our blended cost per query by 38% versus residential-only routing.
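The arithmetic behind a blended figure like this can be sketched as follows. The model is an assumption: every query is tried once on a datacenter proxy, a fraction `fallback_rate` is retried once on a residential proxy, and both attempts are billed. The prices below are made-up relative units (residential = 1.0), not the actual rates; they are chosen only to show one combination that is consistent with a 38% saving.

```python
def blended_cost(dc_cost: float, res_cost: float, fallback_rate: float) -> float:
    """Expected cost per query: always pay datacenter, sometimes residential."""
    return dc_cost + fallback_rate * res_cost


def saving_vs_residential(dc_cost: float, res_cost: float, fallback_rate: float) -> float:
    """Fractional saving compared with sending every query via residential."""
    return 1.0 - blended_cost(dc_cost, res_cost, fallback_rate) / res_cost


def max_fallback_rate(target_saving: float, price_ratio: float) -> float:
    """Highest fallback rate that still hits target_saving.

    price_ratio is dc_cost / res_cost. Follows from
    saving = 1 - price_ratio - fallback_rate.
    """
    return 1.0 - target_saving - price_ratio


# Illustrative numbers only: datacenter at 10% of the residential price,
# with 52% of queries falling back to residential.
saving = saving_vs_residential(dc_cost=0.1, res_cost=1.0, fallback_rate=0.52)
print(f"{saving:.0%}")  # prints 38%
```

Under this model the saving shrinks one-for-one as the fallback rate rises, so the datacenter success rate is the number to watch.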