And the people building internet infrastructure haven’t fully registered it yet.
In 2025, AI agent traffic grew 7,851% year over year. Automated traffic is now growing eight times faster than human traffic. By the end of the year, it was visible at scale across major digital properties—including checkout pages. Around 2.3% of checkout traffic was already being generated by agents completing transactions with no human in the loop.
That is not a rounding artifact from a tiny base. It is an early signal of a structural change.
When people talk about AI reshaping the network, they picture training clusters, GPU fabrics, and private fiber. That picture is real: training a large model requires thousands of GPUs exchanging data at 400Gbps to 1.6Tbps, with near-zero tolerance for packet loss. One congested link doesn’t slow one machine. It stalls the entire job.
But almost none of that traffic touches the public internet.
AI training happens behind closed doors, on private infrastructure owned by a handful of large players. It is one of the most demanding network workloads ever created, and it is completely invisible to the infrastructure the broader internet ecosystem actually builds and operates.
The real story isn’t training. It’s inference.
Training wants deserts. Inference wants your city.
Training is what you do to build a model. Inference is what you do to use it—every query, every recommendation, every AI assistant response, every agentic workflow.
Training is latency-tolerant. You can run it in Wyoming, and nobody cares about the round trip. Inference is latency-sensitive in a way that’s genuinely new. For agentic AI—systems that chain dozens of steps to complete a task—latency compounds at every hop. A 50ms delay across a 20-step chain becomes a full second. That changes the experience. In some cases, it changes whether the workflow succeeds at all.
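The compounding effect is simple arithmetic, but it is worth making concrete. A minimal back-of-envelope sketch, using the illustrative numbers above (20 steps, 50 ms of added delay per step):

```python
# Per-hop latency compounds across an agentic chain: unlike a single
# request/response, every step in the chain pays the delay again.
steps = 20           # illustrative chain length from the example above
added_delay_ms = 50  # hypothetical extra latency per hop

total_added_ms = steps * added_delay_ms
print(f"Extra end-to-end delay: {total_added_ms} ms")  # 1000 ms
```

The same 50 ms that is imperceptible on a single web request becomes a full second of added wall-clock time for the workflow.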
That difference determines where the compute has to live. And it has to live close to users.
According to McKinsey’s demand model, inference will overtake training as the dominant AI compute category by 2029, growing at 35% CAGR. Hyperscalers are already building two-tier infrastructure: large remote training campuses, and distributed inference nodes in metro data centers—close to internet exchanges, close to enterprise systems, close to the public internet.
The signs are already in the traffic data. DE-CIX cited AI workloads as a driver behind the record 79 exabytes of traffic it carried globally in 2025, and Frankfurt hit an all-time peak of 18.73 Tbps in December. This isn’t speculation. It’s early measurement.
Classic web patterns are familiar. Small requests, large responses, short sessions. Predictable.
AI inference doesn’t fit that model. Context windows now run to hundreds of thousands of tokens. Retrieval-augmented systems attach large documents to prompts before a model even starts responding. Both sides of the exchange are growing—and growing unpredictably.
Then agentic behavior adds another layer entirely.
A traditional crawler fetches a page and leaves. An agent doesn’t. It maintains state. It authenticates. It calls multiple services in sequence. It retries. It executes. From the network’s perspective, the traffic can look like a human session—but it runs at machine speed, at machine scale, 24 hours a day.
The network currently has no way to tell the difference.
That matters because latency is starting to play a different role. For a human watching a video, a 20ms variance is invisible. For an agent performing a multi-step workflow, that same variance accumulates at every step, affecting total transaction time, triggering retries, sometimes affecting whether the chain completes cleanly. At that point, latency isn’t just a performance metric. For certain AI workloads, it’s a correctness constraint.
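A toy simulation makes the correctness point visible. The numbers here are assumptions chosen for illustration, not measurements: each of 20 steps nominally takes 100 ms with some jitter, and any step that exceeds a 130 ms timeout is retried once, as an agent runtime might do.

```python
import random

random.seed(1)  # reproducible illustration

def run_chain(steps=20, base_ms=100.0, jitter_ms=20.0, timeout_ms=130.0):
    """Total time for one agentic chain under jitter and one-shot retries.

    All parameters are hypothetical: per-step latency is modeled as a
    Gaussian around base_ms; a step slower than timeout_ms is retried once,
    so tail latency on any hop inflates the whole chain.
    """
    total = 0.0
    for _ in range(steps):
        t = random.gauss(base_ms, jitter_ms)
        if t > timeout_ms:                    # timeout -> retry the step
            t += random.gauss(base_ms, jitter_ms)
        total += t
    return total

times = sorted(run_chain() for _ in range(1000))
print(f"median chain time ≈ {times[500]:.0f} ms, worst ≈ {times[-1]:.0f} ms")
```

Even with modest jitter, a small fraction of steps trips the timeout on every run, so the distribution of total chain time shifts right of the nominal 2,000 ms. The same variance a human would never notice becomes systematic delay, and with stricter timeouts it becomes failed chains.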
BGP path selection wasn’t designed for this environment. That’s not a crisis today. But it is a serious design question, and one the networking community should be asking before someone else answers it for them.
The internet was built by humans, for humans, to move human-generated requests to human-operated systems.
We are entering a period where that is no longer the dominant assumption. Models are calling models. Agents are transacting. Systems are coordinating at speeds no person can observe in real time.
The infrastructure community has a habit of seeing important shifts before the rest of the market. It saw CDN before CDN was common language. It saw the streaming surge before it hit capacity. It saw cloud concentration risk before regulators started talking about it.
The same signal is visible in AI traffic right now. It’s not a forecast—it’s a present-tense measurement.
The real question isn’t whether the network will need to adapt. It will.
The question is whether the people who actually understand the infrastructure will shape how that adaptation happens or whether they’ll be handed someone else’s solution and asked to make it work.
I know which one I’d bet on.