Funding year 2024 / Scholarship Call #19 / Project ID: 7314 / Project: Turning Failure into Knowledge: Extending the Observable Internet by Leveraging Delivery Failures
Delivery failures occur both in the postal system, when a letter cannot be delivered, and on the Internet, when packets cannot be forwarded to their destination. Active Internet measurements probe Internet-connected systems by sending packets and observing their responses. In practice, not all probes reach their target. When delivery fails, an error message may be returned indicating the cause of the failure, such as a nonexistent address or an unreachable system. Despite the diagnostic value of these error messages, Internet measurements traditionally focus on successful delivery. As the Internet continues to evolve, including the transition to IPv6 and the deployment of new protocols, new measurement methods are required to observe previously unmeasured parts of the Internet.
The third publication in my thesis shows how delivery failures can be systematically leveraged to detect Internet outages.
Widespread Standard Adoption
As a consequence of the continuous growth of the Internet, more and more devices are connected to it. These devices can also be reached over the Internet. By either probing the device or listening to network traffic from that device, you can infer the connectivity of the network and, in the case of non-responses, possible disruptions.
Two protocols play a major role for this use case: ICMP, because its probes are non-invasive, and BGP, because its updates can be passively monitored. ICMP, together with ICMPv6 for devices connected over IPv6, is widely supported as a standard, on devices ranging from routers and servers to end-user equipment such as mobile phones or IoT devices. This is especially interesting for detecting regional outages.
Detecting Internet Disruptions
Internet disruptions happen frequently. Their causes include natural disasters, government-directed censorship measures, and kinetic warfare, for example through power disruptions and damage to network infrastructure.
We started measuring the Ukrainian Internet on the seventh day of the invasion in 2022. In this blog post, I describe how these Internet outages can be detected remotely and explain how active and passive outage signals are collected. For more details, the paper is available on my stipend page with the title: “Tracking Internet Disruptions in Ukraine: Insights from Three Years of Active Full Block Scans”.
In the most severe case, the core infrastructure of a network goes offline. This is the infrastructure that is best protected against power outages. Internet service providers in Ukraine are now obligated to keep their core infrastructure running for ten consecutive days without electricity. What happens when core infrastructure goes offline, and how is this visible on the Internet? We can detect this event by either actively sending packets to that Internet service provider or by passively listening to traffic originating from that provider.
Passive Detection Signals
How comprehensively Internet outages can be detected by passive listening depends on the observation point. For example, Cloudflare resides in a very central position and receives traffic from almost all Autonomous Systems. Autonomous Systems, or ASes, are organizational units that are responsible for exchanging Internet routing information.
Another, more open signal can be derived from BGP: the number of routed prefixes per origin Autonomous System. This requires platforms such as RIPE RIS or RouteViews to share the entries of their routing tables. Otherwise, it would require operating a BGP router yourself. In our work, we relied on a table dump from the RouteViews platform taken every two hours. From each table dump, we compute the number of routed /24s per AS.
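To illustrate the last step, here is a minimal Python sketch of how routed /24s per origin AS could be counted. It assumes the table dump has already been parsed into (prefix, origin AS) pairs; the function name and the input format are hypothetical and not taken from the paper. Overlapping and adjacent prefixes of the same AS are merged first so that no /24 is counted twice.

```python
import ipaddress
from collections import defaultdict


def routed_slash24s_per_as(routes):
    """Count distinct routed IPv4 /24 blocks per origin AS.

    routes: iterable of (prefix_string, origin_asn) pairs, assumed to be
    already extracted from a routing table dump (hypothetical input format).
    """
    per_as = defaultdict(list)
    for prefix, asn in routes:
        net = ipaddress.ip_network(prefix)
        # This sketch only considers IPv4 prefixes of length /24 or shorter.
        if net.version != 4 or net.prefixlen > 24:
            continue
        per_as[asn].append(net)

    counts = {}
    for asn, nets in per_as.items():
        # collapse_addresses merges adjacent prefixes and drops prefixes
        # that are covered by a less specific one.
        merged = ipaddress.collapse_addresses(nets)
        counts[asn] = sum(2 ** (24 - n.prefixlen) for n in merged)
    return counts


example_routes = [
    ("10.0.0.0/16", 64500),
    ("10.0.1.0/24", 64500),   # covered by the /16 above, not counted again
    ("192.0.2.0/24", 64501),
    ("2001:db8::/32", 64501), # IPv6, ignored in this sketch
]
counts = routed_slash24s_per_as(example_routes)
```

A sudden drop of this count for one AS between two consecutive dumps is the kind of change that hints at a BGP-visible outage.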
How is an outage visible in BGP? Two BGP neighbors exchange keep-alive messages. If one neighbor does not receive such a message within the hold timer, typically around 180 seconds, it will inform all its other BGP neighbors that the target network cannot be reached through it anymore.
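The hold timer logic can be sketched in a few lines of Python. This is a strongly simplified model, not a BGP implementation: the data structures, function names, and the 180-second default are assumptions chosen for illustration. Given the time each neighbor was last heard from, it determines which sessions have expired and which prefixes would therefore have to be withdrawn.

```python
HOLD_TIME_SECONDS = 180.0  # assumed default; negotiated per session in practice


def expired_neighbors(last_heard, now, hold_time=HOLD_TIME_SECONDS):
    """Return neighbors from which no message arrived within the hold time.

    last_heard: dict mapping neighbor name -> timestamp (seconds) of the
    last keep-alive or update received from that neighbor.
    """
    return sorted(n for n, t in last_heard.items() if now - t > hold_time)


def prefixes_to_withdraw(learned_from, last_heard, now,
                         hold_time=HOLD_TIME_SECONDS):
    """Return the prefixes that were learned from expired neighbors.

    learned_from: dict mapping neighbor name -> list of prefixes learned
    from that neighbor (hypothetical structure for this sketch).
    """
    withdrawn = []
    for neighbor in expired_neighbors(last_heard, now, hold_time):
        withdrawn.extend(learned_from.get(neighbor, []))
    return withdrawn


last_heard = {"peer-a": 1000.0, "peer-b": 1150.0}
learned_from = {"peer-a": ["192.0.2.0/24"], "peer-b": ["198.51.100.0/24"]}
withdrawn = prefixes_to_withdraw(learned_from, last_heard, now=1200.0)
```

In the example, peer-a has been silent for 200 seconds and its prefix is withdrawn, while peer-b was heard 50 seconds ago and stays up.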
Not all network outages lead to the border router going offline, as the core infrastructure is the part best protected against outages. Thus, BGP outages usually represent more severe outages. Only a smaller share of outages in Ukraine are visible as BGP outages. Partial outages, such as when end devices or server infrastructure go offline, require active detection signals.
Active Detection Signals
Active signals do not require traffic from the target network. They work by pinging the individual IP addresses of the target network. For more confidence in outages, IPs are aggregated to /24 networks, and a block is marked as gone dark only if not a single IP address in it still responds.
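The aggregation step can be sketched as follows in Python. The sketch assumes the ping results of one scan round are available as a mapping from IPv4 address to a responded flag; this input format and the function name are hypothetical. It only looks at a single round and does not model whether a block was responsive before.

```python
import ipaddress
from collections import defaultdict


def dark_blocks(ping_results):
    """Return the /24 blocks in which no probed address responded.

    ping_results: dict mapping IPv4 address string -> bool (True if the
    address answered the probe). Hypothetical input format for this sketch.
    """
    responsive = defaultdict(int)
    for ip, responded in ping_results.items():
        # strict=False maps any address to its enclosing /24 network.
        block = ipaddress.ip_network(f"{ip}/24", strict=False)
        responsive[block] += int(responded)
    return {str(block) for block, n in responsive.items() if n == 0}


scan = {
    "192.0.2.1": False,
    "192.0.2.7": True,        # one responder keeps the block alive
    "198.51.100.3": False,
    "198.51.100.200": False,  # no responder left in this block
}
dark = dark_blocks(scan)
```

Requiring that every address in a block is silent avoids flagging a block just because individual hosts were switched off or changed their address.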
This is then evaluated across all /24 blocks for the target AS by comparing the current value with the moving average of the previous week. An anomaly is detected if the share of blocks gone dark exceeds a threshold, which ranges from more than 5% to more than 20% depending on the level of granularity. The levels of granularity range from country level to region level and AS level; the more IPs are in the set, the stricter the threshold.
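A minimal Python sketch of such a comparison is shown below. It rests on assumptions that are not spelled out in this post: the compared value is taken to be the number of responsive /24 blocks per scan, the baseline is the plain mean over the scans of the previous week, and the share of blocks gone dark is measured relative to that baseline. The function name and the example numbers are made up for illustration.

```python
from statistics import mean


def is_anomaly(previous_week, current, threshold):
    """Flag a scan whose responsive block count dropped below the baseline.

    previous_week: responsive /24 block counts from the scans of the
    previous week (the moving-average window, assumed here).
    current: responsive /24 block count of the current scan.
    threshold: share of blocks gone dark that triggers an anomaly,
    for example 0.05 or 0.20.
    """
    baseline = mean(previous_week)
    if baseline == 0:
        return False  # nothing was responsive before, so nothing can go dark
    dark_share = (baseline - current) / baseline
    return dark_share > threshold


history = [1000, 1000, 1000, 1000]
flag_strict = is_anomaly(history, current=940, threshold=0.05)
flag_loose = is_anomaly(history, current=850, threshold=0.20)
```

With a 5% threshold, a drop from 1000 to 940 responsive blocks is flagged, while a drop to 850 blocks stays below a 20% threshold and is not.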
This way, we detected periods where the Ukrainian Internet infrastructure was put under significant strain, visible as accumulated partial outages across oblasts. These time periods correlate with periods where Russia intensified its attacks on the Ukrainian energy infrastructure, for example during winter 2022/23 and throughout 2024.
This highlights that disruptions to the power grid and damage to network infrastructure can be detected over the Internet by monitoring just two protocols: ICMP and BGP.
Florian Holzbauer
My research focuses on tracking protocol adoption and security, the transition to IPv6, and leveraging active measurements to detect Internet outages.
To make these measurements accessible to everyone, we run www.email-security-scans.org, a platform designed to rank email servers and providers for improved transparency and security.