Cloudflare has released a detailed incident report confirming that the 62-minute global outage of its popular 1.1.1.1 DNS resolver service on July 14, 2025, was caused by an internal configuration error, not a BGP attack as initially speculated.
The outage affected millions of users worldwide from 21:52 UTC to 22:54 UTC, rendering most internet services inaccessible for those relying on the free DNS resolver.
The incident originated from a seemingly innocuous configuration change made on June 6, 2025, during preparations for a future Data Localization Suite (DLS) service.
Engineers inadvertently included prefixes associated with the 1.1.1.1 resolver service alongside prefixes intended for the new DLS service.
This misconfiguration remained dormant in the production network for over a month because the new DLS service was not yet active, causing no immediate impact or alerts.
The crisis was triggered on July 14 when a second configuration change was made to attach a test location to the non-production DLS service.
Although the test location itself was not live, this change triggered a global refresh of network configuration, causing the 1.1.1.1 resolver prefixes to be withdrawn from production Cloudflare data centers worldwide.
The service topology for the 1.1.1.1 resolver was inadvertently reduced from all locations to a single offline location, effectively making the service unreachable.
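The chain of events can be illustrated with a toy model of service-to-prefix bindings. This is purely illustrative: the service names, location names, and data structures below are invented for the sketch and do not reflect Cloudflare's actual configuration schema.

```python
# Toy model: a prefix is announced from the topology (set of locations) of
# the service it is bound to. All names here are illustrative stand-ins.

def announced_locations(prefix, bindings, services):
    """Return the locations announcing `prefix`, per its service binding."""
    service = bindings[prefix]
    return set(services[service]["topology"])

services = {
    "resolver": {"topology": ["dc-1", "dc-2", "dc-3"]},  # all data centers
    "dls-test": {"topology": []},                        # DLS not yet live
}
bindings = {"1.1.1.0/24": "resolver", "1.0.0.0/24": "resolver"}

# June 6 change: resolver prefixes inadvertently bound to the dormant DLS
# service. (In this toy model the change is immediate; in production it lay
# dormant until a later refresh propagated it.)
bindings["1.1.1.0/24"] = "dls-test"
bindings["1.0.0.0/24"] = "dls-test"

# July 14 change: a single offline test location is attached to DLS, and the
# resulting global refresh applies the stale binding everywhere.
services["dls-test"]["topology"] = ["offline-test-site"]

print(announced_locations("1.1.1.0/24", bindings, services))
# The resolver prefixes are now announced only from an offline site,
# i.e. withdrawn from every production data center.
```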
During the outage, traffic to multiple IP address ranges was impacted, including 1.1.1.0/24, 1.0.0.0/24, and several IPv6 ranges.
Interestingly, DNS-over-HTTPS (DoH) traffic remained relatively stable because most DoH users access the resolver through the domain cloudflare-dns.com rather than by IP address.
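Because a DoH client dials the hostname rather than 1.1.1.1 directly, its connection lands on whatever addresses cloudflare-dns.com currently resolves to, and those were not among the withdrawn prefixes. A minimal sketch of such a lookup against Cloudflare's publicly documented DoH JSON API (the hostname, query parameters, and `accept` header are from that API; the helper function names are my own):

```python
# Sketch of a DNS-over-HTTPS lookup via Cloudflare's JSON API at
# cloudflare-dns.com. The client never hardcodes 1.1.1.1, which is why
# this path kept working while the resolver's IP prefixes were withdrawn.

import json
import urllib.parse
import urllib.request

def build_doh_request(name, rtype="A"):
    """Build a GET request for Cloudflare's JSON-format DoH endpoint."""
    query = urllib.parse.urlencode({"name": name, "type": rtype})
    url = f"https://cloudflare-dns.com/dns-query?{query}"
    return urllib.request.Request(url, headers={"accept": "application/dns-json"})

def doh_lookup(name, rtype="A"):
    """Perform the lookup and return the decoded JSON response."""
    with urllib.request.urlopen(build_doh_request(name, rtype), timeout=5) as resp:
        return json.load(resp)

req = build_doh_request("example.com")
print(req.full_url)
# https://cloudflare-dns.com/dns-query?name=example.com&type=A
```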
Cloudflare’s investigation revealed that the outage exposed fundamental weaknesses in their legacy routing management systems.
The company currently operates a combination of legacy and strategic systems that must be synchronized to manage IP ranges and determine where addresses should be announced across their edge network.
Initial alerts for the DNS resolver fired at 22:01 UTC, indicating query, proxy, and data center failures.
The legacy approach of hard-coding explicit lists of data center locations and attaching them to particular prefixes has proven error-prone and lacks progressive deployment methodology.
Adding to the confusion, a BGP hijack of the 1.1.1.0/24 prefix by Tata Communications India (AS4755) was detected during the outage.
However, Cloudflare emphasized that this hijack was not the cause of the service failure but rather an unrelated issue that became visible when Cloudflare withdrew their routes.
The company is following up with Tata Communications regarding this separate incident.
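Hijacks of this kind are typically surfaced by comparing the origin AS seen in BGP announcements against the expected origin for the prefix. The AS numbers below are real (Cloudflare originates 1.1.1.0/24 from AS13335; AS4755 is Tata Communications India), but the announcement feed is a hand-built stand-in, not real BGP data:

```python
# Minimal origin-validation sketch: flag announcements whose origin AS
# differs from the registered origin for a monitored prefix.

EXPECTED_ORIGINS = {"1.1.1.0/24": 13335, "1.0.0.0/24": 13335}

def detect_origin_mismatches(announcements, expected=EXPECTED_ORIGINS):
    """Yield (prefix, origin_as) pairs announced by an unexpected origin."""
    for prefix, origin_as in announcements:
        if prefix in expected and origin_as != expected[prefix]:
            yield prefix, origin_as

# Illustrative feed: legitimate announcements plus the AS4755 hijack.
observed = [("1.1.1.0/24", 13335), ("1.1.1.0/24", 4755), ("1.0.0.0/24", 13335)]
print(list(detect_origin_mismatches(observed)))  # [('1.1.1.0/24', 4755)]
```

In production this check is usually done against RPKI data or a route-monitoring feed rather than a static table.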
According to the report, Cloudflare began restoring service at 22:20 UTC by reverting to the previous configuration, which immediately restored traffic to approximately 77% of pre-incident levels.
The remaining 23% of edge servers required individual reconfiguration to restore IP bindings, a process that was accelerated given the severity of the incident. Full service restoration was achieved by 22:54 UTC.
To prevent similar incidents, Cloudflare announced several remediation measures, including accelerating the deprecation of legacy systems that lack gradual deployment methodologies.
The company plans to migrate to modern progressive deployment processes that provide earlier indication through staged rollouts and health monitoring.
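The core idea of such a rollout is to apply a change to successively larger slices of the fleet, gating each expansion on health checks so a bad change is halted early. A hedged sketch of that pattern (the function names, stage fractions, and callbacks are hypothetical, not Cloudflare's tooling):

```python
# Progressive-rollout sketch: expand a change through the fleet in stages,
# aborting as soon as a health check fails. `apply_change` and `healthy`
# are hypothetical stand-ins for real deployment and monitoring hooks.

def progressive_rollout(datacenters, apply_change, healthy,
                        stages=(0.01, 0.1, 0.5, 1.0)):
    """Return the data centers updated; stop early on a failed health check."""
    done = []
    for fraction in stages:
        target = int(len(datacenters) * fraction)
        for dc in datacenters[len(done):target]:
            apply_change(dc)
            done.append(dc)
        if not all(healthy(dc) for dc in done):
            return done  # blast radius limited to the stage that failed
    return done

dcs = [f"dc-{i}" for i in range(100)]
updated = progressive_rollout(dcs, apply_change=lambda dc: None,
                              healthy=lambda dc: True)
print(len(updated))  # 100
```

Had the July 14 refresh gone through stages like these, the withdrawal of resolver prefixes would have tripped health checks at the first small stage instead of propagating globally.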
Additionally, they will stage addressing deployments, improve test coverage, and raise documentation standards.
Cloudflare has apologized for the disruption and emphasized their commitment to implementing these improvements to ensure enhanced stability and prevent future occurrences of similar configuration errors.