
GitHub Experiences Global Outage, Disrupting Core Services for Users

GitHub experienced a significant service disruption affecting millions of developers worldwide on July 28-29, 2025, with core functionality including API Requests, Issues, and Pull Requests experiencing degraded performance for roughly three and a half hours.

The incident, which began at 22:40 UTC on July 28, impacted roughly 4% of requests to affected services at its peak before being fully resolved at 02:06 UTC on July 29.

The outage underscored the global software development community's critical dependency on GitHub's infrastructure, as developers across multiple time zones found themselves unable to access essential collaborative features while the disruption lasted.

The incident commenced at 22:40 UTC on July 28, 2025, when GitHub’s monitoring systems detected reports of degraded performance affecting three critical services: API Requests, Issues, and Pull Requests.
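GitHub has not disclosed the specifics of its internal monitoring, but degraded-performance detection of this kind generally relies on tracking a rolling error rate against a threshold. The Python sketch below illustrates the idea; the window size, threshold, and function name are illustrative assumptions, not GitHub's actual tooling.

```python
from collections import deque

# Illustrative sketch of threshold-based degradation detection.
# The window size and 2% threshold are assumptions for the example,
# not values used by GitHub's monitoring systems.
WINDOW_SIZE = 1000        # number of recent requests to consider
ALERT_THRESHOLD = 0.02    # flag degradation above a 2% error rate

recent_results = deque(maxlen=WINDOW_SIZE)  # True = success, False = error

def record_request(succeeded: bool) -> None:
    """Record one request outcome and flag degraded performance if needed."""
    recent_results.append(succeeded)
    if len(recent_results) < WINDOW_SIZE:
        return  # not enough data yet
    error_rate = recent_results.count(False) / len(recent_results)
    if error_rate > ALERT_THRESHOLD:
        print(f"ALERT: rolling error rate {error_rate:.1%} exceeds {ALERT_THRESHOLD:.0%}")
```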

Within two minutes, at 22:42 UTC, GitHub's engineering team officially acknowledged the problem and began its investigation to determine the root cause and develop appropriate mitigation strategies.

The rapid response time demonstrated GitHub’s commitment to transparency, though the underlying technical challenges would prove more complex than initially anticipated.

During the early stages of the incident, GitHub’s status updates indicated that “some GitHub services are experiencing degraded performance” with the team actively investigating networking issues.

By 23:18 UTC, roughly 40 minutes into the outage, the scope of the problem became clearer as engineers identified networking infrastructure as the primary suspect.

The investigation continued throughout the night, with technical teams working across multiple time zones to diagnose and address the underlying connectivity problems that were preventing users from accessing these fundamental development tools.

Mitigations

The most critical phase of the outage occurred between 00:02 UTC and 01:52 UTC on July 29, when approximately 4% of requests to the impacted services continued to experience errors despite ongoing mitigation efforts.
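For developers on the client side, an error rate of around 4% means most retried requests will eventually succeed. A minimal retry loop with exponential backoff, sketched below in Python, assumes a plain GET against the public REST API; the function name and retry parameters are illustrative, not an official GitHub recommendation.

```python
import time
import requests

def get_with_backoff(url: str, max_attempts: int = 5) -> requests.Response:
    """Retry a GET request with exponential backoff on server-side errors.

    Illustrative sketch only; the retry parameters are assumptions.
    """
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            response = requests.get(url, timeout=10)
            if response.status_code < 500:
                return response       # success or a client error; stop retrying
        except requests.RequestException:
            pass                      # network-level failure; retry below
        if attempt < max_attempts:
            time.sleep(delay)
            delay *= 2                # back off: 1s, 2s, 4s, 8s...
    raise RuntimeError(f"{url} still failing after {max_attempts} attempts")

# Example: listing open issues on a public repository via the REST API.
# issues = get_with_backoff("https://api.github.com/repos/octocat/Hello-World/issues")
```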

During this period, GitHub’s engineering teams deployed multiple diagnostic tools and implemented various networking fixes while maintaining continuous communication with affected users through their status page updates.

The persistent error rate indicated the complexity of the underlying infrastructure issue, requiring sophisticated troubleshooting approaches beyond standard restart procedures.

GitHub’s technical team worked systematically through their incident response protocols, focusing specifically on resolving the networking bottlenecks that were causing request failures across their distributed infrastructure.

The engineering effort involved coordinating between multiple technical teams responsible for different aspects of GitHub’s service architecture, including API gateway management, database connectivity, and load balancing systems.

Throughout this phase, the company maintained regular communication intervals, providing updates approximately every hour to keep the developer community informed about progress toward resolution.
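Those status page updates are also available programmatically. The sketch below polls GitHub's public status endpoint, which follows the common Statuspage v2 API layout; the exact response fields shown are an assumption based on that format rather than something confirmed in the incident report.

```python
import requests

# githubstatus.com exposes a Statuspage-style JSON endpoint; the field
# names below follow that common format and are assumptions for this sketch.
STATUS_URL = "https://www.githubstatus.com/api/v2/status.json"

def current_github_status() -> str:
    """Return a one-line summary such as 'major: Partial System Outage'."""
    data = requests.get(STATUS_URL, timeout=10).json()
    status = data.get("status", {})
    return f"{status.get('indicator', 'unknown')}: {status.get('description', '')}"

if __name__ == "__main__":
    print(current_github_status())
```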

Resolution and Service Recovery

The breakthrough in resolving the incident occurred at 01:52 UTC when GitHub’s team successfully deployed a comprehensive mitigation strategy targeting the identified networking issues.

Recovery began almost immediately, with the Issues service returning to normal operation at 02:03 UTC, followed by Pull Requests at 02:05 UTC.

The systematic restoration of services demonstrated the effectiveness of GitHub’s incident response procedures and their ability to implement targeted fixes without causing additional service disruptions.

By 02:06 UTC on July 29, GitHub officially declared the incident resolved, marking the end of a roughly three-and-a-half-hour service disruption that had affected developers globally.

The company committed to conducting a detailed root cause analysis and promised to share its findings with the community once the investigation is complete.

This transparency approach aligns with industry best practices for post-incident communication, helping users understand both the technical factors that contributed to the outage and the measures being implemented to prevent similar occurrences in the future.


Ethan Brooks

Ethan Brooks is a senior cybersecurity journalist passionate about threat intelligence and data privacy. His work covers cyber attacks, hacking, security culture, and cybercrime for The Cyber News.
