Tech News is a blog created by Wasim Akhtar to deliver technical news covering the latest and greatest in the world of technology. We provide content in the form of articles, videos, and product reviews.
Today’s outage for several Google services
Earlier today, most Google users who use logged-in services like Gmail, Google+, Calendar and Documents found they were unable to access those services for approximately 25 minutes. For about 10 percent of users, the problem persisted for as much as 30 minutes longer. Whether the effect was brief or lasted the better part of an hour, please accept our apologies—we strive to make all of Google’s services available and fast for you, all the time, and we missed the mark today.
The issue has been resolved, and we’re now focused on correcting the bug that caused the outage, as well as putting more checks and monitors in place to ensure that this kind of problem doesn’t happen again. If you’re interested in the technical explanation for what occurred and how it was fixed, read on.
At 10:55 a.m. PST, an internal system that generates configurations (essentially, information that tells other systems how to behave) encountered a software bug and generated an incorrect configuration. The incorrect configuration was sent to live services over the next 15 minutes and caused users’ requests for their data to be ignored; those services, in turn, generated errors. Users began seeing these errors on affected services at 11:02 a.m., and at that time our internal monitoring alerted Google’s Site Reliability Team. Engineers were still debugging 12 minutes later when the same system, having automatically cleared the original error, generated a new correct configuration at 11:14 a.m. and began sending it; errors subsided rapidly starting at this time. By 11:30 a.m. the correct configuration was live everywhere and almost all users’ service was restored.
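The intervals implied by those timestamps can be checked with a few lines of Python. This is only a sketch for reading the timeline: the event labels and the `minutes_between` helper are made up for illustration, and the four times are the ones quoted in the paragraph above.

```python
from datetime import datetime

# Times quoted in the post (PST). The labels are illustrative, not Google's.
TIMELINE = {
    "bad_config_generated": "10:55",
    "errors_seen_and_alert_fired": "11:02",
    "good_config_generated": "11:14",
    "good_config_live_everywhere": "11:30",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from one HH:MM time to a later one on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


# Bad configuration generated -> users see errors and the alert fires.
time_to_detect = minutes_between(
    TIMELINE["bad_config_generated"], TIMELINE["errors_seen_and_alert_fired"]
)

# Alert fires -> the generator produces a correct configuration.
time_to_fix_generated = minutes_between(
    TIMELINE["errors_seen_and_alert_fired"], TIMELINE["good_config_generated"]
)

# Users first see errors -> correct configuration is live everywhere.
user_visible_window = minutes_between(
    TIMELINE["errors_seen_and_alert_fired"], TIMELINE["good_config_live_everywhere"]
)
```

The 12 minutes between the alert and the corrected configuration matches the figure in the text. The window from first errors to full rollout comes to 28 minutes, in line with the "approximately 25 minutes" most users experienced.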
With services once again working normally, our work is now focused on (a) removing the source of failure that caused today’s outage, and (b) speeding up recovery when a problem does occur. We'll be taking the following steps in the next few days:
1. Correcting the bug in the configuration generator to prevent recurrence, and auditing all other critical configuration generation systems to ensure they do not contain a similar bug.
2. Adding input validation checks for configurations, so that a bad configuration generated in the future will not result in service disruption.
3. Adding targeted monitoring to more quickly detect and diagnose the cause of service failure.
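As a rough illustration of step 2, the sketch below shows one way a validation gate in front of a configuration push might look. It is not Google's implementation: the post does not describe the real configuration format or checks, so the `Config` fields, the two rules in `validate`, and the `choose_config` helper are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical configuration shape, invented for this example."""
    version: int
    backends: tuple[str, ...]
    request_timeout_ms: int


def validate(candidate: Config) -> list[str]:
    """Return a list of problems found; an empty list means the config passes."""
    problems = []
    if not candidate.backends:
        problems.append("no backends listed")
    if candidate.request_timeout_ms <= 0:
        problems.append("request_timeout_ms must be positive")
    return problems


def choose_config(candidate: Config, last_known_good: Config) -> Config:
    """Push the candidate only if it validates; otherwise keep serving the old one."""
    return last_known_good if validate(candidate) else candidate


good = Config(version=1, backends=("be-1", "be-2"), request_timeout_ms=500)
bad = Config(version=2, backends=(), request_timeout_ms=500)

# A broken candidate is rejected and the last known good config stays live.
active = choose_config(bad, last_known_good=good)
```

The point of the design is that a generator bug produces a rejected candidate rather than a live outage: services keep running on the previous configuration while the failed validation is investigated.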
Posted by Ben Treynor, VP Engineering
via The Official Google Blog http://ift.tt/1mEzh4w
Netflix finally lets users disable Post-Play feature
As promised, Netflix now lets you turn off its forced auto-advancing feature when watching streaming TV shows. Huzzah, I say.
via PCWorld http://ift.tt/KSdMkq