Microsoft Azure Outage wipes out Teams, 365 and Outlook | Knowledge of the data center


Microsoft experienced disruptions yesterday to its online services, including Teams, M365 and Outlook, according to Bloomberg News.

The outage follows Microsoft’s positive earnings report on Tuesday, which itself contrasted with the company’s announcement of a 5% workforce reduction that puts 10,000 of its workers out of a job. The layoffs included members of Azure, Microsoft’s cloud services offering and the company’s revenue growth engine. Notably, while Azure is a growth driver for Microsoft, growth across the cloud services industry has slowed, which suggests the market is maturing.

Azure is at the center of Tuesday’s outage, and Microsoft continued its history of revealing the root cause of outages by providing a summary of the impact on its Azure Status History page. The multi-region outage lasted three hours and affected Azure resources in public Azure regions. Popular services M365 and PowerBI were also affected.

Wide area network (WAN) issues were the cause of the outage, according to Microsoft’s disclosures on the matter. A change the company made to its WAN cut connectivity between the Internet and Microsoft’s suite of core services.

The US Federal Aviation Administration (FAA) also experienced an outage last week, in its critical pilot safety notification system, known as NOTAM. Like Microsoft’s, that outage was due to a system change. According to the FAA, the outage was caused by a damaged file in both its primary and secondary databases. When a contractor deleted these files, the system slowed down and NOTAM alerts were unavailable to pilots, grounding domestic flights in the US.

Outages remain a critical drawback to our growing reliance on cloud service providers and, in the FAA’s case, legacy systems.

Although the two disruptions differ in origin, both had widespread impact, a common feature of disruptions at major organizations. The financial impact of system outages, regardless of source, cannot be overstated. The Uptime Institute found that outages costing businesses more than $100,000 rose to more than 60% of all connectivity failures (up from 39% in 2019). More companies are also paying more than $1 million to recover from an outage: the share paying seven figures rose to 15%, up from 11% in previous years.

Data Center Outage Math

Azure is the second-largest cloud service provider (CSP), according to reports, behind only Amazon, which created the CSP segment and leads the market.

Microsoft has committed to providing a full root cause analysis, or post-incident report (PIR), within the next three days, followed by a final PIR 14 days later.

We spoke with Chip Gibbons, CISO of managed services company Thrive, to find out about mitigation plans after the outage. Here are the highlights:

Planning is essential for businesses of all sizes: Many businesses can put a comprehensive data backup and recovery plan in place relatively easily. Larger organizations may need to address more detail, specifically how systems, applications and working conditions should be recovered. However, some aspects of data recovery should always be addressed, such as understanding how the backup system works, who is responsible for it, what the recovery point objective (RPO) is and how much data you need to back up. This can dramatically reduce the time it takes to return to business after a disaster and help you meet your specified recovery time objective (RTO).

Routine testing of DR strategies: Testing is essential, but it can interfere with business operations and even reduce productivity. Whenever systems are tested, IT teams are likely to find something wrong with the disaster recovery (DR) strategy, and they should adapt it over time as they address these issues. If these issues are properly addressed during the testing phase, organizations will have a better chance of success when they actually need to use the DR strategy.

Remember that IT infrastructure is governed by people; therefore, a DR strategy must take human behavior into account. For example, if a company location is compromised by a disaster, organizations need to determine whether employees can still access the data they need to do their jobs effectively.
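The RPO and RTO targets mentioned above can be made concrete with a small check run after a DR test or a real incident. The following is a minimal Python sketch, not a method described by Thrive or Microsoft: the `RecoveryPlan` class, the helper function names and all timestamps are hypothetical and chosen only for illustration. It assumes RPO is measured as the time between the last good backup and the start of the outage, and RTO as the time between the start of the outage and service restoration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryPlan:
    """Hypothetical DR targets; the field values are illustrative assumptions."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time to restore service


def data_loss_window(last_backup: datetime, outage_start: datetime) -> timedelta:
    """Data written after the last good backup is what would be lost."""
    return outage_start - last_backup


def downtime(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time between the start of the outage and restoration."""
    return service_restored - outage_start


def meets_rpo(plan: RecoveryPlan, last_backup: datetime, outage_start: datetime) -> bool:
    return data_loss_window(last_backup, outage_start) <= plan.rpo


def meets_rto(plan: RecoveryPlan, outage_start: datetime, service_restored: datetime) -> bool:
    return downtime(outage_start, service_restored) <= plan.rto


if __name__ == "__main__":
    # Illustrative numbers only: a 1-hour RPO and a 4-hour RTO.
    plan = RecoveryPlan(rpo=timedelta(hours=1), rto=timedelta(hours=4))
    last_backup = datetime(2024, 1, 1, 6, 30)
    outage_start = datetime(2024, 1, 1, 7, 5)
    service_restored = datetime(2024, 1, 1, 10, 5)

    print("RPO met:", meets_rpo(plan, last_backup, outage_start))
    print("RTO met:", meets_rto(plan, outage_start, service_restored))
```

Running a check like this after each DR test gives IT teams a simple pass or fail signal, which may help when deciding whether backup frequency or recovery procedures need to change.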

Keep checking this space for updates on this emerging story.


