When Data Centre Best Practice clashes with Telecommunications Poor Practice
Discover the Real Cause Behind Recent Telecommunications Outages in South Africa
Lately in South Africa there has been a rash of telecommunications outages where the immediate cause has been attributed to Data Centre electrical maintenance. This has happened at Data Centres in Durban, Cape Town and Johannesburg. There have been multiple failures across multiple network operators, including Openserve and Dark Fibre Africa. This has triggered downstream failures across all ISPs.
In this article, I highlight the process that a rack owner in a Data Centre should undertake to address single points of failure in their services.
Spiderman
The root cause of the current telecommunications outages is poor telecommunications practice, not anything related to the Data Centres themselves. This hasn't stopped the telecommunications players from attempting a Spiderman, i.e. pointing the finger away from themselves. In essence, downstream power management is clearly the responsibility of the rack owner, who needs to ensure that the power draw remains within limits for each feed supplied by the Data Centre, and here poor practices were present. Consider the scenario above, where a rack owned by a telecommunications player has various servers, switches and routers on dual power paths, but they are running at or near the limit of each separate power feed. The problem is obvious: when one power feed goes down (as in the case of maintenance), the full load shifts to the other feed, which pushes it over its limit and results in a trip. The cause in this case is thus squarely the responsibility of the rack owner, which is the telecommunications operator.
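The arithmetic behind this failure mode is simple enough to check in a few lines. The following is only a minimal sketch, not a tool used by any operator or Data Centre: the `Feed` class, the function name, the 16 A limits and the 80% headroom factor are all illustrative assumptions, and it assumes that dual-corded equipment shifts its entire load onto the surviving feed.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """One power feed into a rack (hypothetical model for illustration)."""
    name: str
    limit_amps: float  # assumed breaker or contracted limit for this feed
    draw_amps: float   # draw measured while both feeds are healthy


def survives_single_feed_loss(feed_a: Feed, feed_b: Feed, headroom: float = 0.8) -> bool:
    """Return True if either feed could carry the whole rack on its own.

    Assumption: when one feed drops, all of its load moves to the other feed.
    The headroom factor (80% here) is an assumed safety margin, not a quoted
    standard; set it to 1.0 to test against the raw limit.
    """
    total_draw = feed_a.draw_amps + feed_b.draw_amps
    return all(total_draw <= feed.limit_amps * headroom for feed in (feed_a, feed_b))


# Each feed is comfortably under its own limit, yet the rack is not redundant:
# 14 A + 14 A = 28 A would land on a single 16 A feed during maintenance.
near_limit = survives_single_feed_loss(Feed("A", 16, 14), Feed("B", 16, 14))

# 6 A + 6 A = 12 A fits on one 16 A feed, even with the assumed 80% headroom.
well_sized = survives_single_feed_loss(Feed("A", 16, 6), Feed("B", 16, 6))

print(near_limit, well_sized)
```

The point of the check is that a rack can look healthy on each feed individually and still have no real redundancy. The question to ask is whether the combined draw fits on one feed, not whether each feed is within its own limit.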
Fault Tolerance and Redundancy
For failures to occur within a Data Centre, the systems must have had shortcomings in built-in redundancy, fault tolerance, and instantaneous failover capabilities, and if an outage does occur then those capabilities were likely not properly tested. Fault tolerance means that the capabilities of a Data Centre, including the power, cooling and associated supplementary systems, or part thereof, will continue to operate uninterrupted even if a component fails. Redundancy refers to backup networks or devices within the infrastructure that allow systems to keep operating when something fails. Failover denotes the ability to switch to this backup system instantly and seamlessly.
Data Centre Electrical Best Practice
The following article provides a comprehensive insight into Data Centre Maintenance including its importance:
Specifically, Data Centre Electrical Maintenance is described as:
Electrical systems maintenance in a data center involves the inspection, cleaning, and servicing of key components. This ensures an uninterrupted, stable, and efficient power supply to the facility. The main components requiring regular maintenance include Uninterruptible Power Supply (UPS) systems, Power Distribution Units (PDUs), backup generators, transformers, switchgear, and switchboards.
But a key best practice conducted during Data Centre Electrical Maintenance is testing. It stands to reason that should a component not operate correctly, it is preferable to identify this during a scheduled maintenance window and not during an actual event. An example is an automatic transfer switch, which automates the switch of feed from utility power to generator power. During periods of heavy loadshedding it is in operational use continuously, at peak many times a day. However, South Africa has had no loadshedding for over a month, so the switch needs to be tested during a maintenance window.
Where in the world is telecommunications best practice?
The only lesson learnt from these outages is that some telecommunications operators do not test and live on a wing and a prayer. But fear not, as Fusion Broadband South Africa has your back with an innovative SD-WAN solution that leverages multiple last mile telecommunications links to ensure zero downtime.
Ronald Bartels ensures that Internet inhabiting things are connected reliably online at Fusion Broadband South Africa - the leading specialized SD-WAN provider in South Africa.