
Multi-layer Network Recovery

Today's telecommunications networks generally contain multiple layers: the physical fiber layer, the optical channel layer, the TDM layer, the MPLS layer, and the IP layer. A network failure often occurs in the lowest layer, the fiber layer, for example as a link failure caused by a fiber cut. Such a failure directly breaks the connectivity between a node pair in the fiber layer, and it also affects every traffic demand in the client layers that uses the cut fiber for transmission. For example, all the lightpath channels that traverse the fiber are interrupted. Likewise, all the TDM tributaries carried in each affected lightpath channel are interrupted. In turn, a TDM flow can contain multiple MPLS traffic flows, each of which carries multiple IP flows, and all of these flows are interrupted by the fiber cut as well.

To survive a failure, each layer has its own protection and restoration mechanisms and can recover from network failures independently. For example, in the fiber layer, we may find an alternative fiber route and reroute onto it all the lightpath channels interrupted by the fiber cut. Likewise, in the optical channel layer, for each end-to-end optical channel we may find a link-disjoint end-to-end route and set up a new lightpath on it. The upper layers can take the same kind of action for failure recovery.
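The two rerouting styles described above can be sketched in a few lines of Python. This is only an illustrative sketch: the five-node topology, the node names, and the helper find_route are assumptions made for the example, and a plain breadth-first search stands in for whatever route computation a real network would use.

```python
from collections import deque

def find_route(links, src, dst, failed=frozenset()):
    """Return a fewest-hop route from src to dst that avoids the given links.

    links  -- iterable of (node, node) pairs, treated as undirected fibers
    failed -- set of frozenset({a, b}) links that must not be used
    """
    # Build the adjacency map, skipping every excluded link.
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in failed:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # Standard breadth-first search with predecessor tracking.
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no surviving route

# Hypothetical fiber topology (not taken from the article).
LINKS = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "E"), ("E", "C")]

# Working lightpath from A to C before the failure.
working = find_route(LINKS, "A", "C")

# Fiber-layer recovery: bypass only the cut fiber B-C between its end nodes.
cut = {frozenset(("B", "C"))}
fiber_detour = find_route(LINKS, "B", "C", failed=cut)

# Optical-channel-layer recovery: a new end-to-end route that shares
# no link with the working route.
working_links = {frozenset(pair) for pair in zip(working, working[1:])}
disjoint_backup = find_route(LINKS, "A", "C", failed=working_links)
```

In this toy topology the fiber-layer detour runs B, A, D, E, C between the ends of the cut fiber, while the link-disjoint backup for the A to C lightpath runs A, D, E, C.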

The key difference between failure recovery in these layers is that different layers have different recovery complexities and restoration speeds. In general, the lower the layer, the simpler the required recovery action. In the fiber layer, only one alternative fiber route needs to be found and used for recovery. In the optical channel layer, a recovery action must be taken independently for each interrupted end-to-end lightpath channel, which amounts to more than 80 operations if the fiber carries more than 80 wavelengths. The number of recovery actions multiplies further in upper layers such as the SDH/SONET and MPLS layers, since these layers contain many more affected service flows of smaller granularity.
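A rough calculation shows how quickly the number of recovery actions grows from layer to layer. Only the figure of 80 wavelengths per fiber comes from the text; the fan-out values for the TDM, MPLS, and IP layers below are hypothetical numbers chosen purely for illustration.

```python
def recovery_operations(fan_out):
    """Count the flows to recover in each layer after one fiber cut.

    fan_out -- list of (layer name, flows carried per flow of the layer below)
    """
    counts = {}
    total = 1
    for layer, per_parent in fan_out:
        total *= per_parent  # each lower-layer flow carries several client flows
        counts[layer] = total
    return counts

# One cut fiber; 80 wavelengths is the article's figure, the rest are assumed.
FAN_OUT = [
    ("fiber", 1),
    ("optical channel", 80),
    ("TDM", 16),    # assumed tributaries per lightpath
    ("MPLS", 10),   # assumed MPLS flows per tributary
    ("IP", 50),     # assumed IP flows per MPLS flow
]

operations = recovery_operations(FAN_OUT)
for layer, count in operations.items():
    print(f"{layer}: {count} recovery operations")
```

Under these assumed fan-outs, one action in the fiber layer corresponds to 80 in the optical channel layer, 1,280 in the TDM layer, 12,800 in the MPLS layer, and 640,000 in the IP layer.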

Another essential issue in multi-layer network restoration is restoration escalation. Because each network layer has its own restoration capability, a failure can be survived entirely within one layer if enough protection capacity is reserved in that layer. For example, we can carry out restoration in the fiber layer, where we only need to find another fiber route to reroute all the affected optical channels. If this recovery is fast enough, the upper layers may not even notice the recovery process. At the other extreme, we can survive the failure within the IP layer through routing table convergence, assisted by a routing protocol such as OSPF. Because a single fiber cut can interrupt thousands of IP traffic flows, this recovery process can place a heavy burden on the IP layer control plane, and recovery within the IP layer usually takes much longer.

When restoring a network with multiple layers, we often need to designate one layer to take the major restoration actions. The other layers then assist this major layer only when some failures cannot be fully recovered in it. For example, if we assign the optical layer as the major layer for failure recovery, it will recover most of the optical channels interrupted by a link failure. If any optical channels remain unrecovered, we can then employ the recovery mechanism in the SDH/SONET layer to recover the SDH/SONET flows carried by those lightpaths.
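The escalation from the optical layer to the SDH/SONET layer can be sketched as follows. This is a simplified model, not a description of any real control plane: the function escalate, the lightpath names, and the idea that the optical layer recovers lightpaths in order until its spare wavelengths run out are all assumptions made for the example.

```python
def escalate(affected_lightpaths, spare_wavelengths, tributaries_per_lightpath):
    """Recover lightpaths in the optical layer, then escalate the remainder.

    Returns (lightpaths recovered optically, tributaries handed to SDH/SONET).
    """
    spare = max(spare_wavelengths, 0)  # guard against negative input

    # The major layer (optical) recovers as many lightpaths as capacity allows.
    recovered = affected_lightpaths[:spare]
    unrecovered = affected_lightpaths[spare:]

    # Every tributary inside an unrecovered lightpath escalates to SDH/SONET.
    escalated = [
        f"{lightpath}/trib{index}"
        for lightpath in unrecovered
        for index in range(tributaries_per_lightpath)
    ]
    return recovered, escalated

# Hypothetical failure: three lightpaths cut, two spare wavelengths available.
recovered, escalated = escalate(["lp1", "lp2", "lp3"], 2, 2)
print("optical layer recovered:", recovered)
print("escalated to SDH/SONET:", escalated)
```

Here the optical layer restores lp1 and lp2, and only the two tributaries of lp3 are passed up to the SDH/SONET layer, which keeps the upper layer's workload small.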

In summary, network failure recovery in different layers has the following key characteristics: i) failure restoration in lower layers is in general faster and simpler than in upper layers; ii) failure restoration in upper layers is usually more efficient in spare capacity utilization and achieves a higher recovery percentage due to finer traffic flow granularity. To achieve the best failure recovery, all the network layers should collaborate so that recovery is both as fast and as complete as possible.

By: Daking Penny