Traditional recovery methods

In traditional networks, recovering from link or switch failures is a slow and reactive process.

The control plane must detect the fault, update routing tables, and notify all switches in the network. This chain of actions can take significant time, delaying convergence and impacting performance.

These recovery steps apply to two types of fault:

  • Link Fault: Occurs when a connection between two remote switches fails.
  • Switch Fault: Occurs when the leaf switch that originates the packet flows fails.

While sufficient for legacy networks, these methods struggle to meet the speed and efficiency requirements of modern AI fabric environments.

Intelligent local recovery

Cisco Silicon One introduces autonomous local fault recovery, significantly reducing time-to-resolution.

When one of a device's local links fails, the device's MAC layer notifies the Intelligent Load Balancing (ILB) block. The ILB then dynamically steers traffic away from the failed link without requiring additional intervention.

This seamless recovery supports both dynamically load-balanced flows and static ECMP flows, maintaining network performance with minimal disruption.
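
The local recovery behavior can be pictured with a small software model. This is only an illustrative sketch, not Cisco's implementation: the EcmpGroup class, the link_down and pick methods, and the CRC32 hash are all hypothetical names and choices made for this example. It shows a link-down notification removing a member from the candidate set so that later path selections skip it.

```python
import zlib


class EcmpGroup:
    """Toy model of a multipath group that reacts to a link-down signal.

    Hypothetical illustration only; the real ILB logic runs in hardware.
    """

    def __init__(self, links):
        self.links = list(links)   # configured member links, in order
        self.up = set(self.links)  # members currently usable

    def link_down(self, link):
        # Stand-in for the MAC-layer notification: stop using the link at once.
        self.up.discard(link)

    def pick(self, flow_key):
        # Hash the flow onto the surviving members only.
        live = [link for link in self.links if link in self.up]
        if not live:
            raise RuntimeError("no live links in group")
        return live[zlib.crc32(flow_key.encode()) % len(live)]


group = EcmpGroup(["eth0", "eth1", "eth2", "eth3"])
group.link_down("eth2")
```

After the link_down call, every flow key maps to one of the three surviving links, with no routing-table update involved.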

Unreachable destination notification packet (UDNP)

Cisco Silicon One ensures fast convergence for remote link failures through the Unreachable Destination Notification Packet (UDNP).

When a packet arrives at a switch that can no longer reach the packet's destination because one of its local links has failed, the switch truncates the packet and sends it back to the sender.

The sender reacts to this notification by re-pathing the flow, such as by adjusting the flow's entropy value. This enables rapid traffic rerouting and minimizes disruption.
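
The sender-side reaction can be sketched as follows. This is a simplified model under stated assumptions: the Sender class, the choose_path function, the on_udnp handler, and the use of CRC32 over the flow identifier plus entropy are all hypothetical and chosen only to illustrate how changing an entropy value moves a flow to a different path.

```python
import zlib


def choose_path(flow_id, entropy, num_paths):
    """Hash the flow identity plus its entropy value onto one path index."""
    return zlib.crc32(f"{flow_id}|{entropy}".encode()) % num_paths


class Sender:
    """Hypothetical sender that re-paths a flow when a UDNP comes back."""

    def __init__(self, num_paths):
        self.num_paths = num_paths
        self.entropy = {}  # flow_id -> current entropy value

    def path_for(self, flow_id):
        return choose_path(flow_id, self.entropy.get(flow_id, 0), self.num_paths)

    def on_udnp(self, flow_id):
        """Bump the flow's entropy until the hash selects a different path."""
        old_path = self.path_for(flow_id)
        for _ in range(64):  # bounded search so the handler always returns
            self.entropy[flow_id] = self.entropy.get(flow_id, 0) + 1
            if self.path_for(flow_id) != old_path:
                break
        return self.path_for(flow_id)


sender = Sender(num_paths=4)
before = sender.path_for("hostA->hostB")
after = sender.on_udnp("hostA->hostB")
```

Here the notification is handled entirely at the sender: no routing state changes, only the value that feeds the path hash.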

Fabric routing management (FRM)

For remote link failures, traditional networks rely on the control plane to notify peer devices of lost reachability, a process that can take hundreds of milliseconds.

Cisco Silicon One's hardware-accelerated Fabric Routing Management (FRM) protocol eliminates this delay by propagating reachability updates directly in hardware.

This reduces convergence time to just a few microseconds, ensuring faster recovery and maintaining seamless network performance.
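
To put the difference in scale, the arithmetic below estimates how much traffic a fully loaded link would carry during each convergence window. The figures are assumptions picked for illustration, not measurements: a 400 Gb/s link, 300 ms standing in for "hundreds of milliseconds", and 5 µs standing in for "a few microseconds". The function name bytes_at_risk is likewise hypothetical.

```python
def bytes_at_risk(link_rate_bps, convergence_us):
    """Bytes a fully loaded link carries during the convergence window."""
    bits = link_rate_bps * convergence_us // 1_000_000  # integer math, no rounding drift
    return bits // 8


LINK_RATE_BPS = 400 * 10**9  # assumed 400 Gb/s link
CONTROL_PLANE_US = 300_000   # assumed 300 ms control-plane convergence
HARDWARE_US = 5              # assumed 5 microsecond hardware convergence

slow = bytes_at_risk(LINK_RATE_BPS, CONTROL_PLANE_US)  # 15_000_000_000 bytes (15 GB)
fast = bytes_at_risk(LINK_RATE_BPS, HARDWARE_US)       # 250_000 bytes (250 KB)
```

Under these assumed numbers, the traffic exposed to the failure shrinks from roughly 15 GB to roughly 250 KB per affected link.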

Intelligent bandwidth recovery

In scenarios where leaf and spine switches are connected by multiple links and only a subset of those links fails (e.g., 1 out of 4), overall bandwidth is reduced, but reachability is preserved.

Cisco Silicon One's hardware-accelerated routing protocol monitors available bandwidth and dynamically adjusts traffic distribution.

It ensures that only traffic proportional to the remaining bandwidth is sent to the remote spine, maintaining network balance and efficiency.
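
The proportional adjustment can be illustrated with a short calculation. This is a sketch under assumptions: the spine_weights function and the spine names are hypothetical, all links are assumed to have equal capacity, and the real mechanism runs in hardware rather than in software like this.

```python
from fractions import Fraction


def spine_weights(live_links):
    """Return each spine's traffic share, proportional to its live link count.

    live_links maps a spine name to the number of working links toward it;
    every link is assumed to have the same capacity.
    """
    total = sum(live_links.values())
    if total == 0:
        raise ValueError("no live links to any spine")
    return {spine: Fraction(count, total) for spine, count in live_links.items()}


# Four spines with four links each; one link toward spine-1 has failed.
weights = spine_weights({"spine-1": 3, "spine-2": 4, "spine-3": 4, "spine-4": 4})
```

With 15 of the original 16 links alive, spine-1 receives 3/15 (one fifth) of the traffic and each remaining spine receives 4/15, so no spine is offered more than its remaining links can carry relative to the others.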