In this article, I’d like to show you how to achieve sub-second convergence in MPLS networks with MPLS TE (RSVP) fast-reroute (FRR). Since this sub-second convergence depends on hardware optimization, I’ll use DATACOM Ethernet Switches [1]. In this topology, I’ll simulate a network failure event in the RSVP core, first with fast-reroute enabled and then with it disabled, so we have some numbers to compare the convergence time in both situations.
The MPLS network topology is illustrated in Figure 1. In summary, this is a classic MPLS core running RSVP, and I’ve configured an L2VPN Martini (VPWS) between DM4001_207 (SW_207) and DM4001_194 (SW_194). The primary path of both LSPs between these two PEs goes through DM4004_196 (SW_196), and fast-reroute protection is requested downstream. As you can see in Figure 1, I also drew how detour tunnels have been established to protect the primary path. Essentially, since there are multiple alternate paths, the primary path is protected entirely in both directions, i.e., SW_207 <=> SW_194.
Basically, fast-reroute in one-to-one mode is a protection mechanism that uses pre-computed alternate LSPs (detour tunnels) to protect the downstream next hop (link/node) of the primary path. If you need more technical details, check out RFC 4090 [2].
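To make the idea of pre-computed detours a bit more concrete, here's a rough Python sketch. It is not how the switches or RSVP-TE actually implement RFC 4090. It only illustrates that every node on the primary path (a potential PLR) gets its own alternate path computed ahead of time, avoiding the downstream next-hop node when possible and the downstream link otherwise. The topology, the node names (PE1, P1, P2, PE2) and the helper functions are all hypothetical, and a plain hop-count BFS stands in for the real constrained path computation.

```python
from collections import deque


def shortest_path(graph, src, dst, banned_nodes=frozenset(), banned_links=frozenset()):
    """Hop-count BFS from src to dst that skips banned nodes and links.

    Returns the path as a list of nodes, or None if dst is unreachable.
    """
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            return path
        for nbr in graph[node]:
            if nbr in seen or nbr in banned_nodes:
                continue
            if frozenset((node, nbr)) in banned_links:
                continue
            seen.add(nbr)
            queue.append(path + [nbr])
    return None


def compute_detours(graph, primary):
    """Pre-compute one detour per PLR on the primary path (one-to-one style).

    Node protection is tried first (avoid the next-hop node); if the next hop
    is the egress, or no such path exists, fall back to link protection.
    """
    egress = primary[-1]
    detours = {}
    for i, plr in enumerate(primary[:-1]):
        next_hop = primary[i + 1]
        detour = None
        if next_hop != egress:
            detour = shortest_path(graph, plr, egress, banned_nodes={next_hop})
        if detour is None:
            protected_link = frozenset((plr, next_hop))
            detour = shortest_path(graph, plr, egress, banned_links={protected_link})
        detours[plr] = detour
    return detours


# Hypothetical four-node core: this is NOT the topology from Figure 1.
GRAPH = {
    "PE1": ["P1", "P2"],
    "P1": ["PE1", "PE2", "P2"],
    "P2": ["PE1", "P1", "PE2"],
    "PE2": ["P1", "P2"],
}
PRIMARY = ["PE1", "P1", "PE2"]

if __name__ == "__main__":
    for plr, detour in compute_detours(GRAPH, PRIMARY).items():
        print(plr, "->", detour)
```

The point of the sketch is that all of this work happens before any failure. When a link or node goes down, the PLR only has to switch traffic onto a path that already exists, which is what makes the repair so fast.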
I’ll highlight some major details of the DM4001_207 and DM4001_194 configurations related to MPLS TE (RSVP). As you can see below, mpls traff-eng was enabled and all L3/MPLS VLANs are running. tunnel mpls traffic-eng fast-reroute one-to-one is the command that requests fast-reroute protection. Note that VPWS VPN 1000 is associated with the RSVP LSP by the mplstype te tunnel 1 command.
I’ll simulate a network failure event by shutting down the ethernet 2/26 (VLAN 4092) interface on DM4004_196 and see how many packets are lost between these two PEs, from the perspective of the MPLS VPWS transported over these LSPs. I’ll run two tests: the first with fast-reroute enabled in this topology, and the second without fast-reroute protection requested. To generate traffic, I’ll use Spirent TestCenter, which gives reasonable accuracy for measuring the convergence time. The generated traffic is encapsulated in this VPWS.
On Test Center, the traffic stream rate is 1000 packets per second. So, each lost packet represents 1 ms of network outage.
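The arithmetic behind that statement can be sketched in a few lines of Python. The function name and parameters are mine, purely for illustration. The only assumption is that the stream is sent at a constant rate, so each lost packet accounts for one inter-packet interval.

```python
def outage_ms(lost_packets: int, rate_pps: float) -> float:
    """Estimate the outage (in milliseconds) from a lost-packet count.

    Assumes a constant-rate stream, so each lost packet accounts for
    1000 / rate_pps milliseconds of interrupted forwarding.
    """
    if rate_pps <= 0:
        raise ValueError("rate_pps must be positive")
    return lost_packets * 1000.0 / rate_pps


if __name__ == "__main__":
    # At 1000 pps the measurement resolution is 1 ms per packet.
    print(outage_ms(1, 1000))
```

Keep in mind that the stream rate also sets the resolution of the measurement: at 1000 packets per second, an outage can't be resolved more finely than 1 ms.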
Just to double check, let’s confirm that fast-reroute was successfully established from DM4001_194's point of view:
On the PLR (point of local repair),
We’re good to go. Shutting down the ethernet 2/26 interface on DM4004_196:
Detour tunnels were rerouted because of this event. As you can see in Figure 2, this event resulted in 11 packets lost from DM4001_194 and 5 packets lost from DM4001_207. On average, (5+11)/2, this represents an outage of 8 ms. Neat!
Now, I’ll disable fast-reroute on those PEs for these LSPs:
Let’s verify the forwarding table on DM4004_196. There are no detour tunnels, just PHP (penultimate hop popping) MPLS entries for both primary LSPs:
Now, let’s simulate the same network failure event again. As shown in Figure 3, this event resulted in 4418 lost packets from DM4001_194 and 4417 lost packets from DM4001_207. On average, (4418+4417)/2, this equates to roughly 4.4 seconds (4417.5 ms) of network outage.
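To put the two tests side by side, here's a small Python sketch that averages the loss in both directions and compares the results. The packet counts are the ones reported in Figures 2 and 3, and the rate is the 1000 pps stream configured on the traffic generator. The variable and function names are just illustrative.

```python
RATE_PPS = 1000  # traffic stream rate used in both tests

# Lost packets per traffic source, as reported in Figures 2 and 3.
LOST_WITH_FRR = {"DM4001_194": 11, "DM4001_207": 5}
LOST_WITHOUT_FRR = {"DM4001_194": 4418, "DM4001_207": 4417}


def mean_outage_ms(lost_by_source: dict, rate_pps: float) -> float:
    """Average outage in milliseconds across both traffic directions."""
    mean_lost = sum(lost_by_source.values()) / len(lost_by_source)
    return mean_lost * 1000.0 / rate_pps


with_frr = mean_outage_ms(LOST_WITH_FRR, RATE_PPS)
without_frr = mean_outage_ms(LOST_WITHOUT_FRR, RATE_PPS)
improvement = without_frr / with_frr

if __name__ == "__main__":
    print(f"with FRR:    {with_frr:.1f} ms")
    print(f"without FRR: {without_frr:.1f} ms")
    print(f"improvement: about {improvement:.0f}x")
```

In this particular lab run, that works out to an outage roughly 550 times shorter with fast-reroute than without it.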
In this particular case, from an end-to-end customer traffic perspective, convergence with fast-reroute resulted in 8 ms of network outage, as opposed to about 4.4 seconds when the LSPs weren’t protected. If you ever need to improve convergence time in your MPLS core, RSVP with fast-reroute could definitely be a great solution. Plus, with fast-reroute you can also take advantage of affinity (link coloring) to have more granular control over how detour LSPs are established. Maybe I’ll address this point in a future post. Stay tuned!