Pseudowire redundancy is key to eliminating single points of failure (a PW, an AC or a PE) in certain network designs that rely on MPLS L2VPN services such as VPWS and VPLS. When configuring pseudowire redundancy on Junos, you can leave the backup PW in either standby or hot-standby mode. In this article, I’m going to compare both modes and point out the major benefits and drawbacks that you might want to take into account if you ever need to use PW redundancy in your network.
RFC 6870 [1] defines new PW status bits that allow more granular control, specifically to coordinate a switchover to backup PWs. The two new status codes it introduces are 0x00000020 (PW forwarding standby) and 0x00000040 (Request switchover to this PW). On Junos 15.1, the following PW status codes are available:
As you can see, on Junos 15.1 it’s possible to use PW_STATUS_PW_FWD_STDBY to signal a standby PW status. However, by default, Junos doesn’t signal the status TLV, so if you’re going to use it you have to enable the pseudowire-status-tlv command on both PEs. In addition to this, Juniper implements two standby modes, standby and hot-standby, which I’ll explain briefly in the next section.
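To make the status codes mentioned above a bit more tangible, here is a minimal Python sketch that decodes a PW status word into its individual bits. The two standby-related bits come from RFC 6870 and the five fault bits from RFC 4447; the class, member and function names are my own illustrative choices, not Junos identifiers.

```python
from enum import IntFlag


class PwStatus(IntFlag):
    """PW status bits (names are illustrative, values per RFC 4447 / RFC 6870)."""
    NOT_FORWARDING = 0x00000001      # Pseudowire not forwarding
    AC_RX_FAULT = 0x00000002         # Local AC (ingress) receive fault
    AC_TX_FAULT = 0x00000004         # Local AC (egress) transmit fault
    PSN_RX_FAULT = 0x00000008        # Local PSN-facing PW (ingress) receive fault
    PSN_TX_FAULT = 0x00000010        # Local PSN-facing PW (egress) transmit fault
    FWD_STANDBY = 0x00000020         # PW forwarding standby (RFC 6870)
    REQUEST_SWITCHOVER = 0x00000040  # Request switchover to this PW (RFC 6870)


def decode_pw_status(code: int) -> list[str]:
    """Return the names of all status bits set in `code`."""
    if code == 0:
        # An all-zero status word means the PW is forwarding with no faults.
        return ["FORWARDING"]
    return [flag.name for flag in PwStatus if code & flag]


if __name__ == "__main__":
    for code in (0x00000000, 0x00000001, 0x00000020):
        print(f"0x{code:08x}: {', '.join(decode_pw_status(code))}")
```

The two values that show up later in this article are 0x00000001, which decodes to the "not forwarding" bit, and 0x00000020, which decodes to the "forwarding standby" bit.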
There are two modes you can configure on Junos 15.1: standby or hot-standby. The former is the default when you configure a secondary PW with the backup-neighbor command. The latter, as the name implies, keeps the backup PW in a ready-to-forward state in order to achieve a faster switchover when the active PW fails. Therefore, whenever hot-standby is in place, you’ll see that both PEs have their L2CKT labels ready to go in both the software and hardware tables. As a trade-off, though, this mode has the drawback of replicating BUM traffic. The standby mode, on the other hand, has to exchange L2CKT labels only when the switchover takes place, which makes the switchover slower. To show these differences, I’ll use the topology illustrated in Figure 1.
In this topology, CE2 is connected to the AC on VLAN 1024 and is multi-homed to PE2 and PE4 for high availability. PE1 connects CE1 to the AC on VLAN 1024 and has two PWs: an active one, PW_VCID_12, between PE1 and PE2, and a backup one, PW_VCID_14, between PE1 and PE4. Consequently, whenever PW_VCID_12 fails, PW_VCID_14 is supposed to take over. Now, we’ll analyze the switchover time and see the difference between the two standby modes in practice.
For readability’s sake, the configuration presented in the following subsections is limited to what is related to L2VPN Martini and pseudowire redundancy.
When configuring the hot-standby mode, you have to set pseudowire-status-tlv hot-standby-vc-on, which enables the remote standby PE (PE4, in this case) to properly process the hot-standby status code (i.e. 0x00000020) received from PE1. In addition to this, you also have to set hot-standby on PE1, which is the PE with the redundant PWs. Check out the code snippet below.
First, let’s look at the signaling state of both the active and the standby PW from PE1’s perspective. As you can see in the terminal output below, when the default standby mode is configured, the PW connection status for the backup neighbor (PE4) is BK, which stands for Backup Connection. Also, note that the local PE1 is signaling the status code 0x00000001 (not forwarding). As a consequence, PE1 doesn’t have an outgoing label in software (LDP database) for the L2CKT of PW_VCID_14. Lastly, the only outgoing L2CKT entry in hardware is the one that leads to the active PW (PE2).
From PE4’s perspective, the PW connection status is OL, which stands for No outgoing label. Plus, as expected, there isn’t any L2CKT entry installed in hardware.
I’ll keep ICMP traffic flowing from CE1 to CE2 and then reboot PE2 to simulate a PE2 failure. This reboot will trigger the PW switchover to PE4. Let’s see how many ICMP packets are lost during this process.
Out of 4000 packets, 16 packets were lost.
The active PW is still the one between PE1 and PE2 (by now PE2 has completely recovered), and I’ll send a broadcast ping from CE2 to CE1. We can verify below that only one packet was sent across the MPLS uplinks, as expected. In other words, even though both PE2 and PE4 received this broadcast ping on VLAN 1024, PE2 was the only one that encapsulated this traffic over the MPLS backbone, as depicted in Figure 2. Cool!
Now, let’s compare this with the hot-standby mode. Once again, we’ll start the verification on PE1. The major difference is that the PW connection status is now HS, which means Hot-standby Connection, and the local PW status code is 0x00000020. Also, both the incoming and outgoing labels for the backup PW (PW_VCID_14) are installed in software and hardware.
From PE4’s point of view, the biggest difference compared to the previous output is that the status is now Up and the remote PW status code is 0x00000020. Plus, both the incoming and outgoing labels are installed in software and hardware to speed up the switchover. Although all labels are installed, this PE won’t encapsulate traffic over the MPLS backbone, except for BUM traffic, as you’ll see shortly.
I’ll run the same ping test again. Let’s compare how many packets will be lost.
How cool is that, huh? Now only 2 packets were lost, as opposed to 16 packets when I ran the same test with the previous configuration. In other words, packet loss during the switchover was 8 times lower. So, if you’re looking to optimize the switchover time, the hot-standby mode is definitely a great solution.
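If you want to turn these counters into something easier to compare, the arithmetic can be sketched in a few lines of Python. The packet counts are the ones from the two tests above. The ping interval is not something I captured in this article, so estimated_outage takes it as a parameter and any value you pass is your own assumption.

```python
SENT = 4000            # packets sent in each test
LOST_STANDBY = 16      # packets lost with the default standby mode
LOST_HOT_STANDBY = 2   # packets lost with hot-standby mode


def loss_ratio(sent: int, lost: int) -> float:
    """Fraction of packets lost during the test."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return lost / sent


def estimated_outage(lost: int, interval_s: float) -> float:
    """Rough outage estimate: lost packets multiplied by the send interval.

    interval_s is an assumption supplied by the caller; it is not a value
    measured in the tests above.
    """
    return lost * interval_s


if __name__ == "__main__":
    for mode, lost in (("standby", LOST_STANDBY), ("hot-standby", LOST_HOT_STANDBY)):
        print(f"{mode}: {loss_ratio(SENT, lost):.2%} loss")
    print(f"improvement: {LOST_STANDBY / LOST_HOT_STANDBY:.0f}x")
```

Keep in mind that lost packets are only a proxy for switchover time: the finer the ping interval, the better the estimate.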
I’ll run the same test again to verify whether PE4 (which is now in the hot-standby state) is going to encapsulate this broadcast packet over the MPLS core. By now, PE2 has completely recovered from the reboot. As a result, the PW between PE1 and PE2 is the active one.
As illustrated in Figure 3, the broadcast packet was duplicated in the MPLS backbone. One copy came from PE2 and the other one from PE4, which is evidence that BUM traffic is encapsulated even if the PW is in the hot-standby state. You can check this by looking at the label values in the MPLS header of this capture. For example, in Figure 3, the VC label is 299872 (which is the outgoing VC label from PE2’s perspective, as you can see in PE2’s output). Similarly, in Figure 4, the label stack 299808/299888 came from PE4. You can check these labels in the previous sections of this article.
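The BUM behaviour observed in both tests can be summarized with a tiny Python sketch. It only encodes what I saw in this lab (the active PE always encapsulates the broadcast, and the backup PE does so only in hot-standby mode); it is not a model of how Junos makes the decision internally, and the Pseudowire class and bum_senders function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pseudowire:
    remote_pe: str  # PE at the far end of the PW, as seen from PE1
    role: str       # "active", "standby" or "hot-standby"


def bum_senders(pws: list[Pseudowire]) -> list[str]:
    """PEs that encapsulate a broadcast received from the multi-homed CE.

    Based on the lab observations in this article: a PE whose PW is active
    or hot-standby forwards BUM traffic into the MPLS core, while a PE
    whose PW is in plain standby does not.
    """
    return [pw.remote_pe for pw in pws if pw.role in ("active", "hot-standby")]


if __name__ == "__main__":
    standby = [Pseudowire("PE2", "active"), Pseudowire("PE4", "standby")]
    hot = [Pseudowire("PE2", "active"), Pseudowire("PE4", "hot-standby")]
    print("standby:", bum_senders(standby))
    print("hot-standby:", bum_senders(hot))
```

With the default standby mode only PE2 shows up as a sender, whereas with hot-standby both PE2 and PE4 do, which matches the single copy and the duplicated copy seen in the captures.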
To sum up, pseudowire redundancy is key to increasing availability and eliminating single points of failure in your network. Plus, the hot-standby mode on Junos is an excellent option if your goal is to minimize the time it takes to switch over from the primary to the backup PW. Nevertheless, this optimization comes at the expense of replicating BUM traffic. Therefore, weigh your options and, if you decide to stick with this mode, make sure you can keep this type of traffic under control to avoid wasting bandwidth in your MPLS backbone.