Since I’m a big fan of OTV (Overlay Transport Virtualization), which is really, really good technology, I play with it from time to time, both in the lab and in production environments. There are many guides on cisco.com and on other blogs on how to configure OTV on the ASR1000, but they all seem to be very basic. That basic configuration is not enough to run OTV, so I would like to share my findings, which make OTV work faster and more reliably (thanks also go to Cisco TAC for pointing out a few things). I want to focus on a special-case scenario, which is quite popular (at least from what I observe on our local market), but makes OTV a little complicated to configure. It’s about running OTV in a “square” topology, where four ASRs are connected to each other back-to-back and unicast adjacency servers are used, as depicted in the diagram below.
It’s called the special-case scenario because there is no “routing” cloud between the two sites (like an L3 WAN segment or an ISP’s MPLS area). In such a case, OTV requires additional configuration to make the Overlay interfaces react properly to link failures. It is described here: OTV Special Case. (I will not go deeper into this, as my post would simply become redundant. I recommend reading that document first if you are deploying such an environment.) The solution provided there is more of a guideline. What else do we need to add?
First of all, we should run BFD on all links (yes, even on the local L2), so the Overlay interfaces can properly react to WAN link failures. And yes, WAN link failures can be of many types, including some soft ones (even on xWDM devices), where the end devices cannot see a physical “link down”. That’s where BFD comes in handy. Enabling BFD seems to be an easy task, but it’s not (at least it’s not intuitive). Let’s assume our initial OTV configuration for ASR3 looks like this (for brevity I will not cover all devices, as they look similar; I just want to show the idea of how to configure it):
! Enable RSTP for fast convergence
spanning-tree mode rapid-pvst
! Make sure an ASR is NOT a root
spanning-tree vlan 666 priority 61440
spanning-tree extend system-id
!
! Define a Site VLAN
otv site bridge-domain 102
! Both AEDs in a site must have the same ID
otv site-identifier 0000.0000.0001
! Yes, configure static NET, to make sure VLANs are properly served by AEDs.
! IDs for upper AEDs should be lower or higher than the other two devices,
! so odd VLANs are served by the same pair of AEDs in separate sites. Make sure
! an area (49.0000) is the same for all AEDs
otv isis Site
 net 49.0000.0000.0000.0001.00
!
interface Overlay0
 no ip address
 otv join-interface GigabitEthernet0/1
 otv use-adjacency-server 172.16.16.1 172.16.16.6 unicast-only
 service instance 666 ethernet
  encapsulation dot1q 666
  bridge-domain 666
!
interface GigabitEthernet0/1
 description To ASR1
 mtu 9000
 ip address 172.16.16.2 255.255.255.252
 ! Simple BFD for EIGRP (not enough for OTV, more info later)
 bfd interval 500 min_rx 500 multiplier 5
!
interface GigabitEthernet0/2
 description To ASR4
 mtu 9000
 ip address 172.16.16.13 255.255.255.252
 ! Simple BFD for EIGRP
 bfd interval 500 min_rx 500 multiplier 5
!
interface GigabitEthernet0/3
 description Trunk to local LAN
 mtu 9000
 no ip address
 ! Site VLAN 102, only between local edge devices
 service instance 102 ethernet
  encapsulation dot1q 102
  bridge-domain 102
 ! VLAN 666 transported over OTV
 service instance 666 ethernet
  encapsulation dot1q 666
  bridge-domain 666
!
! EIGRP runs on all p2p links to announce Join interfaces (must be reachable)
router eigrp 10
 no auto-summary
 ! Register EIGRP with BFD (all-interfaces assumed here)
 bfd all-interfaces
 network 172.16.16.0 0.0.0.255
! other config cut for brevity
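Before adding anything on top, it may be worth confirming that this baseline works. A minimal check, assuming the configuration above is applied on all four ASRs (these are standard IOS XE show commands; the output differs between releases, so I do not reproduce it here):

```
! EIGRP neighbors on the p2p links (Join interfaces must be reachable)
show ip eigrp neighbors
! Overlay state, Join interface and adjacency servers
show otv
! OTV adjacencies to the other edge devices
show otv adjacency
! Which edge device is the AED for each VLAN
show otv vlan
```

If the odd/even VLAN split between AEDs is not what you expected, the static NETs are the first thing I would look at.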
Now, let’s enable BFD on VLAN 102, between local edge devices:
otv site bridge-domain 102
 ! Enable BFD for the Site VLAN
 otv isis bfd
!
! Create virtual interface (aka SVI) - BFD runs on L3 only
interface bdi 102
 encapsulation dot1q 102
 ! You can use any IPs, not necessarily used by devices in that particular VLAN
 ip address 220.127.116.11 255.255.255.0
 bfd interval 500 min_rx 500 multiplier 5
OTV adjacency in a dual-homed scenario runs on the Site VLAN, but also on the Join interfaces through the “routed domain”. So, we have to enable BFD on our Overlay as well. To do that, we have to configure multihop BFD (the Overlay interface uses the Join interface, which is NOT directly connected to the other AED in the site):
! Add multihop BFD (between Join interfaces), use /24, as we will need that
! also for p2p links between sites, more info later
bfd map ipv4 172.16.16.0/24 172.16.16.0/24 MultiHopBFD
bfd-template multi-hop MultiHopBFD
 interval min-tx 500 min-rx 600 multiplier 5
 ! Dampening not required but highly recommended - more info later
 dampening 10 200 1000 30
!
interface Overlay0
 ! Enable BFD on the Overlay interface, use multihop template's timers
 otv isis bfd
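At this point you should see BFD sessions for both the Site VLAN and the Overlay. A quick way to check, as a sketch only (standard IOS XE show commands; which sessions appear and how they are labeled depends on your release and addressing):

```
! All BFD sessions: expect one over BDI102 and multihop ones between Join interfaces
show bfd neighbors
! Negotiated timers and the registered clients for each session
show bfd neighbors details
! Site adjacency between the two local AEDs
show otv site
```

If a session stays down, compare the negotiated intervals shown in the details output with the timers configured on both ends.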
We now have BFD running between the local AEDs. It’s time to add BFD between the two sites. You would probably say “but we have this already configured on Gi0/1”. Of course. But, as you know, BFD informs all subscribers about the status of a session, so that, for example, EIGRP can quickly tear down its own session without waiting for a Hello timeout. However, we need that BFD session for OTV, to shut down the local Overlay interface when the local WAN link goes down. To do that, we have to use a combination of BFD and EEM. Unfortunately, current IOS XE versions do not log which interface is down; they only log that the BFD session is down (the magic is done through the subscription process). Maybe that will change in newer versions, but for now, we have to add a static route:
ip route static bfd 172.16.16.2 172.16.16.1 log unassociate
It is a special route, attached to the multihop BFD session (remember the /24 mask? This assumes all p2p links are in the same /24; otherwise, you have to create a separate multihop session). When BFD on the WAN link goes down, you can see a syslog entry:
%IPRT-5-STATICROUTES_BFD_EVENT: BFD session Down,[Destination Addr:172.16.16.1, Source Addr:172.16.16.2]
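To confirm the static BFD route above is registered and that the event really shows up in the log, something like the following can be used (a sketch; both are standard IOS XE commands, and the filter string is simply taken from the message above):

```
! Static routes associated with BFD sessions
show ip static route bfd
! Past BFD up/down events reported by the static route client
show logging | include STATICROUTES_BFD_EVENT
```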
So, since we can now identify a particular session, we can add the final part: our EEM scripts, which shut down the Overlay interface when the BFD session on the WAN link goes down and bring it back up when the session recovers. This step is REQUIRED, as the Overlay interface is strictly bound to the Join interface, and if no traffic can pass the WAN link (especially during soft failures), OTV becomes unstable and unpredictable. Remember to configure such scripts on all AEDs; the Overlay interface must be shut down on both ends.
! Script to shutdown the Overlay interface
event manager applet WatchBFDdown authorization bypass
 event syslog pattern "BFD session Down,\[Destination Addr:172.16.16.1" period 1
 action 2.0 cli command "enable"
 action 3.0 cli command "config t"
 action 4.0 syslog msg "EEM: WatchBFDdown will shut int Overlay0"
 action 5.0 cli command "interface Overlay0"
 action 6.0 cli command "shutdown"
 action 7.0 cli command "description * Disabled by EEM script WatchBFDdown *"
 action 8.0 cli command "exit"
 action 9.0 syslog msg "EEM: WatchBFDdown COMPLETE"
!
! Script to enable the Overlay interface
event manager applet WatchBFDup authorization bypass
 event syslog pattern "BFD session Up,\[Destination Addr:172.16.16.1" period 1
 action 2.0 cli command "enable"
 action 3.0 cli command "config t"
 action 4.0 syslog msg "EEM: WatchBFDup bringing up int Overlay0"
 action 5.0 cli command "interface Overlay0"
 action 6.0 cli command "no shutdown"
 action 7.0 cli command "no description"
 action 8.0 cli command "exit"
 action 9.0 syslog msg "EEM: WatchBFDup COMPLETE"
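After a failover test, it is useful to check that both applets are registered and that they actually fired. A minimal sketch, assuming the applet names used above (standard IOS XE show commands; output not shown, as it varies by release):

```
! Both applets should be listed as registered
show event manager policy registered
! Which applets were triggered, and when
show event manager history events
! Overlay state; the description is set by WatchBFDdown and removed by WatchBFDup
show interfaces Overlay0
! Messages written by the applets themselves
show logging | include EEM: WatchBFD
```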
Now, why dampening for the BFD multihop session? I noticed that when BFD starts again after a WAN link failure, some packets may still be dropped, and I saw BFD echoes flapping. In such a case, the EEM scripts start running in parallel, and the effect is unpredictable. Those dampening timers worked for me, but you may need to tune them for your environment.
One last thing. Tuning the internal IS-IS timers makes no sense when you use BFD, so you can leave them at their defaults (Cisco also does not recommend changing those timers unless you know what you are doing).
That would be all. I hope this will be useful for someone currently fighting with a similar deployment. Have fun, and cheers.