OTV on ASR1K done right

I’m a big fan of OTV (Overlay Transport Virtualization), which is really, really good technology, and I play with it from time to time, both in the lab and in production environments. There are many guides on cisco.com and on other blogs on how to configure OTV on the ASR1000, but they all seem to be very basic. That basic configuration is not enough to run OTV reliably, so I would like to share my findings, which make OTV work faster and more reliably (thanks also go to Cisco TAC for pointing out a few things). I want to focus on a special-case scenario that is quite popular (at least from what I observe on our local market) but makes OTV a little complicated to configure: running OTV in a “square” topology, where four ASRs are connected to each other back-to-back and unicast adjacency servers are used, as depicted in the diagram below.

OTV diagram

It’s called the special-case scenario because there is no “routing” cloud between the two sites (such as an L3 WAN segment or an ISP’s MPLS area). In such a case, OTV requires additional configuration to make the Overlay interfaces react properly to link failures. This is described here: OTV Special Case. (I will not go deeper into it, as my post would simply become redundant; I recommend reading that document first if you are deploying such an environment.) The solution provided there is more of a guideline, though. What else do we need to add?

First of all, we should run BFD on all links (yes, even on the local L2), so the Overlay interfaces can react properly to WAN link failures. And yes, WAN link failures come in many types, including soft ones (even on xWDM devices), where the end devices never see a physical “link down”. That’s where BFD comes in handy. Enabling BFD seems to be an easy task, but it’s not (at least it’s not intuitive). Let’s assume our initial OTV configuration for ASR3 looks like this (for brevity I will not cover all devices; they look similar, and I just want to show the idea of how to configure it):

! Enable RSTP for fast convergence
spanning-tree mode rapid-pvst
! Make sure an ASR is NOT a root
spanning-tree vlan 666 priority 61440
spanning-tree extend system-id

! Define a Site VLAN
otv site bridge-domain 102

! Both AEDs in a site must have the same ID
otv site-identifier 0000.0000.0001

! Yes, configure static NET, to make sure VLANs are properly served by AEDs.
! IDs for upper AEDs should be lower or higher than the other two devices,
! so odd VLANs are served by the same pair of AEDs in separate sites. Make sure
! an area (49.0000) is the same for all AEDs
otv isis Site
 net 49.0000.0000.0000.0001.00

interface Overlay0
 no ip address
 otv join-interface GigabitEthernet0/1
 otv use-adjacency-server unicast-only
 service instance 666 ethernet
  encapsulation dot1q 666
  bridge-domain 666

interface GigabitEthernet0/1
 description To ASR1
 mtu 9000
 ip address
 ! Simple BFD for EIGRP (not enough for OTV, more info later)
 bfd interval 500 min_rx 500 multiplier 5

interface GigabitEthernet0/2
 description To ASR4
 mtu 9000
 ip address
 ! Simple BFD for EIGRP
 bfd interval 500 min_rx 500 multiplier 5

interface GigabitEthernet0/3
 description Trunk to local LAN
 mtu 9000
 no ip address
 ! Site VLAN 102, only between local edge devices
 service instance 102 ethernet
  encapsulation dot1q 102
  bridge-domain 102
 ! VLAN 666 transported over OTV
 service instance 666 ethernet
  encapsulation dot1q 666
  bridge-domain 666

! EIGRP runs on all p2p links to announce Join interfaces (must be reachable)
router eigrp 10
 no auto-summary
 ! other config cut for brevity

Now, let’s enable BFD on VLAN 102, between local edge devices:

otv site bridge-domain 102
 ! Enable BFD for the Site VLAN
 otv isis bfd

! Create virtual interface (aka SVI) - BFD runs on L3 only
interface bdi 102
 encapsulation dot1q 102
 ! You can use any IPs, not necessarily used by devices in that particular VLAN
 ip address
 bfd interval 500 min_rx 500 multiplier 5
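
To confirm that the BFD session on the Site VLAN has actually come up, something like the following can be run on both local edge devices. It is a sketch that assumes the BDI addresses (left out above) are configured and reachable between the two devices.

! The neighbor reached through BDI102 should be listed as Up
show bfd neighbors
! Negotiated timers and the clients registered to each session
show bfd neighbors details
! Site adjacency with the other local AED
show otv site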

In a dual-homed scenario, the OTV adjacency runs on the Site VLAN, but also between the Join interfaces through the “routed domain”. So we have to enable BFD on our Overlay as well. To do that, we have to configure multihop BFD (the Overlay interface uses the Join interface, which is NOT directly connected to the other AED in the site):

! Add multihop BFD (between Join interfaces), use /24, as we will need that
! also for p2p links between sites, more info later
bfd map ipv4 MultiHopBFD
bfd-template multi-hop MultiHopBFD
 interval min-tx 500 min-rx 600 multiplier 5
 ! Dampening not required but highly recommended - more info later
 dampening 10 200 1000 30

interface Overlay0
 ! Enable BFD on the Overlay interface, use multihop template's timers
 otv isis bfd
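
A quick sanity check at this point, assuming the multihop map and template above were accepted by your release, is to see whether the extra session exists and whether the overlay adjacency to the other local AED is still up:

! Session counts per client and per state
show bfd summary
! The other local AED should still be listed over Overlay0
show otv adjacency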

We now have BFD running between the local AEDs. It’s time to add BFD between the two sites. You would probably say “but we already have this configured on Gi0/1”. Of course. But, as you know, BFD informs all subscribers about the status of a session, so that, for example, EIGRP can quickly tear down its own adjacency without waiting for its hold timer to expire. What we need, however, is for that BFD session to make OTV shut down the local Overlay interface when the local WAN link goes down. To do that, we have to use a combination of BFD and EEM. Unfortunately, current IOS XE versions do not log which interface is down; they only log that the BFD session is down (the magic is done through the subscription process). Maybe this will change in newer versions, but for now we have to add a static route:

ip route static bfd log unassociate

It is a special route, attached to the multihop BFD session (remember the /24 mask? This assumes all p2p links are in the same /24; otherwise, you have to create a separate multihop session). When BFD on the WAN link goes down, you will see a syslog entry:

%IPRT-5-STATICROUTES_BFD_EVENT: BFD session Down,[Destination Addr:,
Source Addr:]
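
Whether the static route has really registered with BFD can be checked before relying on that syslog message. A minimal sketch, assuming the static BFD entry above was entered with the actual addresses of your p2p link:

! Static routes (and unassociated entries) that are clients of BFD
show ip static route bfd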

Since we can now identify the particular session, we can add the final part: EEM scripts that shut down the Overlay interface when the BFD session on the WAN link goes down. This step is REQUIRED, as the Overlay interface is strictly bound to the Join interface, and if no traffic can pass the WAN link (especially during soft failures), OTV becomes unstable and unpredictable. Remember to configure these scripts on all AEDs; the Overlay interface must be shut down on both ends.

! Script to shutdown the Overlay interface
event manager applet WatchBFDdown authorization bypass
 event syslog pattern "BFD session Down,\[Destination Addr:" period 1
 action 2.0 cli command "enable"
 action 3.0 cli command "config t"
 action 4.0 syslog msg "EEM: WatchBFDdown will shut int Overlay0"
 action 5.0 cli command "interface Overlay0"
 action 6.0 cli command "shutdown"
 action 7.0 cli command "description * Disabled by EEM script WatchBFDdown *"
 action 8.0 cli command "exit"
 action 9.0 syslog msg "EEM: WatchBFDdown COMPLETE"

! Script to enable the Overlay interface
event manager applet WatchBFDup authorization bypass
 event syslog pattern "BFD session Up,\[Destination Addr:" period 1
 action 2.0 cli command "enable"
 action 3.0 cli command "config t"
 action 4.0 syslog msg "EEM: WatchBFDup bringing up int Overlay0"
 action 5.0 cli command "interface Overlay0"
 action 6.0 cli command "no shutdown"
 action 7.0 cli command "no description"
 action 8.0 cli command "exit"
 action 9.0 syslog msg "EEM: WatchBFDup COMPLETE"
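
After pasting the applets, it is worth checking that they are registered, and, following a controlled failure test of the WAN link (ideally in a maintenance window), that they actually fired. A sketch of what I would look at; nothing here changes the configuration:

! Both WatchBFDdown and WatchBFDup should be listed
show event manager policy registered
! Which applets were triggered, and when
show event manager history events
! Overlay0 should be down after the failure and up again after recovery
show otv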

Now, why dampening for the BFD multihop session? I noticed that when BFD comes back after a WAN link failure, some packets may still be dropped, and I saw BFD echoes flapping. In that case the EEM scripts start running in parallel, and the effect is unpredictable. Those dampening timers worked for me, but you may need to tune them for your environment.

One last thing. Tuning the internal IS-IS timers makes no sense when you use BFD, so you can leave them at their defaults (Cisco also does not recommend changing those timers unless you know what you are doing).

That would be all. I hope this is useful for someone currently fighting with a similar deployment. Have fun, and cheers.

