Monday, January 25, 2010

Troubleshooting DMVPN

DMVPN is a great suite of protocols, but from time to time something goes wrong. Here are a few tips on how to troubleshoot it. This post will not tell you how to troubleshoot every part; rather, it will guide you in narrowing the problem down.

In a typical DMVPN scenario you will have the following "layers", each dependent on all the ones before it:
- Physical and IP
- Crypto
- GRE
- NHRP
- Routing protocol

1. Physical and IP - I'm putting these together since they are not really specific to DMVPN, but you need to check that they work.
1.1 Check reachability from spoke to hub with a simple ping or traceroute.
1.2 Typical problem: IPsec does not even start to establish.
Do some basic testing: ping from spoke to hub and make sure no firewall on the way is blocking UDP/500, UDP/4500 (if NAT-T is needed) or ESP/AH.
If everything is configured but the tunnel is still not initiating... did you configure the NHRP network-id?
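As a rough sketch, this is what a permit list on an IOS device sitting between spoke and hub would need to contain for the tests above to succeed. The ACL name ALLOW_DMVPN and the hub NBMA address 10.1.1.1 are assumptions; 10.1.1.1 is borrowed from the phase 3 example further down. You need a mirrored set of entries for the hub-to-spoke direction.
------------------------------
! Hypothetical ACL on a device in the path; 10.1.1.1 = hub NBMA address (assumption)
ip access-list extended ALLOW_DMVPN
 ! ping/traceroute tests
 permit icmp any host 10.1.1.1
 ! IKE
 permit udp any host 10.1.1.1 eq 500
 ! NAT-T
 permit udp any host 10.1.1.1 eq 4500
 ! ESP and AH ("ahp" is the IOS keyword for AH)
 permit esp any host 10.1.1.1
 permit ahp any host 10.1.1.1
------------------------------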

A typical exercise here, and at any other level, is to check the CEF switching statistics: "show cef drop" on older IOS releases or "show cef switching statistic feature" on newer ones.

2. Crypto (IPsec) - once you know that nothing is blocked and crypto has started to establish:
- Check that you have phase 1 SAs with "show crypto isakmp sa det". The state you're looking for is usually QM_IDLE (or no IKE SA at all if the lifetime is very short).
- Check the IPsec SAs with "show crypto ipsec sa". Both the inbound and outbound SPIs should be there and should be mirrored on the other side of the tunnel (the inbound SPI on the spoke is the outbound SPI on the hub, and vice versa).
"debug crypto ipsec" and "debug crypto isakmp" are your friends.
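On a hub with many spokes these debugs get noisy quickly. The sequence below is a minimal sketch for releases that support conditional crypto debugging. It restricts the debugs to one peer and then shows the SAs for that peer so you can compare SPIs with the other end. The spoke NBMA address 10.1.1.2 is hypothetical.
------------------------------
! limit crypto debugs to one (hypothetical) spoke
debug crypto condition peer ipv4 10.1.1.2
debug crypto isakmp
debug crypto ipsec
show crypto isakmp sa detail
show crypto ipsec sa peer 10.1.1.2
! compare inbound/outbound SPIs with the same command run on the other end
undebug all
------------------------------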

Two notable mentions here:
2.1 If the debugs show tunnels establishing properly but being torn down within a few minutes, it most likely means that the NHRP relationship is not being established.
2.2 Crypto socket - a piece of magic called the crypto socket is what binds crypto and NHRP together, and you can debug it with "debug crypto socket". Problems with the crypto socket can cause 2.1, but can usually be mitigated in the short term by removing the tunnel interface configuration and adding it back again. There have been many such cases, affecting different IOS versions and caused by multiple bugs on the Cisco side.
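For 2.2, you can inspect the crypto socket state directly. The short-term workaround amounts to capturing the tunnel configuration, deleting the interface and pasting the configuration back. Here is a rough sketch, assuming the tunnel is Tunnel1 (hypothetical here):
------------------------------
show crypto socket
debug crypto socket
! save the current tunnel configuration before removing it
show running-config interface Tunnel1
configure terminal
 no interface Tunnel1
 ! paste back the configuration captured above
end
------------------------------
On a hub, removing the interface drops every spoke on that tunnel until the configuration is back, so plan accordingly.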

There is also a whole subset of problems with crypto accelerator cards that can show up here. Check "show crypto engine accelerator statistic" and "show crypto engine configuration" or "show crypto eli". These show you statistics and which accelerator is currently in use. Generally you are looking for errors.

3. GRE - here's a fun fact: I've never seen a problem with GRE encapsulation or processing. Still, I would start by monitoring "show interface tunnel X" for input or output drops.
One problem you may encounter is... NAT.
3.1 I've seen a scenario on fairly recent 12.4T software where NAT was applied to GRE traffic (a scenario without tunnel protection). Check "sh ip nat trans".
3.2 If by any chance you're using "ip nat outside/inside" on the tunnel interface, check that you're not NATing too much.
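For 3.1, one common way to keep tunnel traffic out of a dynamic NAT rule is to deny GRE at the top of the NAT access list and then clear the stale translations. This is a sketch under assumptions: the ACL name NAT_ACL, the outside interface GigabitEthernet0/0 and the inside range 10.20.0.0/16 are all hypothetical.
------------------------------
ip access-list extended NAT_ACL
 ! keep GRE (the DMVPN tunnel) out of the translation
 deny   gre any any
 permit ip 10.20.0.0 0.0.255.255 any
ip nat inside source list NAT_ACL interface GigabitEthernet0/0 overload
end
! remove translations that were already created for GRE
clear ip nat translation *
------------------------------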

4. NHRP - remember that even though the spoke has a static NHRP mapping, and "show ip nhrp brief" on it will therefore always show a mapping (unlike on the hub), it is the spoke that initiates NHRP registration by sending a registration request.
Useful debugs:
debug nhrp pack
debug nhrp ext 
debug nhrp err
debug nhrp rate
For each NHRP registration request you should see a packet encapsulated into IPsec (watch the counters in "show crypto ipsec sa"). If that is not the case, enable the debug from 2.2 and get in touch with Cisco TAC.

A good value for the NHRP holdtime is around 300 seconds (as opposed to the default of 7200).
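Setting the holdtime is a single interface command. The sketch below assumes the tunnel interface is Tunnel1, as in the phase 3 example below. Keeping the same value on the hub and all spokes is probably the least surprising choice.
------------------------------
interface Tunnel1
 ip nhrp holdtime 300
------------------------------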

5. Routing protocol - once you know all the "layers" below are working, there is the routing protocol (RP) level that makes it all tick. I've seen a range of problems here: some bugs, some platform specific (an ASR hub taking longer to converge than a 7200 with the same config). They range from RP flapping (which can be driven by NHRP or by load) to downright instability of the RP once spokes start connecting to a hub. It can be a bug or a platform limitation; one could write a book about this :-)

This post was meant to show you some common problems and how to track down the failing component. Hope it helps. If you're interested in learning more, let me know.

Friday, January 22, 2010

DMVPN phase 3 - basic configuration example.

Phase 3 DMVPN is not a new topic, but Cisco documentation on the matter is a bit lacking.

The two best articles are:
http://www.cisco.com/en/US/docs/ios/sec_secure_connectivity/configuration/guide/sec_DMVPN_ps6350_TSD_Products_Configuration_Guide_Chapter.html
combined with:
http://www.cisco.biz/en/US/prod/collateral/iosswrel/ps6537/ps6586/ps6660/ps6808/prod_white_paper0900aecd8055c34e_ps6658_Products_White_Paper.html

I've seen probably around 30 DMVPN deployments, many of them with configuration mistakes. That does not mean they will not work (DMVPN is a robust beast), but a wrong configuration is just asking for trouble later on.

So let's see it (no crypto configuration at this point).

First of all, OSPF.
Please note that the point-to-multipoint OSPF network type has its drawbacks (a /32 for each tunnel address, re-computation on every flap); the majority of people will want to use the broadcast network type.

Spoke config:
--------
interface Tunnel1
ip address 172.25.1.2 255.255.255.0
no ip redirects
ip nhrp map multicast 10.1.1.1
ip nhrp map 172.25.1.1 10.1.1.1
ip nhrp network-id 1
ip nhrp nhs 172.25.1.1
ip nhrp shortcut
ip nhrp redirect
ip ospf network point-to-multipoint

tunnel source Loopback0
tunnel mode gre multipoint

Both NHRP redirect and shortcut are present.
The OSPF network type is set to point-to-multipoint.


Hub config:
-------
interface Tunnel1
ip address 172.25.1.1 255.255.255.0
no ip redirects
ip nhrp map multicast dynamic
ip nhrp network-id 1
ip nhrp redirect
ip ospf network point-to-multipoint

tunnel source Loopback0
tunnel mode gre multipoint

Only ip nhrp redirect is configured.
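If you go with the broadcast network type mentioned above instead of point-to-multipoint, the hub must become the DR, because only the hub is adjacent to every spoke. Here is a minimal sketch of the lines that change on the same Tunnel1 interfaces:
------------------------------
! Hub: win the DR election
interface Tunnel1
 ip ospf network broadcast
 ip ospf priority 255
!
! Spokes: never become DR or BDR
interface Tunnel1
 ip ospf network broadcast
 ip ospf priority 0
------------------------------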

EIGRP spoke
--------
interface Tunnel1
bandwidth 64000
ip address 172.25.1.4 255.255.255.0
no ip redirects
ip nhrp map multicast 10.1.1.3
ip nhrp map 172.25.1.3 10.1.1.3
ip nhrp network-id 1
ip nhrp nhs 172.25.1.3
ip nhrp shortcut
ip nhrp redirect

tunnel source Loopback0
tunnel mode gre multipoint



EIGRP hub
---------
interface Tunnel1
bandwidth 64000
ip address 172.25.1.1 255.255.255.0
no ip redirects
ip nhrp map multicast dynamic
ip nhrp network-id 1
ip nhrp redirect
no ip split-horizon eigrp 1
ip summary-address eigrp 1 10.20.0.0 255.255.0.0 5

tunnel source Loopback0
tunnel mode gre multipoint

Note that in this particular case all the networks this DMVPN cloud is "protecting" can be summarized into 10.20.0.0/16.

Please note the increased bandwidth on the tunnel interfaces for EIGRP. The default tunnel bandwidth is only a few kbit/s, and it is used to calculate EIGRP metrics.
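To see why this matters, here is the bandwidth term of the classic EIGRP composite metric worked out by hand. It assumes default K values, bandwidth in kbit/s and integer division. The 9 kbit/s default used below is an assumption; check "show interface Tunnel1" on your release.
------------------------------
bandwidth term = 256 x (10000000 / BW)
default BW, assumed 9 kbit/s:  256 x (10000000 / 9)     = 256 x 1111111 = 284444416
bandwidth 64000:               256 x (10000000 / 64000) = 256 x 156     = 39936
------------------------------
EIGRP also paces its own packets to 50% of the interface bandwidth by default ("ip bandwidth-percent eigrp"). A tiny default bandwidth therefore slows down updates toward the spokes as well.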

So what would an intermediate/regional hub configuration look like?
interface Tunnel1
ip address 172.25.1.3 255.255.255.0
no ip redirects
ip nhrp map multicast dynamic
ip nhrp map 172.25.1.1 10.1.1.1
ip nhrp map multicast 10.1.1.1
ip nhrp network-id 1
ip nhrp nhs 172.25.1.1
ip nhrp redirect
no ip split-horizon eigrp 1
ip summary-address eigrp 1 10.20.10.0 255.255.255.0 5

tunnel source Loopback0
tunnel mode gre multipoint

Friday, January 08, 2010

FWSM - routing considerations or "Why clearing xlates solves it?"

edited on 18th May 2010.

Yet another thing people tend not to realize (or maybe just those who do not attend Networkers?).

All Cisco firewall appliances (ASA/PIX/FWSM) consider xlates before routing information. If some traffic has created an xlate, all subsequent traffic will follow the path of the xlate and not the route.

What I would like to point out is a problem that is reported to me quite often.

There is a whole class of problems that manifest themselves on the FWSM (mostly, but they also impact the ASA/PIX) for which "clear xlate" is the only solution, and only a temporary one. Here's what you need to know about it.

In most cases the problem shows up where you have (pick as many as you like):
- no nat-control
- same security interfaces

There is one cure, which you should configure on your appliance anyway as a best practice:
Configure unicast RPF on ALL interfaces.
http://www.cisco.com/en/US/docs/security/fwsm/fwsm40/configuration/guide/protct_f.html#wp1042625
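On the FWSM this is one command per interface. The sketch below assumes interfaces named inside, outside and dmz; these are hypothetical nameif values, so use your own. Clearing the xlate table afterwards removes any bogus entries that already exist, but it also tears down connections using those xlates, so do it in a maintenance window.
------------------------------
ip verify reverse-path interface inside
ip verify reverse-path interface outside
ip verify reverse-path interface dmz
clear xlate
------------------------------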

Why did I say that the problem MANIFESTS itself on the FWSM?
Because the FWSM is working as expected: when a packet with a given IP comes in through an interface, an xlate will be created. Once the xlate is created, traffic will consider existing xlates before routes.
So, bottom line: if you have an improper xlate already installed in your xlate table, things may not work until you clear the xlate(s) or it times out. However, if continuous traffic is keeping that xlate alive, the only way to get rid of it is to clear the xlates.

Unicast RPF will prevent the bogus packet from creating an xlate in the first place.

Why is this mostly seen on the FWSM? Because of where the FWSM is placed: it's usually in a data center with a mix of layer 2 and layer 3 traffic, probably proxy ARP enabled on layer 3 interfaces, maybe some route leaking from a VRF that is normally protected by the FWSM, and many, many other things.
Most of the time it's easier to fix the symptom than the root cause.

Enable unicast RPF on the FWSM in all your new deployments, or consider contacting your local account team to get the following bug fix integrated:
http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCsi14227

If you want to get to the bottom of this problem, enable informational-level messages and monitor which connection is causing the bogus xlate to be created. Usually you will see two sequential messages: the connection being created, then the xlate being created. You will need to match the "outage" to the creation of the xlate. From there, you trace the packets across the L2/L3 interfaces all over the network.
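Here is a minimal sketch of that workflow: turn on buffered logging at the informational level, then correlate the log with the xlate table. Which xlate is bogus depends entirely on your network, so no addresses are shown.
------------------------------
logging enable
logging buffered informational
! reproduce the outage, then look for the connection/xlate creation pair
show logging
show xlate
------------------------------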

Wednesday, January 06, 2010

ASA/PIX PKI implementation. Multiple trustpoints considerations.

Not sure if Cisco documents it anywhere, but here goes.

What happens if you have multiple trustpoints defined on the ASA?

When a certificate is presented to the ASA, the appliance can use ANY trustpoint configured on the device, and it will use the first one that matches, provided the client type matches.

You cannot change this behavior, except by specifying a different certificate usage:
http://www.cisco.com/en/US/docs/security/asa/asa80/command/reference/c4.html#wp2124040

You do, however, have control over which certificate is SENT to peers; this is what you configure under tunnel-groups and the ssl commands.
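As an illustration of those knobs, the sketch below assumes ASA 8.0-style syntax and hypothetical names: tunnel group 192.0.2.10 and trustpoints TP_IPSEC and TP_SSL. It shows how the certificate sent to an IPsec peer and the one presented for SSL on the outside interface are chosen. Later releases moved the IPsec command under "ikev1".
------------------------------
tunnel-group 192.0.2.10 type ipsec-l2l
tunnel-group 192.0.2.10 ipsec-attributes
 trust-point TP_IPSEC
ssl trust-point TP_SSL outside
------------------------------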

Sunday, January 03, 2010

IPsec VPN on Catalyst 6500 or 7600.

First thing you need to know is that IPsec VPN on 6500 and 7600 will not work from SXE.

I've seen this too many times: problems with IPsec VPN on the 6500 or 7600 with the VPN SPA (Shared Port Adapter) or VPNSM (Service Module).

The place you need to start is:
http://www.cisco.com/en/US/docs/interfaces_modules/shared_port_adapters/configuration/6500series/76ovwvpn.html
Cross-check whether the modules and configurations you're using are supported in the first place. They are usually not supported for a very good reason: Cisco didn't deem them important enough.

If this is a new setup and you do not need any VRF features, I would recommend going for CCA (Crypto Connect Alternative). In this mode the crypto engine operates in VRF mode, but everything can remain in the global VRF.
Here's a decent config example:
http://www.cisco.com/en/US/docs/interfaces_modules/shared_port_adapters/configuration/6500series/76cfvpna.html#wp2048824

Best software to run the VPN SPA with (as of 3rd Jan 2010): SXI2a or SRC4, my personal tips.
If you're considering the VPN SPA as a platform for remote access, save yourself the trouble and use an ASA instead (if you're into Cisco, of course).

The problem with vlan 1
There is a known problem which Cisco fixed in a strange way. If you have vlan 1 configured on the trunks to the VPN SPA, you might run into performance problems. So if you have problems with IPsec performance on the 6500 or 7600 (VRF and CCA modes only), remove vlan 1 from the trunks to the VPN module (interface gigabitEthernet Module_slot/Subslot/0 and 1).
The fix implemented by Cisco:
For new installations, do not add vlan 1 to the trunks for the VPN SPA...
http://tools.cisco.com/Support/BugToolKit/search/getBugDetails.do?method=fetchBugDetails&bugId=CSCsl28371
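Removing vlan 1 is done per trunk. The sketch below assumes the VPN SPA sits in slot 3, subslot 0 (hypothetical), with its two ports numbered 0 and 1 as described above. Verify the actual interface names on your chassis first.
------------------------------
interface GigabitEthernet3/0/0
 switchport trunk allowed vlan remove 1
interface GigabitEthernet3/0/1
 switchport trunk allowed vlan remove 1
------------------------------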

edited: 17th Jun 2010

Saturday, January 02, 2010

IPsec and VRFs. So who's doing the VRF handoff anyway?

VRF aware IPsec - complicated topic? Not really.

So what are the options that we need to configure?
1) What will be the front (FVRF) and back (IVRF) VRF?
2) Which VRF will we be matching a given identity in?
3) After encapsulation, which VRF will we be putting the packets in?
You can read about all of this in the configuration guide:
http://www.cisco.com/en/US/docs/ios/sec_secure_connectivity/configuration/guide/sec_vrf_aware_ipsec_ps10591_TSD_Products_Configuration_Guide_Chapter.html
Check out the excellent examples.
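As a rough map of where each of those three answers ends up in the configuration, here is a sketch for a plain crypto map setup (no GRE). It reuses the VRF names from the example below. The profile name PROF, the map name MAP, the ACL VPN_ACL, the transform set ITS and the peer address are assumptions.
------------------------------
! Question 1 (IVRF): "vrf RED" = where decrypted traffic is routed
! Question 2: the VRF at the end of "match identity" = where the peer is looked up
crypto isakmp profile PROF
 vrf RED
 match identity address 10.0.0.100 255.255.255.255 BB
crypto map MAP 10 ipsec-isakmp
 set peer 10.0.0.100
 set transform-set ITS
 set isakmp-profile PROF
 match address VPN_ACL
! Questions 1 and 3 (FVRF): the VRF of the interface carrying the map
interface Ethernet0/0.100
 ip vrf forwarding BB
 crypto map MAP
------------------------------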

So let's consider something more advanced. GRE over IPsec with VRFs.
Very often people don't know what should happen with the IVRF: should it be specified under the isakmp profile or not?
How does the FVRF behave?
The following also applies to SVTI tunnels, and to some extent DVTI; note that VRF integration with L2TP (with IPsec) is a whole new world.

Consider this example.
I have VRF BB linking both sides of the GRE tunnel. One side has VRF BLUE and the other RED.
I will configure one side using tunnel protection (TP) and the other using crypto maps.
Certificate authentication is easier (no keyring needed), but let's have a look.
Note that I do not need to reach my CA server via any VRF, so I do not specify a VRF in the trustpoints.

TP side:
------------------------------
crypto pki trustpoint cisco2
 enrollment url http://10.0.0.3:80
 serial-number
 revocation-check crl
crypto isakmp policy 10
 encr aes
 group 2
crypto ipsec transform-set ITS esp-3des esp-sha-hmac
crypto ipsec profile PRO
 set transform-set ITS
interface Tunnel0
 ip vrf forwarding BLUE
 ip address 172.16.0.1 255.255.255.252
 tunnel source Ethernet0/0.100
 tunnel destination 10.0.0.101
 tunnel vrf BB
 tunnel protection ipsec profile PRO
------------------------------
And yes, it's the GRE/VTI tunnel that is doing the handoff. Not convinced? Let's have a look at crypto maps.

------------------------------
crypto isakmp policy 10
 encr aes
 group 2
crypto isakmp identity dn
crypto isakmp profile IPSEC_VRF
   ca trust-point cisco
   match identity address 10.0.0.100 255.255.255.255 BB
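crypto ipsec transform-set ITS esp-3des esp-sha-hmac
crypto map MAP 100 ipsec-isakmp
 set peer 10.0.0.100
 set transform-set ITS
 match address GRE
ip access-list extended GRE
 permit gre host 10.0.0.101 host 10.0.0.100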
interface Tunnel0
 ip vrf forwarding RED
 ip address 172.16.0.2 255.255.255.252
 tunnel source Ethernet0/0.100
 tunnel destination 10.0.0.100
 tunnel vrf BB
interface Ethernet0/0.100
 encapsulation dot1Q 100
 ip vrf forwarding BB
 ip address 10.0.0.101 255.255.255.0
 crypto map MAP
------------------------------

Is it working? And what are the main differences? Let's have a look at the "show crypto ipsec sa" output.

TP side:
------------------------------
interface: Tunnel0
    Crypto map tag: Tunnel0-head-0, local addr 10.0.0.100
   protected vrf: BLUE
------------------------------

Crypto map side:
------------------------------
interface: Ethernet0/0.100
    Crypto map tag: MAP, local addr 10.0.0.101
   protected vrf: BB
------------------------------

Is that at all correct? Well, check "show crypto map". All I can say is "It's pinging!"

Friday, January 01, 2010

EZVPN with certificates. part3

In my previous post I used certificate maps rather than match identity group. Why?

I could easily have used "match identity group LAB", as you can see in the debugs from the previous post, but I prefer not to. Certificate maps are here to stay; they offer much more flexibility than static matching.

Here's some background on why you might want to consider using cert maps in production.

As of IOS 12.4(20)T (inclusive, and everything above), some connections, like L2L tunnels for example, stopped sending the Unity VID. So what? So match identity group will not work in this case; it is only used for EZVPN, where the Unity tag is set. Where is this documented? It's not, but feel free to check your debugs :)

A thought on virtual templates: if you're not using them, you should start moving your setup to them. Cisco will be making this THE EZVPN setup (for both client and server). It offers huge flexibility improvements and fixes some of the shortcomings of legacy configurations (NAT, firewall, access-lists), plus all the other remote access methods (L2TP, PPTP and, yes, recently even WebVPN) already use it.
The virtual-template EZVPN setup is referred to as DVTI, while tunnel interfaces with "tunnel mode ipsec ipv4" are called SVTI, just in case you have to work with Cisco TAC.
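For comparison, a minimal SVTI is a regular tunnel interface with a fixed destination. The Virtual-Template (DVTI) in the EZVPN configuration of part 2, by contrast, is cloned per connecting peer. The interface number, the peer address 192.0.2.2 and the reuse of profile PRO are assumptions for illustration.
------------------------------
interface Tunnel10
 ip unnumbered Loopback0
 tunnel source Loopback0
 tunnel destination 192.0.2.2
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile PRO
------------------------------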

EZVPN with certificates. part2

So I cheated a bit.

The setup from my previous post would not really work... here's why. Check out MM3/4 on the server side, and MM5 on the client, where it would actually fail.
-----------------------------------
*Dec 28 20:48:41.199: ISAKMP:(1006): processing CERT_REQ payload. message ID = 0
*Dec 28 20:48:41.199: ISAKMP:(1006): peer wants a CT_X509_SIGNATURE cert
*Dec 28 20:48:41.199: ISAKMP:(1006): peer wants cert issued by cn=SUBCA2.cisco.com,ou=LAB1
(....)
*Dec 28 20:48:41.199: ISAKMP:(1006):Old State = IKE_R_MM3  New State = IKE_R_MM3
*Dec 28 20:48:41.215: ISAKMP (1006): constructing CERT_REQ for issuer cn=SUBCA1.cisco.com,ou=LAB
-----------------------------------

This results in the client failing in MM5 because we cannot find a common CA.

Easy fix: enroll to the same CA.
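Judging by the later server debugs, where the server constructs CERT_REQ for both SUBCA2 and SUBCA1, the server was enrolled to SUBCA2 as well. Here is a sketch of that enrollment; the trustpoint name and the CA URL are hypothetical.
------------------------------
crypto pki trustpoint SUBCA2
 enrollment url http://192.0.2.5:80
crypto pki authenticate SUBCA2
crypto pki enroll SUBCA2
end
show crypto pki certificates
------------------------------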

So, the final config (server first, then client):
-----------------------------------
aaa new-model
aaa authentication login EZ local
aaa authorization network EZ local
crypto pki certificate map MAP1 10
 subject-name co lab
crypto isakmp policy 10
 encr aes
 group 2
crypto isakmp client configuration group EZ_GROUP
 domain cisco.com
 pool EZ
 save-password
 include-local-lan
 pfs
crypto isakmp profile ISAKMP_PROFILE
   match certificate MAP1
   client authentication list EZ
   isakmp authorization list EZ
   client configuration address respond
   client configuration group EZ_GROUP
   virtual-template 100
crypto ipsec transform-set ITS esp-3des esp-sha-hmac
crypto ipsec profile PRO
 set transform-set ITS

interface Virtual-Template100 type tunnel
 ip unnumbered Loopback0
 tunnel mode ipsec ipv4
 tunnel protection ipsec profile PRO
-----------------------------------


-----------------------------------
crypto ipsec client ezvpn EZ_CLIENT
 connect manual
 mode client
 peer 192.168.0.1
 virtual-interface 100
 username cisco password cisco
 xauth userid mode local
interface Virtual-Template100 type tunnel
 no ip address
 tunnel mode ipsec ipv4
end
-----------------------------------
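Because the client uses "connect manual", nothing happens until the tunnel is brought up by hand. Here is a sketch, assuming hypothetical outside and inside interfaces on the client:
------------------------------
interface FastEthernet0/0
 crypto ipsec client ezvpn EZ_CLIENT outside
interface FastEthernet0/1
 crypto ipsec client ezvpn EZ_CLIENT inside
end
crypto ipsec client ezvpn connect EZ_CLIENT
show crypto ipsec client ezvpn
------------------------------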


Server side debugs:
------------------------------------
*Jan  2 09:53:46.527: ISAKMP:(1003):Old State = IKE_R_MM3  New State = IKE_R_MM3
*Jan  2 09:53:46.547: ISAKMP (0:1003): constructing CERT_REQ for issuer cn=SUBCA2.cisco.com,ou=LAB1
*Jan  2 09:53:46.547: ISAKMP (0:1003): constructing CERT_REQ for issuer cn=SUBCA1.cisco.com,ou=LAB
(....)
*Jan  2 09:53:46.615: ISAKMP:(0):: UNITY's identity group: OU = LAB1
*Jan  2 09:53:46.615: ISAKMP:(0):: peer matches *none* of the profiles
*Jan  2 09:53:46.615: ISAKMP:(1003): processing CERT payload. message ID = 0
*Jan  2 09:53:46.615: ISAKMP:(1003): processing a CT_X509_SIGNATURE cert
*Jan  2 09:53:46.615: ISAKMP:(1003): peer's pubkey is cached
*Jan  2 09:53:46.615: ISAKMP:(1003): OU = LAB1
*Jan  2 09:53:46.615: ISAKMP:(0): certificate map matches ISAKMP_PROFILE profile
*Jan  2 09:53:46.615: ISAKMP:(0): Trying to re-validate CERT using new profile
*Jan  2 09:53:46.615: ISAKMP:(0): CERT validity confirmed.
--------------------------------