Troubleshooting Connectivity to the Neutron Metadata Proxy

 

Introduction

After recently troubleshooting an issue with accessing the metadata service from a guest instance, and stepping through each point along the path, I realised this would make for a great article.

Issue

First of all, let's look at the issue. It was simple enough: I was unable to connect to the metadata service from a Nova instance, as shown below:

$ curl -v http://169.254.169.254
* couldn't connect to host
curl: (7) couldn't connect to host

Troubleshooting

To troubleshoot, we will step through each part of the path: from the guest (on the compute node) to the Neutron metadata proxy (within the infra nodes and the related LXC container).

Please note that the OpenStack environment in this article is a Linux Bridge based OSA (OpenStack-Ansible) deployment.

Guest – Injected Route

Our guest is on an isolated network, i.e. a network that is not behind an L3 router. In addition, our DHCP agent is configured with enable_isolated_metadata = True [1], meaning the DHCP server advertises specific host routes in its DHCP responses (via option 121). This results in an additional route being added to the guest, ensuring requests to 169.254.169.254 are routed via the DHCP agent IP.
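For reference, this behaviour is driven by a single option in the DHCP agent configuration. Below is a minimal sketch; the exact file location is an assumption and may differ in an OSA deployment, where the agent runs inside the neutron agents container.

# dhcp_agent.ini (path assumed, typically /etc/neutron/dhcp_agent.ini inside the agents container)
[DEFAULT]
enable_isolated_metadata = True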

$ netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG        0 0          0 eth0
169.254.169.254 192.168.1.2     255.255.255.255 UGH       0 0          0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth0

As you can see, the correct route is in place. Let's ping the DHCP agent address to confirm connectivity:

$ ping 192.168.1.2 -c 2
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: seq=0 ttl=64 time=0.768 ms
64 bytes from 192.168.1.2: seq=1 ttl=64 time=0.545 ms
This all looks good, let's move on.

Compute – Bridge Packet Trace

Next, we will confirm that traffic is making it out from the instance to the bridge. There are a few steps in this process:

Confirm Instance Name

To obtain the instance name, we perform a show on the server ID, like so:

root@infra1:~# openstack server list
+--------------------------------------+-------------------------+--------+------------------------------------------------------------+------------------------+
| ID                                   | Name                    | Status | Networks                                                   | Image Name             |
+--------------------------------------+-------------------------+--------+------------------------------------------------------------+------------------------+                                                   
| d0cebae2-ba1a-45fc-8f11-d28e8fd93ee3 | cirros-DEMO             | ACTIVE | tenant_network_lab001-vsrx=192.168.1.50                    | cirros                 |
+--------------------------------------+-------------------------+--------+------------------------------------------------------------+------------------------+

root@infra1:~# openstack server show d0cebae2-ba1a-45fc-8f11-d28e8fd93ee3
+-------------------------------------+----------------------------------------------------------+
| Field                               | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                   |
| OS-EXT-AZ:availability_zone         | nova                                                     |
| OS-EXT-SRV-ATTR:host                | compute03                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute03                                                |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000001a                                        |
| OS-EXT-STS:power_state              | Running                                                  |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-SRV-USG:launched_at              | 2017-09-03T06:19:23.000000                               |
| OS-SRV-USG:terminated_at            | None                                                     |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| addresses                           | tenant_network_lab001-vsrx=192.168.1.50                  |
| config_drive                        |                                                          |
| created                             | 2017-09-03T06:19:18Z                                     |
| flavor                              | cirros-small (58772350-942d-442b-9e39-ef953787a2d4)      |
| hostId                              | 57f31ea4cecf1bdae04624374a6db8df774b57e94d8427a8c5f1cff1 |
| id                                  | d0cebae2-ba1a-45fc-8f11-d28e8fd93ee3                     |
| image                               | cirros (5ffaa767-fc80-4d92-90f7-29b6d949797f)            |
| key_name                            | None                                                     |
| name                                | cirros-DEMO                                              |
| progress                            | 0                                                        |
| project_id                          | f1eb80264e9c4c688c7603bbb5541396                         |
| properties                          |                                                          |
| status                              | ACTIVE                                                   |
| updated                             | 2017-09-03T06:19:23Z                                     |
| user_id                             | 85aea2625c7349fc8796527085a04226                         |
| volumes_attached                    |                                                          |
+-------------------------------------+----------------------------------------------------------+

From the output you can see the field OS-EXT-SRV-ATTR:instance_name and its value: instance-0000001a.
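As a side note, if you only need the instance name and its host, the openstack CLI can return those fields directly; the following is a sketch using its standard column and formatting flags.

openstack server show d0cebae2-ba1a-45fc-8f11-d28e8fd93ee3 -c OS-EXT-SRV-ATTR:instance_name -c OS-EXT-SRV-ATTR:host -f value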

Confirm Bridge Name

Using the instance_name, we can then look up the bridge on the corresponding host (compute03), as shown below:

root@compute03:~# virsh domiflist instance-0000001a
Interface      Type   Source         Model   MAC
--------------------------------------------------------------
tap0e6c063c-48 bridge brq423b2b1b-55 virtio  fa:16:3e:92:bd:fd
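As an optional sanity check, we can also confirm that the tap device is actually attached to that bridge. The first command assumes bridge-utils is installed on the compute node; the second uses iproute2 only.

brctl show brq423b2b1b-55
ip link show master brq423b2b1b-55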

TCPDump

Now that we have the bridge name, we can run a tcpdump against it whilst running another curl from the guest. From this we can see the SYN is sent, but a RST is returned. So we know traffic from the guest is making it to the attached bridge; however, something is preventing the TCP connection from completing by sending a RST.

root@compute03:~# tcpdump -ni brq423b2b1b-55
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on brq423b2b1b-55, link-type EN10MB (Ethernet), capture size 262144 bytes
22:02:27.792813 IP 192.168.1.50.49513 > 169.254.169.254.80: Flags [S], seq 2770101210, win 14100, options [mss 1410,sackOK,TS val 77970740 ecr 0,nop,wscale 2], length 0
22:02:27.793369 IP 169.254.169.254.80 > 192.168.1.50.49513: Flags [R.], seq 0, ack 2770101211, win 0, length 0
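On a busy bridge it can help to narrow the capture down to just the metadata traffic; a filter along the following lines should do it.

tcpdump -ni brq423b2b1b-55 'host 169.254.169.254 and tcp port 80'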

Controller – Neutron DHCP Namespace

We will now jump into the DHCP namespace, which also hosts the metadata proxy. Once inside, we can confirm whether the necessary IPs are configured and run another tcpdump to see whether the traffic is making it in.

First things first: on the infra node we confirm the full name of the neutron_agents_container by running lxc-ls -f, and then attach to it.
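For example, something along these lines will surface the full container name (the grep pattern is purely illustrative):

lxc-ls -f | grep neutron_agents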

lxc-attach -n infra1_neutron_agents_container-de4c0565

Once attached, we list the network namespaces using the following command:

root@infra1-neutron-agents-container-de4c0565:~# ip netns list
qdhcp-d5b08c40-3b9e-4d6e-872b-7e8ce1553703 (id: 5)
qrouter-215981d1-32bc-41fd-a13a-27dbe7ecb63f (id: 4)
qdhcp-579d7378-84ad-45d5-882d-e052bab9595e (id: 3)
qdhcp-33ad9c28-5884-41e1-97c0-05886740b5c1 (id: 1)
qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 (id: 2)

Based upon our previous bridge name brq423b2b1b-55, we can correlate this to the namespace qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47.
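As the Linux bridge name is built from the first 11 characters of the network UUID, a quick grep ties the two together:

ip netns list | grep 423b2b1b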

With this namespace identifier we can run commands inside the namespace to help with our troubleshooting. First, let us look at the IP address, which should be 192.168.1.2.

root@infra1-neutron-agents-container-de4c0565:~# ip netns exec qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ns-e2e1c16d-6e Link encap:Ethernet  HWaddr fa:16:3e:82:97:6e
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe82:976e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:160 errors:0 dropped:0 overruns:0 frame:0
          TX packets:97 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12228 (12.2 KB)  TX bytes:7988 (7.9 KB)

Well, the IP looks correct. Let's run another tcpdump within the namespace, whilst running a curl on our guest.
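The capture below comes from inside the namespace on the 'any' interface; a command along the following lines will reproduce it.

ip netns exec qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 tcpdump -ni any host 169.254.169.254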

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
22:14:13.529666 IP 192.168.1.50.49515 > 169.254.169.254.80: Flags [S], seq 851007280, win 14100, options [mss 1410,sackOK,TS val 78147174 ecr 0,nop,wscale 2], length 0
22:14:13.529699 IP 169.254.169.254.80 > 192.168.1.50.49515: Flags [R.], seq 0, ack 851007281, win 0, length 0

Interesting. So let us take a moment. The SYN of the three-way handshake is making it all the way to the Linux bridge on the compute node, and then into the DHCP agent and its corresponding namespace, but something is still sending a RST and preventing the TCP connection from forming. We at least know the path is good, but what next? Let us check that the metadata proxy is actually listening on the required port.

root@infra1-neutron-agents-container-de4c0565:~# ip netns exec qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 192.168.1.2:53          0.0.0.0:*               LISTEN      1921/dnsmasq
tcp        0      0 169.254.169.254:53      0.0.0.0:*               LISTEN      1921/dnsmasq
tcp6       0      0 fe80::f816:3eff:fe82:53 :::*                    LISTEN      1921/dnsmasq
udp        0      0 192.168.1.2:47830       10.0.3.1:53             ESTABLISHED 1642/tcpdump
udp        0      0 192.168.1.2:53          0.0.0.0:*                           1921/dnsmasq
udp        0      0 169.254.169.254:53      0.0.0.0:*                           1921/dnsmasq
udp        0      0 0.0.0.0:67              0.0.0.0:*                           1921/dnsmasq
udp6       0      0 fe80::f816:3eff:fe82:53 :::*                                1921/

Wait, port 80 isn't listening. Let us restart the services and recheck.

root@infra1-neutron-agents-container-de4c0565:~# service neutron-dhcp-agent restart
root@infra1-neutron-agents-container-de4c0565:~# service neutron-metadata-agent restart
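Incidentally, if a restart does not bring the proxy back, it is worth confirming that the DHCP agent has actually spawned a metadata proxy process for this network, and reviewing its log. The process name and log path below are assumptions and may vary by release and deployment (newer releases use haproxy rather than a Python proxy).

ps aux | grep 423b2b1b | grep -v grep
tail -n 50 /var/log/neutron/neutron-dhcp-agent.log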

root@infra1-neutron-agents-container-de4c0565:~# ip netns exec qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      1764/python
tcp        0      0 192.168.1.2:53          0.0.0.0:*               LISTEN      1921/dnsmasq
tcp        0      0 169.254.169.254:53      0.0.0.0:*               LISTEN      1921/dnsmasq
tcp6       0      0 fe80::f816:3eff:fe82:53 :::*                    LISTEN      1921/dnsmasq
udp        0      0 192.168.1.2:47830       10.0.3.1:53             ESTABLISHED 1642/tcpdump
udp        0      0 192.168.1.2:53          0.0.0.0:*                           1921/dnsmasq
udp        0      0 169.254.169.254:53      0.0.0.0:*                           1921/dnsmasq
udp        0      0 0.0.0.0:67              0.0.0.0:*                           1921/dnsmasq
udp6       0      0 fe80::f816:3eff:fe82:53 :::*                                1921/dnsmasq

Now that port 80 is back up, we can check whether the metadata service is accessible from our guest.

Validation

$ curl -v http://169.254.169.254
> GET / HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-pc-linux-gnu) libcurl/7.24.0 OpenSSL/1.0.0j zlib/1.2.6
> Host: 169.254.169.254
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=UTF-8
< Content-Length: 98
< Date: Wed, 06 Sep 2017 20:32:51 GMT
<
1.0
2007-01-19
2007-03-01
2007-08-29
2007-10-10
2007-12-15
2008-02-01
2008-09-01
2009-04-04
latest$

Great, we are now getting the expected result.
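From here the individual metadata paths should resolve as well, for example (the paths follow the EC2-style layout shown in the listing above):

$ curl http://169.254.169.254/latest/meta-data/instance-id
$ curl http://169.254.169.254/latest/meta-data/local-ipv4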

Summary

Now, to be fair, the RST we saw on the compute node did indicate that a port may not have been listening. However, as I'm sure you will appreciate, the ability to troubleshoot and debug through the various points of an OpenStack system is extremely valuable.

References

[1] “Networking configuration options – OpenStack Documentation.” 12 Jun. 2017, https://docs.openstack.org/mitaka/config-reference/networking/networking_options_reference.html. Accessed 6 Sep. 2017.

Rick Donato
