Introduction
After troubleshooting a recent issue with accessing the metadata service from a guest instance, and stepping through the various hops along the path, I soon realised this would make for a great article.
Issue
So first of all let’s look at the issue. It was pretty basic: I was simply unable to connect to the metadata service from a Nova instance, as shown below:

$ curl -v http://169.254.169.254
* couldn't connect to host
curl: (7) couldn't connect to host
Troubleshooting
In order to troubleshoot, we will step through each part of the path – from the guest (upon the compute node) to the Neutron metadata proxy (within the infra nodes and the related LXC container).
Please note that the OpenStack environment within this article is based upon a Linux Bridge OSA (OpenStack-Ansible) deployment.
Guest – Injected Route
Our guest is on an isolated network, i.e. a network that is not behind an L3 router. In addition, our DHCP agent is configured with enable_isolated_metadata = True[1], meaning the DHCP server appends specific host routes to the DHCP reply (via option 121). This results in an additional route being added to the guest, in turn ensuring requests to 169.254.169.254 are routed via the DHCP agent IP.
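For reference, this knob lives in the DHCP agent configuration. In an OSA deployment that is typically /etc/neutron/dhcp_agent.ini inside the neutron agents container – the exact path and surrounding options below are assumptions, so treat this as a sketch:

```ini
# dhcp_agent.ini (snippet) - path and surrounding layout assumed.
[DEFAULT]
# Serve metadata (and inject the 169.254.169.254 host route via
# DHCP option 121) on networks that have no router attached.
enable_isolated_metadata = True
```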
$ netstat -nr
Kernel IP routing table
Destination     Gateway         Genmask         Flags MSS Window irtt Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0   0      0    eth0
169.254.169.254 192.168.1.2     255.255.255.255 UGH   0   0      0    eth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0   0      0    eth0
As you can see, the correct route is in place. Let's ping the address to confirm connectivity:
$ ping 192.168.1.2 -c 2
PING 192.168.1.2 (192.168.1.2): 56 data bytes
64 bytes from 192.168.1.2: seq=0 ttl=64 time=0.768 ms
64 bytes from 192.168.1.2: seq=1 ttl=64 time=0.545 ms

This all looks good; let's move on.
Compute – Bridge Packet Trace
Next we will confirm that traffic is making it out from the instance to the bridge. There are a few steps in this process:
Confirm Instance Name
In order to obtain the instance name, we perform a show on the server ID, like so:
[email protected]:~# openstack server list
+--------------------------------------+-------------+--------+-----------------------------------------+------------+
| ID                                   | Name        | Status | Networks                                | Image Name |
+--------------------------------------+-------------+--------+-----------------------------------------+------------+
| d0cebae2-ba1a-45fc-8f11-d28e8fd93ee3 | cirros-DEMO | ACTIVE | tenant_network_lab001-vsrx=192.168.1.50 | cirros     |
+--------------------------------------+-------------+--------+-----------------------------------------+------------+
[email protected]:~# openstack server show d0cebae2-ba1a-45fc-8f11-d28e8fd93ee3
+-------------------------------------+----------------------------------------------------------+
| Field                               | Value                                                    |
+-------------------------------------+----------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                   |
| OS-EXT-AZ:availability_zone         | nova                                                     |
| OS-EXT-SRV-ATTR:host                | compute03                                                |
| OS-EXT-SRV-ATTR:hypervisor_hostname | compute03                                                |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000001a                                        |
| OS-EXT-STS:power_state              | Running                                                  |
| OS-EXT-STS:task_state               | None                                                     |
| OS-EXT-STS:vm_state                 | active                                                   |
| OS-SRV-USG:launched_at              | 2017-09-03T06:19:23.000000                               |
| OS-SRV-USG:terminated_at            | None                                                     |
| accessIPv4                          |                                                          |
| accessIPv6                          |                                                          |
| addresses                           | tenant_network_lab001-vsrx=192.168.1.50                  |
| config_drive                        |                                                          |
| created                             | 2017-09-03T06:19:18Z                                     |
| flavor                              | cirros-small (58772350-942d-442b-9e39-ef953787a2d4)      |
| hostId                              | 57f31ea4cecf1bdae04624374a6db8df774b57e94d8427a8c5f1cff1 |
| id                                  | d0cebae2-ba1a-45fc-8f11-d28e8fd93ee3                     |
| image                               | cirros (5ffaa767-fc80-4d92-90f7-29b6d949797f)            |
| key_name                            | None                                                     |
| name                                | cirros-DEMO                                              |
| progress                            | 0                                                        |
| project_id                          | f1eb80264e9c4c688c7603bbb5541396                         |
| properties                          |                                                          |
| status                              | ACTIVE                                                   |
| updated                             | 2017-09-03T06:19:23Z                                     |
| user_id                             | 85aea2625c7349fc8796527085a04226                         |
| volumes_attached                    |                                                          |
+-------------------------------------+----------------------------------------------------------+
From the output you can see the field OS-EXT-SRV-ATTR:instance_name and its value – instance-0000001a.
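As a side note, the table parsing can be skipped entirely with the CLI's -f value -c OS-EXT-SRV-ATTR:instance_name output filters. The sketch below shows an equivalent extraction with awk; it runs against a captured two-row sample of the show output rather than a live cloud, so it is self-contained:

```shell
# Extract instance_name from "openstack server show" table output.
# A captured sample is used here so the sketch is self-contained;
# on a live system you would pipe the real command output in instead.
SHOW_OUTPUT='| OS-EXT-SRV-ATTR:host                | compute03                  |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000001a          |'

# Split on "|", take the value column, strip whitespace.
INSTANCE=$(printf '%s\n' "$SHOW_OUTPUT" \
  | awk -F'|' '/instance_name/ {gsub(/ /, "", $3); print $3}')
echo "$INSTANCE"    # prints instance-0000001a
```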
Confirm Bridge Name
Using the instance_name we can then perform a lookup (on the corresponding host) for the bridge, as shown below:
[email protected]:~# virsh domiflist instance-0000001a
Interface        Type     Source          Model    MAC
--------------------------------------------------------------
tap0e6c063c-48   bridge   brq423b2b1b-55  virtio   fa:16:3e:92:bd:fd
TCPDump
Now that we have the bridge name, we can run a tcpdump against it whilst running another curl from the guest. From this we can see that the SYN is sent, but a RST is returned. So we know the traffic from the guest is making it to the attached bridge; however, something is preventing the completion of the TCP connection by sending a RST.
[email protected]:~# tcpdump -ni brq423b2b1b-55
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on brq423b2b1b-55, link-type EN10MB (Ethernet), capture size 262144 bytes
22:02:27.792813 IP 192.168.1.50.49513 > 169.254.169.254.80: Flags [S], seq 2770101210, win 14100, options [mss 1410,sackOK,TS val 77970740 ecr 0,nop,wscale 2], length 0
22:02:27.793369 IP 169.254.169.254.80 > 192.168.1.50.49513: Flags [R.], seq 0, ack 2770101211, win 0, length 0
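The Flags field in the trace is worth decoding: tcpdump abbreviates TCP flags, and the [S] / [R.] pair is the classic signature of a refused connection. A small helper (illustrative only, not part of any tooling) maps the common shorthands:

```shell
# Map tcpdump's TCP flag shorthand to its meaning. "." means ACK.
decode_flags() {
  case "$1" in
    '[S]')  echo 'SYN - connection attempt' ;;
    '[S.]') echo 'SYN+ACK - connection accepted' ;;
    '[R.]') echo 'RST+ACK - connection refused or reset' ;;
    '[F.]') echo 'FIN+ACK - orderly close' ;;
    '[P.]') echo 'PSH+ACK - data push' ;;
    *)      echo 'other' ;;
  esac
}

decode_flags '[S]'     # SYN - connection attempt
decode_flags '[R.]'    # RST+ACK - connection refused or reset
```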
Controller – Neutron DHCP Namespace
We will now jump into the DHCP namespace, which also hosts the metadata service proxy. Once inside we can confirm that the necessary IPs are configured, and also run another tcpdump to see if the traffic is making it in.
First things first, upon the infra node we connect to the neutron_agents_container. We confirm its full name by running lxc-ls -f.
lxc-attach -n infra1_neutron_agents_container-de4c0565
Once attached, we list the Linux namespaces using the following command:
[email protected]:~# ip netns list
qdhcp-d5b08c40-3b9e-4d6e-872b-7e8ce1553703 (id: 5)
qrouter-215981d1-32bc-41fd-a13a-27dbe7ecb63f (id: 4)
qdhcp-579d7378-84ad-45d5-882d-e052bab9595e (id: 3)
qdhcp-33ad9c28-5884-41e1-97c0-05886740b5c1 (id: 1)
qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 (id: 2)
Based upon our previous bridge name brq423b2b1b-55, we can correlate this to the namespace qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47.
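This correlation works because the Linux bridge agent names a bridge brq plus the first characters of the network UUID, while the DHCP agent names its namespace qdhcp- plus the full UUID. That makes the lookup scriptable; the sketch below runs against a captured copy of the ip netns list output so it is self-contained:

```shell
# Find the qdhcp namespace matching a Linux bridge name.
BRIDGE='brq423b2b1b-55'
PREFIX=${BRIDGE#brq}     # -> 423b2b1b-55, the start of the network UUID

# Captured "ip netns list" output (ids stripped); on a live node,
# substitute: NETNS_LIST=$(ip netns list)
NETNS_LIST='qdhcp-d5b08c40-3b9e-4d6e-872b-7e8ce1553703
qrouter-215981d1-32bc-41fd-a13a-27dbe7ecb63f
qdhcp-579d7378-84ad-45d5-882d-e052bab9595e
qdhcp-33ad9c28-5884-41e1-97c0-05886740b5c1
qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47'

NS=$(printf '%s\n' "$NETNS_LIST" | grep "^qdhcp-${PREFIX}")
echo "$NS"    # qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47
```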
With this namespace identifier we can run commands against the namespace to help with our troubleshooting. First, let us look at the IP address; this should be 192.168.1.2.
[email protected]:~# ip netns exec qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 ifconfig
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ns-e2e1c16d-6e Link encap:Ethernet  HWaddr fa:16:3e:82:97:6e
          inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
          inet6 addr: fe80::f816:3eff:fe82:976e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:160 errors:0 dropped:0 overruns:0 frame:0
          TX packets:97 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:12228 (12.2 KB)  TX bytes:7988 (7.9 KB)
Well, the IP looks correct. Let's run another tcpdump, whilst running a curl on our guest.
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
22:14:13.529666 IP 192.168.1.50.49515 > 169.254.169.254.80: Flags [S], seq 851007280, win 14100, options [mss 1410,sackOK,TS val 78147174 ecr 0,nop,wscale 2], length 0
22:14:13.529699 IP 169.254.169.254.80 > 192.168.1.50.49515: Flags [R.], seq 0, ack 851007281, win 0, length 0
Interesting. So let us take a moment. The SYN from the three-way handshake is making it all the way to the necessary Linux bridge on the compute node. The SYN is then making it to the DHCP agent and its corresponding namespace, but something is still sending a RST and preventing the TCP connection from forming. We at least know the path is good, but what next? Let us check that the metadata service proxy is actually listening on the required port.
[email protected]:~# ip netns exec qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address   State       PID/Program name
tcp        0      0 192.168.1.2:53          0.0.0.0:*         LISTEN      1921/dnsmasq
tcp        0      0 169.254.169.254:53      0.0.0.0:*         LISTEN      1921/dnsmasq
tcp6       0      0 fe80::f816:3eff:fe82:53 :::*              LISTEN      1921/dnsmasq
udp        0      0 192.168.1.2:47830       10.0.3.1:53       ESTABLISHED 1642/tcpdump
udp        0      0 192.168.1.2:53          0.0.0.0:*                     1921/dnsmasq
udp        0      0 169.254.169.254:53      0.0.0.0:*                     1921/dnsmasq
udp        0      0 0.0.0.0:67              0.0.0.0:*                     1921/dnsmasq
udp6       0      0 fe80::f816:3eff:fe82:53 :::*                          1921/dnsmasq
Wait – port 80 isn't listening. Let us restart the services and recheck.
[email protected]:~# service neutron-dhcp-agent restart
[email protected]:~# service neutron-metadata-agent restart
[email protected]:~# ip netns exec qdhcp-423b2b1b-5591-4861-baab-64e9fef84f47 netstat -anp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address   State       PID/Program name
tcp        0      0 0.0.0.0:80              0.0.0.0:*         LISTEN      1764/python
tcp        0      0 192.168.1.2:53          0.0.0.0:*         LISTEN      1921/dnsmasq
tcp        0      0 169.254.169.254:53      0.0.0.0:*         LISTEN      1921/dnsmasq
tcp6       0      0 fe80::f816:3eff:fe82:53 :::*              LISTEN      1921/dnsmasq
udp        0      0 192.168.1.2:47830       10.0.3.1:53       ESTABLISHED 1642/tcpdump
udp        0      0 192.168.1.2:53          0.0.0.0:*                     1921/dnsmasq
udp        0      0 169.254.169.254:53      0.0.0.0:*                     1921/dnsmasq
udp        0      0 0.0.0.0:67              0.0.0.0:*                     1921/dnsmasq
udp6       0      0 fe80::f816:3eff:fe82:53 :::*                          1921/dnsmasq
Now that port 80 is back up, we can check whether the metadata service is accessible from our guest.
Validation
$ curl -v http://169.254.169.254
> GET / HTTP/1.1
> User-Agent: curl/7.24.0 (x86_64-pc-linux-gnu) libcurl/7.24.0 OpenSSL/1.0.0j zlib/1.2.6
> Host: 169.254.169.254
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Type: text/plain; charset=UTF-8
< Content-Length: 98
< Date: Wed, 06 Sep 2017 20:32:51 GMT
<
1.0
2007-01-19
2007-03-01
2007-08-29
2007-10-10
2007-12-15
2008-02-01
2008-09-01
2009-04-04
latest$
Great, we are now getting the expected result.
Summary
Now, to be fair, the fact that we saw a RST being sent back on the compute node did indicate that a port may not have been listening. However, as I'm sure you will appreciate, the ability to troubleshoot and debug through the various points of an OpenStack system is extremely valuable.
References
[1] “Networking configuration options – OpenStack Documentation.” 12 Jun. 2017, https://docs.openstack.org/mitaka/config-reference/networking/networking_options_reference.html. Accessed 6 Sep. 2017.