problem
While testing a shared network with private VLANs, the VMs were unable to ping their default gateway. On investigation, this turned out to be an issue with the OVS flow tables.
Primary vlan - 301
Isolated vlan - 302
The hypervisors use cloudbr1, which is a trunk link for all guest + public traffic. All hypervisors have both vlans (301 and 302) allowed on that trunk link.
The primary vlan's L3 interface is an SVI on a Cisco switch. The Cisco switch has the correct private vlan configuration. The hypervisors can ping the L3 interface IP.
When an instance is created in this shared network, it successfully gets an IP but is unable to ping the L3 interface IP. The Cisco switch shows the correct MAC address of the VM NIC in its ARP table, so traffic from the VM reaches the switch, but the return traffic is dropped by the OVS switch.
Looking at the OVS flow rules on the hypervisor running the VM (with some help from ai) -
root@hv2:~# ovs-ofctl dump-flows cloudbr1 | grep "dl_vlan=301"
cookie=0x0, duration=680.753s, table=0, n_packets=635, n_bytes=40640, idle_age=0, priority=200,arp,dl_vlan=301,arp_tpa=172.30.12.70 actions=strip_vlan,resubmit(,1)
cookie=0x0, duration=680.739s, table=0, n_packets=2, n_bytes=692, idle_age=659, priority=100,udp,dl_vlan=301,nw_dst=255.255.255.255,tp_dst=67 actions=strip_vlan,resubmit(,1)
cookie=0x0, duration=680.667s, table=0, n_packets=0, n_bytes=0, idle_age=680, priority=70,dl_vlan=301,dl_dst=1e:01:26:00:00:c2 actions=strip_vlan,resubmit(,1)
cookie=0x0, duration=674.094s, table=0, n_packets=105, n_bytes=6720, idle_age=21, priority=70,dl_vlan=301,dl_dst=1e:01:9d:00:00:e8 actions=strip_vlan,resubmit(,1)
cookie=0x0, duration=673.872s, table=0, n_packets=731, n_bytes=46784, idle_age=0, priority=70,dl_vlan=301,dl_dst=ff:ff:ff:ff:ff:ff actions=strip_vlan,group:301
cookie=0x0, duration=680.660s, table=1, n_packets=0, n_bytes=0, idle_age=680, priority=70,dl_vlan=301,dl_dst=1e:01:26:00:00:c2 actions=strip_vlan,output:42
cookie=0x0, duration=674.088s, table=1, n_packets=0, n_bytes=0, idle_age=674, priority=70,dl_vlan=301,dl_dst=1e:01:9d:00:00:e8 actions=strip_vlan,output:43
cookie=0x0, duration=673.865s, table=1, n_packets=0, n_bytes=0, idle_age=680, priority=70,dl_vlan=301,dl_dst=ff:ff:ff:ff:ff:ff actions=strip_vlan,group:4397
relevant lines -
cookie=0x0, duration=674.094s, table=0, n_packets=105, n_bytes=6720, idle_age=21, priority=70,dl_vlan=301,dl_dst=1e:01:9d:00:00:e8 actions=strip_vlan,resubmit(,1)
cookie=0x0, duration=674.088s, table=1, n_packets=0, n_bytes=0, idle_age=674, priority=70,dl_vlan=301,dl_dst=1e:01:9d:00:00:e8 actions=strip_vlan,output:43
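The table 0 rule has matched 105 packets, but its strip_vlan action removes the vlan 301 tag before resubmitting to table 1. The table 1 rule still matches on dl_vlan=301, so the now-untagged packet matches nothing there (n_packets=0) and the default action at the end is to drop.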
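To check for the same pattern on other hypervisors, the dump can be scanned for table 0 rules that strip the vlan and resubmit to table 1 while the table 1 rule for the same MAC still matches on dl_vlan. The Python below is only a rough sketch of that check. The helper names (parse_flow, find_vlan_mismatches) are made up for illustration and are not part of CloudStack or OVS, and the parsing assumes the dump-flows text layout shown above.

```python
import re

def parse_flow(line):
    """Split one `ovs-ofctl dump-flows` line into table, counters, match and actions."""
    head, _, actions = line.strip().partition(" actions=")
    table = int(re.search(r"table=(\d+)", head).group(1))
    n_packets = int(re.search(r"n_packets=(\d+)", head).group(1))
    # Assumption: the match is the last whitespace-separated token before " actions=".
    match = {}
    for item in head.split()[-1].split(","):
        key, _, value = item.partition("=")
        match[key] = value
    return {"table": table, "n_packets": n_packets, "match": match, "actions": actions}

def find_vlan_mismatches(dump):
    """Return (dl_dst, table0_packets, table1_packets) for each suspicious rule pair."""
    flows = [parse_flow(l) for l in dump.splitlines() if " actions=" in l]
    table1 = [f for f in flows if f["table"] == 1]
    problems = []
    for f in flows:
        if f["table"] != 0:
            continue
        # Only table 0 rules that strip the tag and then hand off to table 1.
        if "strip_vlan" not in f["actions"] or "resubmit(,1)" not in f["actions"]:
            continue
        dst = f["match"].get("dl_dst")
        if dst is None:
            continue
        for t in table1:
            # Table 1 still expects a vlan tag that table 0 has already removed.
            if t["match"].get("dl_dst") == dst and "dl_vlan" in t["match"]:
                problems.append((dst, f["n_packets"], t["n_packets"]))
    return problems

SAMPLE = "\n".join([
    "cookie=0x0, duration=674.094s, table=0, n_packets=105, n_bytes=6720, idle_age=21, "
    "priority=70,dl_vlan=301,dl_dst=1e:01:9d:00:00:e8 actions=strip_vlan,resubmit(,1)",
    "cookie=0x0, duration=674.088s, table=1, n_packets=0, n_bytes=0, idle_age=674, "
    "priority=70,dl_vlan=301,dl_dst=1e:01:9d:00:00:e8 actions=strip_vlan,output:43",
])

for dst, hits0, hits1 in find_vlan_mismatches(SAMPLE):
    print(f"{dst}: table 0 matched {hits0} packets, table 1 matched {hits1}")
```

A pair reported with a non-zero table 0 count and a zero table 1 count is consistent with the drop described above.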
As a test, I added the following flow to match the untagged packet and send it to the VM -
ovs-ofctl add-flow cloudbr1 "table=1,priority=80,dl_dst=1e:01:9d:00:00:e8,actions=output:43"
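For anyone who wants to repeat the test on another VM, here is a small Python sketch that builds the same workaround flow from a MAC address and an OpenFlow port number. The function names are hypothetical, and table 1 and priority 80 are simply the values used in the command above. It only builds the command and does not run it.

```python
import re

MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$")

def untagged_unicast_flow(mac, ofport, table=1, priority=80):
    """Build a flow that matches an untagged unicast frame by destination MAC."""
    mac = mac.lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    # No dl_vlan match here: the tag has already been stripped in table 0.
    return f"table={table},priority={priority},dl_dst={mac},actions=output:{ofport}"

def add_flow_command(bridge, flow):
    """Return the argv list for `ovs-ofctl add-flow` (not executed here)."""
    return ["ovs-ofctl", "add-flow", bridge, flow]

flow = untagged_unicast_flow("1e:01:9d:00:00:e8", 43)
print(" ".join(add_flow_command("cloudbr1", flow)))
```

The argv list can be passed to subprocess.run on the hypervisor if that is more convenient. Like the manual command, this is only a test aid and not a fix.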
Once this was added, the VM could ping its default gateway on the Cisco switch.
This was only a quick fix to confirm the cause. It suggests that the matching or rewrite rules generated by the code need to be modified.
versions
CloudStack 4.22.1.0, hypervisors running Ubuntu 22.04, NFS primary and secondary storage
OVS version: 2.17.9
The steps to reproduce the bug
- Create a shared network with a primary and an isolated vlan.
- Create an instance in the new shared network.
- Try to ping the default gateway from the instance.
...
What to do about it?
No response