Background
My lab environment is supported by a KVM virtual network at each site. It includes four datacenters, more than a handful of physical machines, and tens of KVM-based virtual machines.
Recently, I discovered that libvirtd does some interesting things with iptables FORWARD rules. When a NAT KVM virtual network is added, the following rules will be added to the beginning of the FORWARD chain in iptables:
ACCEPT all -- anywhere 192.168.100.0/24 state RELATED,ESTABLISHED
ACCEPT all -- 192.168.100.0/24 anywhere
ACCEPT all -- anywhere anywhere
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
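Note that a plain iptables -L hides the interface columns; listing the chain verbosely shows that these rules are actually scoped to the virtual network's bridge (virbr0 by default):

# Show the FORWARD chain with interfaces, counters, and rule numbers
iptables -L FORWARD -n -v --line-numbers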
When a routed network is added, the following rules will be added to the beginning of the FORWARD chain in iptables:
ACCEPT all -- anywhere 192.168.124.0/24
ACCEPT all -- 192.168.124.0/24 anywhere
ACCEPT all -- anywhere anywhere
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
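To tie a given set of rules back to the virtual network that created them, virsh can list the defined networks and dump their XML; the forward mode in the definition determines which flavor of rules libvirtd inserts. The commands below use default, the stock network name libvirt ships with:

# List all defined libvirt virtual networks and whether they are active
virsh net-list --all
# Dump a network definition; its forward mode explains which FORWARD rules it adds
virsh net-dumpxml default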
Research
What makes this discovery so strange is that the environment had been set up for nearly a year before I finally figured out what was happening. For example, every now and again, the Nagios instance at DC3 would page that virtual machines at DC1 were not accessible. Like a good lazy sysadmin, I would just restart iptables on the DC1 router and everything would start working again for another week or so.
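The temporary fix was nothing more than reloading the saved rules, which wipes out everything libvirtd has inserted at runtime until the daemon adds it back:

# Reload /etc/sysconfig/iptables on the router, discarding libvirtd's runtime rules
service iptables restart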
Quite honestly, I had never given it much thought, but recently something must have changed in RHEL 6, because libvirtd adds these rules back much more quickly than before. I started to get paged every day, which was just too much.
The fact that my lab setup is fairly atypical did not help the search for the cause of the problem. After several hours of experimenting with iptables and searching, I found the following Bugzilla entry explaining the problem. My next search was for a way to make libvirtd stop adding these rules, but to no avail: there is no way to stop this short of modifying the source code.
My initial reaction was to open another Bugzilla entry against RHEL asking to have this changed. After some thought and time to accept my dilemma, I realized that the same capability could be achieved by splitting the single NAT network into two. The first would remain a NAT network to allow the virtual machines to access the Internet. The second would be a routed network used to reach other machines on the crunchtools internal network, though it would require one route rule to be added to the machines behind the routers at each site.
Conclusion
The solution required two networks and a routing rule for each machine behind the RHEL routers at each data center. These were added with virt-manager, but could also be created manually with virsh and the following XML.
The default KVM virtual network provides access to the Internet.
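Roughly, its definition looks like the following; the addresses here simply reuse the 192.168.100.0/24 subnet from the NAT rules shown above, and the bridge name and DHCP range are only representative:

<network>
  <name>default</name>
  <forward mode='nat'/>
  <bridge name='virbr0' stp='on' delay='0'/>
  <ip address='192.168.100.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.100.2' end='192.168.100.254'/>
    </dhcp>
  </ip>
</network>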
The crunchtools KVM virtual network provides access, through OpenVPN, to the other network segments of the crunchtools lab environment.
<name>crunchtools</name>
<uuid>ec860311-21b2-6acc-6d88-6cc3b00e460a</uuid>
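Only the name and UUID above come from my actual configuration; a complete routed network built around them might look like this, with the gateway address taken from the route rule below and the bridge name assumed:

<network>
  <name>crunchtools</name>
  <uuid>ec860311-21b2-6acc-6d88-6cc3b00e460a</uuid>
  <forward mode='route'/>
  <bridge name='virbr1' stp='on' delay='0'/>
  <ip address='192.168.124.1' netmask='255.255.255.0'/>
</network>

Either definition can be saved to a file and loaded with virsh net-define, then activated with virsh net-start and virsh net-autostart.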
Then, on each machine behind the router, add the following route:
/etc/sysconfig/network-scripts/route-eth0
192.168.0.0/16 via 192.168.124.1 dev eth0
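The route file is only read when the interface comes up; to add the same route on a running machine, the equivalent command is:

# Add the route immediately without restarting networking
ip route add 192.168.0.0/16 via 192.168.124.1 dev eth0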
I am not extremely happy with having custom routes at each of the sites because it complicates my core build, but for now it is the only sane way to configure the network. For a short time, I did entertain the thought of just removing the reject line from the routers in a while loop, something like the sketch after the command below, but decided that was too much of a hack.
iptables -D FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable
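For reference, the loop would have looked something like this on each router, deleting the REJECT rule as fast as libvirtd puts it back:

# The hack I decided against: keep stripping libvirt's REJECT rule from FORWARD
while true; do
    iptables -D FORWARD -o virbr0 -j REJECT --reject-with icmp-port-unreachable 2>/dev/null
    sleep 60
done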