I would like to share with the community a strange observation that we have been able to reproduce.
We have been working for a long time (more than 30 months) with the Locust distro and, like some others, we have built our own meshbox to specifications related to our country. We tried several different hardware architectures before finding the right components.
We have deployed several networks, experimental or in production, on different types of network collector (satellite, DSL, cable...), always with the same network architecture but with some hardware variations on some nodes.
We always noticed the same problem, whatever the network collector type and whatever hardware is embedded in the box (regular ones such as MINI-ITX, or exotic ones such as the Compaq SFF series, Soekris...).
Here is the recurring problem: when an uplink meshbox is directly connected to a SAT or DSL modem in DHCP relay mode (through a switch or back to back), we always observe a loss of bandwidth and throughput on repeater nodes tunneled to the uplink. Tcpdump shows offsets in the TCP sequences when frames come out of the AES tunnel between uplink and repeater. This makes it hard to reassemble the TCP sequences, which in turn causes the bandwidth loss.
The only way we have found to correct this problem is to put a firewall (e.g. IPCop) between the modem and the mesh, which means we are always NATing our uplink mesh node (which then sits in a "DMZ").
I would like to know whether any of you have observed the same problem, or whether anyone in the community has ever tried these network architectures:
1 - TCP problem observed: loss of bandwidth, frame sequencing problems on repeaters, uplink OK
  - Uplink directly connected to a modem acting as DHCP relay and providing a public IP
  - Repeater nodes wirelessly connected to the uplink
2 - TCP problem fixed: all repeaters and the uplink have full bandwidth, TCP sequencing OK
  - Modem connected to an "IPchains box" (IPCop, for example) on its eth WAN (collector) side
  - Uplink connected directly or through a switch to the LAN side of IPCop
  - Repeater nodes wirelessly connected to the uplink
Any comments are welcome on this
If what Brendan is suggesting is true, then you could use a Mikrotik router (demo mode should do it) to mangle the packets to the required MTU. See www.mikrotik.com
> This is almost certainly an MTU issue.
> For clients not connected directly to the gateway node you may see
> problems (I did anyway) with certain sat systems (Direcway 4020 in my
> case, which has an underlying MTU of 580).
>
> LW has known about it for at least 6 months but has yet to resolve the
> issue, don't hold your breath!
>
> Try setting the MTU on the gateway eth0 port to 580 and see if the
> problem goes away (NOTE: an MTU of 580 may not be correct for your
> particular setup).
>
> If that fixes it, add the following line to the end of
> /etc/init.d/rc.local:
>
>     ifconfig eth0 mtu 580
>
> You only need to do this on the gateway.
>
> > We have a customer who is trying to access one of his suppliers' web
> > sites in order to order stock. Halfway through the order he
> > goes to a specific page and the page fails to load
> > successfully. We have successfully accessed the page from a PC between
> > our gateway node and the satellite link, and we can
> > successfully access the site from a PC sat on the ethernet
> > port of a meshbox (non gateway), however we cannot
> > successfully access the site if we connect to the same mesh
> > box via a wireless ethernet bridge.
> >
> > My past experience, not with wireless, would suggest an MTU
> > problem.
> >
> > Comments and suggestions please.
We have had several issues of this kind in our deployments.
We investigated a lot and are now ready to share this experience, since we have succeeded in reproducing the issues and fixing them.
Observation:
  - A gateway node is connected back to back over eth to a sat modem, with 3 repeaters linked wirelessly to the gateway. The MTU has not been changed from the default.
  - Client A, connected wirelessly to a repeater, could not browse properly. TCP connections appear to be reset, dropped or disconnected.
  - Client B is connected directly to the gateway over wireless. He has no trouble browsing; all transfers are OK.
  - leechtest from the repeater nodes fails 90% of the time after transferring several kB. leechtest on the gateway node is OK.
Investigations: an analysis of the datagrams on the gateway, repeater and clients shows a fragmentation problem with client A, but no fragmentation problem with client B. Looking deeper, we saw that the gateway was losing the ipfrag_time bits in datagrams when NATing from the wireless tunnels to eth. Digging deeper still, we understood that the problem lies in the handling of the ipfrag bits combined with NAT + tunnels.

When client A (connected to a wireless repeater) browses the net, his datagrams are encapsulated on the TUN0 interface, with an MTU change from 1500 (wlan0 interface) to 1450 (TUN0), and then travel through the tunnel toward the gateway. At that point the datagrams are supposed to be translated to the gateway's eth0 address (NAT rules), with the MTU changing back from 1450 to 1500...
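To make the 1500 to 1450 step concrete, here is a minimal sketch of how a full-size IPv4 datagram would be split when it has to cross a link with a smaller MTU. It assumes plain IPv4 with a 20-byte header, no options and the DF bit clear. The function name `fragment_ipv4` is made up for illustration, and nothing here claims to reproduce what the Locust tunnel code actually does.

```python
def fragment_ipv4(total_length, mtu, header_len=20):
    """Split one IPv4 datagram into fragments that fit the given MTU.

    Returns a list of (payload_offset, fragment_total_length, more_fragments).
    Assumption: 20-byte header without options, DF bit not set.
    """
    if total_length <= mtu:
        return [(0, total_length, False)]

    payload = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 payload bytes,
    # because the fragment offset field counts in 8-byte units.
    max_chunk = ((mtu - header_len) // 8) * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")

    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(max_chunk, payload - offset)
        more = offset + chunk < payload
        fragments.append((offset, chunk + header_len, more))
        offset += chunk
    return fragments


if __name__ == "__main__":
    # A 1500-byte datagram from wlan0 entering a 1450-byte tunnel interface.
    for offset, length, more in fragment_ipv4(1500, 1450):
        print(f"offset={offset} length={length} more_fragments={more}")
```

Under these assumptions a 1500-byte datagram becomes one 1444-byte fragment and one 76-byte fragment, so every full-size packet from client A costs two packets in the tunnel and both must reach the reassembling host.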
We then noticed that adding a passive or active piece of network equipment between the gateway and the modem (e.g. a hub, switch, router or a Linux IPCop distro) fixes the problem. This means there is definitely a problem with NAT/fragmentation causing MTU problems on a gateway node connected directly to a sat modem, i.e. in a back-to-back eth connection, e.g. (MESHNODES <-- wireless links --> MESH GATEWAY <--eth cross-cable back2back--> MODEM).
Jérémy Lacroix -- Linux-Services
Wireless network deployment / free software / web design, tel: 0870 21 4312 -- http://www.linux-services.fr/
> "Wireless"
>
> What I fail to see with Brendan's suggestion, but I've not tried it yet,
> is why the MTU on the gateway node would affect what is happening
> on a remote node, and why on the remote node it works on the ethernet
> but not for wireless clients, unless there is some PMTU discovery stuff at
> work. But as I said this does look like an MTU problem, although I
> would have thought that if a wireless MTU problem existed we would see
> it on intermesh box links because of the overhead of the tunnels and
> encryption.
Whilst I agree that it does not seem entirely logical that this should fix the issue, in my setup it did. I think that information concerning fragmentation isn't always getting passed back down the mesh, and there may indeed be some PMTU issues.
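One way to test the PMTU theory from a client is to search for the largest packet size that crosses the path unfragmented. The sketch below only shows the search logic: `probe` is a hypothetical callable that returns True when a packet of the given total size gets through with "don't fragment" set. On Linux such a probe could be built around `ping -M do -s <size - 28>` (28 bytes being the IPv4 plus ICMP headers), but that wiring is left out here and is an assumption, not something taken from the mesh scripts.

```python
def find_path_mtu(probe, low=68, high=1500):
    """Binary-search the largest packet size for which probe(size) is True.

    probe: hypothetical callable, True if a packet of `size` bytes (IP header
           included) crosses the path without needing fragmentation.
    low:   68 is the smallest MTU an IPv4 link is allowed to have.
    high:  1500 is the usual Ethernet MTU, taken as the upper bound.
    """
    if not probe(low):
        raise ValueError("even the minimum size does not get through")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always progresses
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Simulated path whose smallest link has an MTU of 580, as reported
    # for the Direcway 4020 in the thread.
    print(find_path_mtu(lambda size: size <= 580))
```

If the value found from a wireless client behind a repeater is lower than the one found from a PC on the gateway's ethernet side, that would support the idea that fragmentation information is not making it back through the mesh.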
Since the details of what happens in the mesh, and of how the various locustworld scripts configure everything, are not well documented, it's hard to make a complete diagnosis.
I don't see why one would need to add a router (or why this would automatically help), since the MTU can be set at the gateway, manually for testing and then automatically on boot, by adding the line

    ifconfig eth0 mtu 580

to the end of /etc/init.d/rc.local
This 'fix' will be broken any time you update the version using getandverify.

Why, if it's not a LWmesh issue, do I not have any problems with my Staros links off this system, or over my wired ethernet?

One other issue for us poor satellite victims is the need for the DNSchains to be set up so that we are using the sat provider's proxied DNS server instead of the locustworld one.
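As a rough guide to what an `ifconfig eth0 mtu 580` override costs, the sketch below works out the TCP payload per packet for a few MTU values. It assumes IPv4 and TCP headers of 20 bytes each with no options; the helper names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
TCP_HEADER = 20   # bytes, assuming no TCP options


def tcp_mss(mtu):
    """Largest TCP payload that fits in one unfragmented packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def payload_efficiency(mtu):
    """Fraction of each full-size packet that is TCP payload."""
    return tcp_mss(mtu) / mtu


if __name__ == "__main__":
    # 1500: Ethernet default, 1450: tunnel MTU quoted above, 580: sat MTU.
    for mtu in (1500, 1450, 580):
        print(f"mtu={mtu} mss={tcp_mss(mtu)} "
              f"efficiency={payload_efficiency(mtu):.3f}")
```

With these assumptions an MTU of 580 leaves 540 bytes of payload per packet against 1460 at MTU 1500, so the header overhead rises from about 2.7% to about 6.9%. That is a modest price if it stops packets from being dropped.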
iwconfig frag
iwconfig wlan0 frag off
turns fragmentation off. This is the default; leave it as it is if the link SNR is good.
iwconfig wlan0 frag nnn
sets the fragmentation threshold to the value nnn (max is 2346). For a noisy environment set it to 500, which is also what Wiana does. You can play with other values such as 1000, 1500...
These settings are overwritten on reboot and return to the Wiana settings.
Why is your system so slow? Due to mesh or T1?
> how do you change this value and what do you set it to?? my connection is
> so slow and i accidentally closed the page and it won't open again. my email
> is barely working. Fortunately i can still putty if this can be done by
> command line. I don't know how to change anything using vi.
>> Two recent bug reports about performance
>>
>> http://cvs.locustworld.com:8088/locustworld/tktview?tn=89,1
>>
>> http://cvs.locustworld.com:8088/locustworld/tktview?tn=87,1
>>
>> also could explain some of your probs
>> > Further to my previous message, just this morning one of my nodes
>> > failed to report. Its radio seemed dead with no signal. When we climbed
>> > up to the attic we saw that the PCI riser had caused the problem. We
>> > re-inserted the riser and the problem went away. It will certainly fail
>> > again unless we replace it with a new one, but I do not have one on hand.
>> >
>> > Normally we change the cases now and do not use PCI risers anymore. We
>> > fit radio cards directly into the slot. We also use DoM flash instead
>> > of CF and get rid of cables and troubles. We use melted silicone glue
>> > around the PCI slot to prevent corrosion caused by weather. RAM slots are
>> > also a source of problems, due to corrosion over time if the humidity
>> > is high.