No change in 3.10.4. WAN IPv6 address has gone AWOL again.
Will check that and will get back to you.
After yesterday's update my router started sending DHCPv6 renews in a loop, so Init7 had to turn IPv6 off for me to protect their DHCP server. Is there any way to fix it without compiling and installing custom packages?
# tcpdump -n -i eth1 -vv '(udp port 546 or 547) or icmp6'
10:21:22.126283 IP6 (flowlabel 0x9ed25, hlim 64, next-header UDP (17) payload length: 184) 2a02:168:2000:9:da58:d7ff:fe00:50a4.546 > 2001:1620:2777:19:1::9.547: [udp sum ok] dhcp6 renew (xid=71aa0d (elapsed-time 100) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list server-unicast SNTP-servers NTP-server AFTR-Name opt_67 opt_82 opt_83 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 d858d70050a4) (server-ID hwaddr/time type 1 time 574856739 525400faac14) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0 (IA_ADDR 2a02:168:2000:9:a2d3:42ec:f1e:49f5 pltime:0 vltime:0)) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2a02:168:1234::/48 pltime:0 vltime:0)))
10:21:22.127273 IP6 (flowlabel 0x6ef01, hlim 1, next-header UDP (17) payload length: 166) fe80::da58:d7ff:fe00:50a4.546 > ff02::1:2.547: [udp sum ok] dhcp6 rebind (xid=d95e41 (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list server-unicast SNTP-servers NTP-server AFTR-Name opt_67 opt_82 opt_83 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 d858d70050a4) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0 (IA_ADDR 2a02:168:2000:9:a2d3:42ec:f1e:49f5 pltime:0 vltime:0)) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2a02:168:1234::/48 pltime:0 vltime:0)))
10:21:22.327224 IP6 (class 0xe0, hlim 255, next-header UDP (17) payload length: 198) fe80::ca9c:1dff:fe93:343f.547 > fe80::da58:d7ff:fe00:50a4.546: [udp sum ok] dhcp6 reply (xid=d95e41 (IA_NA IAID:1 T1:1200 T2:1800 (IA_ADDR 2a02:168:2000:9:a2d3:42ec:f1e:49f5 pltime:3600 vltime:86400) (status-code Success)) (IA_PD IAID:1 T1:0 T2:0) (server-ID hwaddr/time type 1 time 574856739 525400faac14) (client-ID hwaddr type 1 d858d70050a4) (preference 0) (server-unicast) (DNS-server 2001:1620:2777:1::10 2001:1620:2777:2::20))
10:21:22.328710 IP6 (flowlabel 0x9ed25, hlim 64, next-header UDP (17) payload length: 184) 2a02:168:2000:9:da58:d7ff:fe00:50a4.546 > 2001:1620:2777:19:1::9.547: [udp sum ok] dhcp6 renew (xid=8d7ca2 (elapsed-time 0) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list server-unicast SNTP-servers NTP-server AFTR-Name opt_67 opt_82 opt_83 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 d858d70050a4) (server-ID hwaddr/time type 1 time 574856739 525400faac14) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0 (IA_ADDR 2a02:168:2000:9:a2d3:42ec:f1e:49f5 pltime:0 vltime:0)) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2a02:168:1234::/48 pltime:0 vltime:0)))
10:21:22.812365 IP6 (flowlabel 0x29b9e, hlim 64, next-header ICMPv6 (58) payload length: 113) 2a02:168:2000:9:da58:d7ff:fe00:50a4 > 2001:1620:2777:1::10: [icmp6 sum ok] ICMP6, destination unreachable, unreachable port, 2a02:168:2000:9:da58:d7ff:fe00:50a4 udp port 46499
10:21:23.336383 IP6 (flowlabel 0x9ed25, hlim 64, next-header UDP (17) payload length: 184) 2a02:168:2000:9:da58:d7ff:fe00:50a4.546 > 2001:1620:2777:19:1::9.547: [udp sum ok] dhcp6 renew (xid=8d7ca2 (elapsed-time 100) (option-request SIP-servers-domain SIP-servers-address DNS-server DNS-search-list server-unicast SNTP-servers NTP-server AFTR-Name opt_67 opt_82 opt_83 opt_94 opt_95 opt_96) (client-ID hwaddr type 1 d858d70050a4) (server-ID hwaddr/time type 1 time 574856739 525400faac14) (Client-FQDN) (IA_NA IAID:1 T1:0 T2:0 (IA_ADDR 2a02:168:2000:9:a2d3:42ec:f1e:49f5 pltime:0 vltime:0)) (IA_PD IAID:1 T1:0 T2:0 (IA_PD-prefix 2a02:168:1234::/48 pltime:0 vltime:0)))
Hello everyone
We are Init7 and have been observing the topic for some time thanks to Michael. @sECuRE
As the project to harmonize our DHCP infrastructure will take some time, we are interested in a temporary solution for our customers. Since yesterday's update, the situation has worsened dramatically and we have seen numerous Turris Omnia routers flooding our servers with hundreds of thousands of requests. For this reason, we have already had to deactivate IPv6 for a large number of customers.
Basically, we are aware of the problem, but unfortunately we cannot provide a short-term solution on our DHCP servers. We were recently pointed to the following workaround, but we do not know whether it solves the problem completely:
https://blog.printk.io/2018/08/ipv6-renew-issue-with-fiber7-and-openwrt/
We are open to discussion on this subject.
Kind regards,
Init7 NOC
^dw
Unfortunately that solution doesn't work, because the packaged odhcp6c (2018-06-20-12) does not support the -U parameter:
odhcp6c -U -s /lib/netifd/dhcpv6.script -Ntry -P0 -t120 eth1
odhcp6c: unrecognized option: U
Usage: odhcp6c [options] <interface>
Could this be related to the very old ticket https://gitlab.labs.nic.cz/turris/openwrt/issues/182 ?
We're working on it.
Pepe,
could you tell me how to revert odhcp6c to the previous version? I'm afraid I'll have no connectivity over the weekend.
Sent PM.
Hello guys,
I am sorry it has taken me so long to respond to your query. We've been discussing how we can help in your situation, and we decided to release Turris OS 3.10.5 to RC at the beginning of next week.
In the upcoming release, odhcp6c has been updated to the latest version, which includes the option to ignore the Server Unicast option. We'd like to thank our user @koalatux, who brought this option to odhcp6c! We're really glad to have such an amazing community, which can debug a problem and send a pull request to get it fixed upstream.
Recently, we updated odhcp6c, and the update added a multicast option. That's why we decided to disable unicast support by default, as it causes problems in some networks. Unicast support can be enabled in the configuration file /etc/config/network. This will be included in the release notes, and the support team (including me) is ready to help users who would like to enable it.
I have very good news for users experienced with the CLI and SSH.
If you would like to try the RC before we release it, you can do so with the following command:
switch-branch --force stable
Earlier I was in touch with @yorik: once you switch to the stable branch, you'll need to ask Init7 to re-enable IPv6.
We'd really appreciate feedback on whether that works better for you.
Greetings from Prague,
Pepe
Hi all
Something seems to be broken since the Turris update on Wednesday.
Yesterday evening I ran tcpdump again and saw a flood of DHCPv6 messages of types renew and reply. I've seen about one renew per second (but sometimes with pauses of tens of minutes). This roughly matches what @Init7 described earlier in this thread.
I still have my patched version of odhcp6c (with the -U parameter) running, so the renew messages were sent to the multicast address. Because of that, I assume -U / noserverunicast won't fix the flooding; this must be something unrelated.
I just wanted to share this. I don't have time for debugging this weekend, and currently I also no longer get replies to DHCPv6 Solicit messages from Init7's DHCPv6 server.
Cheers, Adi
EDIT: I forgot to mention, the DHCPv6 reply messages I captured with tcpdump looked correct. This made me conclude there must be an error on the Turris side.
Hrm. I think I understand the DoSing failure mode, at least somewhat.
TL;DR: This can happen after T1 and T2 of an assigned IA_PD-prefix have run out but before its valid lifetime expires, if the server stops returning that prefix (but doesn't actively revoke it) while still providing IA_NA addresses.
Note the "elapsed-time 100" on the renew retransmission (i.e. 1s; the option is in centiseconds). odhcp6c seems to have an odd habit of sending one final retransmission AT the final operation timeout and then immediately giving up. The initial timeout for the first renew transmission is normally 10s with a small random jitter (±10% per the RFC), so this is being cut short by the overall timeout.
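To make the timing argument concrete, here is a rough sketch of a retransmission schedule with an overall duration cap. It is not odhcp6c code: the function name, the omission of jitter, and the "send one last time exactly at the cap" rule are assumptions taken from the observation above.

```python
def retransmit_schedule(irt, mrd, max_tries=5):
    """Return (send_time_s, elapsed_time_centis) for each transmission.

    irt: initial retransmission timeout in seconds (random jitter omitted)
    mrd: overall cap on the exchange in seconds (T2 - T1 for a RENEW)
    Assumption: a final retransmission goes out exactly when the cap is
    reached, and the client then gives up, as observed in the capture.
    """
    def centis(seconds):
        # The elapsed-time option is a 16-bit count of hundredths of a second.
        return int(min(seconds * 100, 0xFFFF))

    sends = [(0.0, 0)]
    now, rt = 0.0, float(irt)
    for _ in range(max_tries):
        next_send = now + rt
        if next_send >= mrd:
            sends.append((float(mrd), centis(mrd)))
            break
        sends.append((next_send, centis(next_send)))
        now, rt = next_send, rt * 2  # plain doubling, jitter left out
    return sends


# A 10s initial timeout under a 1s overall cap (T1=0, T2=1):
print(retransmit_schedule(10, 1))
```

With a 1s cap the only retransmission lands at 1s and carries elapsed-time 100, which is the pattern visible in the tcpdump output.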
Sifting through the code, this means T2 = T1 + 1. In this case, I think T2=1, T1=0. We'd get there from dhcpv6.c:1177: that branch is taken if we just parsed a REBIND response and T1 and T2 were both 0, and it sets T2=1.
By my reading, dhcpv6_calc_refresh_timers() will quite happily pull T1=T2=0 out of a stored address, and odhcp6c_expire_list() will just tick those down to 0 over time if nothing refreshes them.
This puts us into the observed failure state of REBIND/REPLY/RENEW/RENEW/REBIND.
Doesn't seem particularly intended, overall.
I'm not familiar with the precise standard here, but it seems odd that odhcp6c just hangs on to all configured addresses it was ever told about, even after repeated responses from the server no longer mention them. Maybe that's intended, though: when we last got it the address was valid, it hasn't expired, and nobody told us otherwise.
The behavior of expiring T1/T2 to 0 doesn't seem to be intended, though.
I think dhcpv6_calc_refresh_timers() should probably ignore values of 0 in its min(), only yielding 0 if ALL addresses are now 0, but I'm not confident in that. We might need to get dedeckeh to weigh in.
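A minimal sketch of that suggestion, in Python rather than the C of odhcp6c. Both function names are hypothetical; `naive_refresh_timer` only mirrors how the post above reads the current behaviour, and is not a transcription of dhcpv6_calc_refresh_timers().

```python
def naive_refresh_timer(values):
    """Plain minimum over all stored entries (the behaviour as read above)."""
    return min(values) if values else 0


def refresh_timer(values):
    """Smallest non-zero timer; 0 only if every stored entry is already 0."""
    nonzero = [v for v in values if v > 0]
    return min(nonzero) if nonzero else 0


# One fresh IA_NA address (T1=1200) next to a stale prefix ticked down to 0:
stored_t1 = [1200, 0]
print(naive_refresh_timer(stored_t1), refresh_timer(stored_t1))
```

With the stale entry present, the plain minimum collapses to 0 while the zero-ignoring variant keeps the 1200s the server just handed out.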
I don't think I fully understand that updated_IAs interaction, either. I get that we yield failure if no IAs changed and we're stuck at T1=T2=0, but some IAs changing doesn't mean all of them did. Perhaps there needs to be code here that re-ups T1/T2 on entries to some reasonable value if they weren't mentioned in the response, perhaps max(1200, 0.5*preferred) or so. What should we do when preferred expires but valid hasn't yet? The standard seems thin on this, but I may have missed something.
The best I got is that the standard wants clients to maintain T1/T2 per IA, as in, per set of addresses returned, aka per "session", not per individual address. Thus, after the server has responded with a new T1/T2 on some addresses, the client should use those T1/T2, even if it also has other addresses it received in the past that weren't mentioned in the response.
So instead of invoking dhcpv6_calc_refresh_timers(), we might want dhcpv6_handle_reply() to track the smallest T1/T2 effective on any IA_NA/IA_PD address it received in this reply, and use those values.
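As a rough illustration of that idea (not the actual patch): take the timers from the IAs in the current reply only, and let IAs that came back without any address or prefix contribute nothing. The `IA` type and `timers_from_reply` are made-up names, and skipping zero timers is an assumption.

```python
from dataclasses import dataclass


@dataclass
class IA:
    kind: str            # "IA_NA" or "IA_PD"
    t1: int              # seconds, as sent by the server
    t2: int              # seconds, as sent by the server
    has_addresses: bool  # did this IA carry an address or prefix?


def timers_from_reply(ias):
    """Smallest T1/T2 among IAs in this reply that actually carried addresses.

    Returns None if nothing usable was received, so the caller can decide
    what to do instead of falling back to 0.
    """
    usable = [ia for ia in ias if ia.has_addresses and ia.t1 > 0 and ia.t2 > 0]
    if not usable:
        return None
    return min(ia.t1 for ia in usable), min(ia.t2 for ia in usable)


# The reply from the capture: IA_NA with an address, IA_PD with no prefix.
reply = [IA("IA_NA", 1200, 1800, True), IA("IA_PD", 0, 0, False)]
print(timers_from_reply(reply))
```

For the reply in the capture this yields T1=1200 and T2=1800, regardless of what older stored entries have ticked down to.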
@Init7 - I'm not sure what to offer as a workaround here.
If you can, the simplest would be ensuring that prefixes stay bound for 24h, so that you can always reply to REBIND by refreshing the requested prefixes and they don't run out. This may mean that servers that can't reach the assigned-PD database can't send responses.
If you have control over server code, implementing the REBIND bit of https://tools.ietf.org/html/rfc3633#section-12.2 would help: If the client mentioned a prefix in its IA_PD option on renew, but the server doesn't want it to use it anymore, reply with an IA_PD-prefix option mentioning that prefix with vltime=0. This would correctly cause odhcp6c to immediately discard that prefix, stopping it from depressing T1/T2.
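For reference, a sketch of what such a revoking option looks like on the wire, using the option layouts from RFC 3633 (IA_PD is option 25, IA_PD-prefix is option 26). The helper names are made up, the prefix is the one shown in the capture, and a real server would of course build this inside a full REPLY.

```python
import ipaddress
import struct

OPTION_IA_PD = 25
OPTION_IAPREFIX = 26


def iaprefix_option(prefix, preferred, valid):
    """IA_PD-prefix option: lifetimes, prefix length, 16-byte prefix."""
    net = ipaddress.IPv6Network(prefix)
    body = struct.pack("!IIB", preferred, valid, net.prefixlen)
    body += net.network_address.packed
    return struct.pack("!HH", OPTION_IAPREFIX, len(body)) + body


def ia_pd_option(iaid, t1, t2, sub_options):
    """IA_PD option: IAID, T1, T2, followed by encapsulated options."""
    body = struct.pack("!III", iaid, t1, t2) + sub_options
    return struct.pack("!HH", OPTION_IA_PD, len(body)) + body


# Tell the client to stop using the prefix: both lifetimes set to 0.
revoke = ia_pd_option(1, 0, 0, iaprefix_option("2a02:168:1234::/48", 0, 0))
print(revoke.hex())
```

The important part is that the prefix is mentioned explicitly with a valid lifetime of 0, rather than being left out of the IA_PD.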
If you can't touch your DHCP server, the best I can come up with is a packet-inspecting filter that discards REBIND requests with an elapsed-time option < 6000, ensuring that clients don't get a REBIND response within the first minute. They should keep retransmitting, eventually sending a request marked old enough to make it through, and timeouts should be high enough that 60s isn't a huge deal. A bit of a dirty move, but it might slow down the flood enough that your servers survive, as the speed of the loop depends on how fast clients get a response to REBIND.
Edit: After writing this, I realized a very similar failure mode exists where T1 hits 0 but T2 hasn't yet.
In that case, the client will send a RENEW with normal timeout (T2-T1 == T2-1), upon receiving a response set T1=1 and restart the stateful loop: Wait 1s (T1) for RECONFIGURE, then immediately send RENEW and normal retransmissions thereof, looping on response - again, a request spam of cycle time 1s + renew_rtt.
So unfortunately, the workaround of filtering low-age REBIND isn't sufficient; you'd also need to filter low-age RENEW. Still, as the message is retransmitted and not that time-critical, it might be survivable for a while.
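A sketch of the decision such a filter would make, given the raw UDP payload of a DHCPv6 client message. The message type and option codes are the standard ones (RENEW 5, REBIND 6, elapsed-time option 8); the function names, the 6000 default, and letting through messages without an elapsed-time option are assumptions. Hooking this into an actual packet path is left out.

```python
import struct

RENEW, REBIND = 5, 6
OPTION_ELAPSED_TIME = 8


def elapsed_time(payload):
    """Return the elapsed-time value in centiseconds, or None if absent."""
    offset = 4  # 1 byte message type + 3 bytes transaction id
    while offset + 4 <= len(payload):
        code, length = struct.unpack_from("!HH", payload, offset)
        offset += 4
        if offset + length > len(payload):
            return None  # truncated option, give up
        if code == OPTION_ELAPSED_TIME and length == 2:
            return struct.unpack_from("!H", payload, offset)[0]
        offset += length
    return None


def should_drop(payload, min_centis=6000):
    """Drop RENEW/REBIND messages younger than min_centis (60s by default)."""
    if len(payload) < 4 or payload[0] not in (RENEW, REBIND):
        return False
    age = elapsed_time(payload)
    return age is not None and age < min_centis


# A REBIND with elapsed-time 0, like the one in the capture:
rebind = bytes([REBIND]) + b"\xd9\x5e\x41" + struct.pack("!HHH", 8, 2, 0)
print(should_drop(rebind))
```

Retransmissions carry a growing elapsed-time, so a looping client is throttled to roughly one answered request per minute while other message types pass untouched.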
Worth mentioning that the breaking client behavior (synthesizing T2=1) was introduced in https://git.openwrt.org/?p=project/odhcp6c.git;a=commitdiff;h=473f248e2db6c6c39e7aecf78f888e44f36ff5c4 in early April; older builds of odhcp6c would not behave this way.
If I read this right, those older builds would instead retransmit REBIND 20 times, with normal exponential backoff, refreshing any returned addresses but considering the response subtly invalid, eventually restarting SOLICIT, which forcibly discards any PDs still held, until it gets through REQUEST/REPLY.
@koalatux (or anyone else) when you have time, could you check out https://github.com/AlsoBearPerson/odhcp6c/commit/250f56a73e7a1b5e1e90f53982e8915065947450 please?
This should fix the flood we've been seeing, but I don't have a suitable development environment nearby, so I can't even syntax-check it right now. It probably has a few loose bits to shake out, so it seems rude to open a pull request in its current shape.
I would like to tweak things a tad more: changing the static local T1/T2/T3 variables from relative times to absolute timestamps would remove the need for much of the ticking-down shenanigans, though I'm not sure if I should keep piling that into this change...
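To show what the absolute-timestamp idea means in general terms (this is only an illustration, not odhcp6c code, and `Deadline` is a made-up name): store the moment a timer fires and compute the remaining time on demand, instead of decrementing a stored relative value.

```python
import time


class Deadline:
    """A timer stored as an absolute point on the monotonic clock."""

    def __init__(self, seconds, now=None):
        start = time.monotonic() if now is None else now
        self.at = start + seconds

    def remaining(self, now=None):
        """Seconds left until the deadline, never negative."""
        current = time.monotonic() if now is None else now
        return max(0.0, self.at - current)


# T1 of 1200s set at t=100s; asked again at t=400s without any ticking down.
t1 = Deadline(1200, now=100.0)
print(t1.remaining(now=400.0))
```

Nothing has to be walked and decremented between events, so an entry cannot drift to 0 simply because some other code path forgot to refresh it.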
Just rebooted with 3.10.5. Will check back in 12-24 hours to see if my problem persists.
Many thanks to everyone who has worked so intensively on the topic. We really appreciate it. The first tests with some customers look very good. ^dw
I enabled unicast in /etc/config/network (set "option noserverunicast" to "0") and rebooted with 3.10.5; looks good so far indeed.
I have unicast disabled and it works for 48+ hours. Thank you for fixing this!
Note that if you're on Init7 (or another setup with busted unicast) you need noserverunicast set to 1 (yay for double negatives).
As for the cause, I kind of suspect relay agent shenanigans: judging by the addresses involved, my upstream router is also acting as a DHCPv6 relay agent on multicast queries, which might allow it to e.g. add an interface-id option indicating which port the request came from when relaying to the actual server. As unicast will likely be routed as normal packet traffic instead, it won't be modified, and the server might miss information about which customer this is, unless it remembers such metadata by client_id...
The RFC says "Therefore, a server should only send a Unicast option to a client when Relay Agents are not sending Relay Agent options." I'm not seeing any relay agent options on the client side, but that doesn't mean there aren't any server-side. If my theory is correct, then either the server should not generate a unicast option to begin with, or the relay agent should cut it out of the response when relaying.
My uplink looks like a point-to-point link (not seeing any neighbor discovery traffic from my actual neighbors), so it's not like multicast vs. unicast is going to make a huge difference in traffic fanout here.
Thx. I suspected as much, since about 2h after the reboot I lost IPv6 connectivity again. I set noserverunicast back to 1 and it has been stable again for about 2h (although it was working for that long after the last reboot as well, so we'll see...)
Lost IPv6 address again after 12 hours. Now retrying with option noserverunicast set to 0.