PPPoE doesn't try to reconnect after a failure

Well, this is probably related to OpenWrt's netifd, but I was unable to find a proper solution anywhere. When my DSL router (configured as a PPP bridge) is not up while the Omnia is starting, or when I reboot the DSL router while the Omnia is running, the following messages appear in the log:

2016-12-03T09:32:41+01:00 warning pppd[1750]: Timeout waiting for PADS packets
2016-12-03T09:32:41+01:00 err pppd[1750]: Unable to complete PPPoE Discovery
2016-12-03T09:32:41+01:00 info pppd[1750]: Exit.
2016-12-03T09:32:41+01:00 notice netifd[]: Interface 'wan' is now down
2016-12-03T10:32:41+01:00 info kernel[]: [   59.993808] IPv6: ADDRCONF(NETDEV_UP): eth2: link is not ready
2016-12-03T09:32:41+01:00 notice netifd[]: Interface 'wan' is disabled
2016-12-03T09:32:41+01:00 notice netifd[]: Network device 'eth2' link is down
2016-12-03T09:32:41+01:00 notice netifd[]: Interface 'wan' has link connectivity loss
2016-12-03T09:32:43+01:00 notice netifd[]: Network device 'eth2' link is up
2016-12-03T09:32:43+01:00 notice netifd[]: Interface 'wan' has link connectivity 
2016-12-03T10:32:43+01:00 info kernel[]: [   61.990703] mvneta f1070000.ethernet eth2: Link is Up - 1Gbps/Full - flow control off
2016-12-03T10:32:43+01:00 info kernel[]: [   61.990724] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready

and it will never try to run pppd again. I have to manually run ifdown wan && ifup wan or /etc/init.d/network restart, and then it establishes the connection again. Has anyone seen a similar issue and/or found a solution?

I haven’t tested it, but try the keepalive option as per:

https://wiki.openwrt.org/doc/uci/network#protocol_pppoe_ppp_over_ethernet

option 'keepalive' '10'
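
For reference, a sketch of where that line would go, assuming the interface section is named wan as in the log above (the other lines of your section stay as they are). As far as I understand the wiki page, the value is the number of unanswered LCP echo requests after which the peer is considered dead, so it watches a session that is already established:

```
config interface 'wan'
	option proto 'pppoe'
	# consider the peer dead after 10 unanswered LCP echo requests
	option keepalive '10'
```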

I’ve tried setting keepalive and it didn’t help. Now I’m looking into the netifd code and the ppp.sh script. If I understand it correctly, netifd is supposed to restart pppd when it terminates (which works when I kill pppd manually), but it somehow fails to do so if I reboot the DSL modem, which causes the link on eth2 to go down and, at the same time, pppd to terminate because of the timeout.

I’m pretty pissed off by all of these OpenWrt-specific daemons. They are great and easy to configure as long as everything works. But once something goes wrong, it’s pretty difficult to find out what is going on.

OK, I’ve finally found a working “solution”. Take a recent ppp.sh from LEDE and replace /lib/netifd/proto/ppp.sh on your Omnia with it. Then add the following options to your wan interface in /etc/config/network:
option persist 'true'
option maxfail '0'
option holdoff '10'

This forces the ppp daemon never to exit when the connection cannot be established or fails; instead it will try to reconnect every 10 seconds.
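
Put together, the wan section might look roughly like the sketch below. The ifname is taken from the log earlier in the thread, the username and password are placeholders, and the rest of your section may differ, so treat it as an illustration rather than a drop-in config. The last three options are the ones listed above; they only take effect with a ppp.sh that knows about them (hence the replacement from LEDE).

```
config interface 'wan'
	# assumption: eth2 is the WAN device, as in the log above
	option ifname 'eth2'
	option proto 'pppoe'
	# placeholders, use your ISP credentials
	option username 'your-username'
	option password 'your-password'
	# keep pppd running instead of exiting on failure
	option persist 'true'
	# 0 = no limit on consecutive failed connection attempts
	option maxfail '0'
	# wait 10 seconds before each reconnect attempt
	option holdoff '10'
```

After editing the file, /etc/init.d/network restart (as mentioned earlier) applies the change.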

In theory, netifd is supposed to restart pppd when it exits with an error, but sometimes it fails to do that, and even ifdown wan && ifup wan doesn’t help (only /etc/init.d/network restart does). It is probably some nasty race condition that I was unable to track down.

Could this be merged in the next Turris release, please? It’s a pity such basic things require random hacks.

Yes, agree! I would also like to see this in the next release!