Nfs-common failing in LXC container (Debian template)

woosting · December 29, 2016, 5:09pm

Continuing the discussion from Build a home media server:

I seem to be getting similar errors during the installation of nfs-common (and nfs-kernel-server for that matter) inside a Debian Jessie LXC container:

Creating config file /etc/idmapd.conf with new version
Job for nfs-common.service failed. See 'systemctl status nfs-common.service' and 'journalctl -xn' for details.
invoke-rc.d: initscript nfs-common, action "start" failed.
dpkg: error processing package nfs-common (--configure):
 subprocess installed post-installation script returned error exit status 1
Processing triggers for libc-bin (2.19-18+deb8u6) ...
Processing triggers for systemd (215-17+deb8u5) ...
Errors were encountered while processing:
 nfs-common
E: Sub-process /usr/bin/dpkg returned an error code (1)

The command systemctl status nfs-common.service prints:

● nfs-common.service - LSB: NFS support files common to client and server
   Loaded: loaded (/etc/init.d/nfs-common)
   Active: failed (Result: exit-code) since Thu 2016-12-29 13:59:22 UTC; 2min 18s ago

Dec 29 13:59:22 testserv rpc.idmapd[2506]: main: fcntl(/run/rpc_pipefs/nfs): Invalid argument
Dec 29 13:59:22 testserv nfs-common[2495]: Starting NFS common utilities: statd idmapd failed!
Dec 29 13:59:22 testserv systemd[1]: nfs-common.service: control process exited, code=exited status=1
Dec 29 13:59:22 testserv systemd[1]: Failed to start LSB: NFS support files common to client and server.
Dec 29 13:59:22 testserv systemd[1]: Unit nfs-common.service entered failed state.

The command journalctl -xn prints:

-- Logs begin at Thu 2016-12-29 13:13:37 UTC, end at Thu 2016-12-29 13:59:23 UTC. --
Dec 29 13:59:06 testserv systemd[1]: Reloading.
Dec 29 13:59:21 testserv systemd[1]: Reloading.
Dec 29 13:59:22 testserv systemd[1]: Reloading.
Dec 29 13:59:22 testserv systemd[1]: Starting LSB: NFS support files common to client and server...
-- Subject: Unit nfs-common.service has begun with start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nfs-common.service has begun starting up.
Dec 29 13:59:22 testserv rpc.idmapd[2506]: main: fcntl(/run/rpc_pipefs/nfs): Invalid argument
Dec 29 13:59:22 testserv nfs-common[2495]: Starting NFS common utilities: statd idmapd failed!
Dec 29 13:59:22 testserv systemd[1]: nfs-common.service: control process exited, code=exited status=1
Dec 29 13:59:22 testserv systemd[1]: Failed to start LSB: NFS support files common to client and server.
-- Subject: Unit nfs-common.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit nfs-common.service has failed.
--
-- The result is failed.
Dec 29 13:59:22 testserv systemd[1]: Unit nfs-common.service entered failed state.
Dec 29 13:59:23 testserv systemd[1]: Reloading.

woosting · December 29, 2016, 5:10pm

I tried with a container built from an Ubuntu template, and that one had no problem installing nfs-common. That leaves the question of why it does not work on Debian…

When I edit /etc/default/nfs-common to change: NEED_IDMAPD= into NEED_IDMAPD=no (after installation)
I can start nfs-common (and even apt-get upgrade nfs-common) flawlessly. But now I may face an id-mapping issue later…

In my effort to find the root cause rather than work around it, I’m currently investigating this line from the logs posted above: rpc.idmapd[2506]: main: fcntl(/run/rpc_pipefs/nfs): Invalid argument, which seems to start the cascade. I did find a post about a kernel issue, but that would be strange, as it does work in my Ubuntu container (running on the same stock Omnia kernel)…

In short: I am lost… Maybe @miska, who has helped me a lot with LXC on the Omnia in the past, has a pointer for me here. Could you maybe try to replicate this (simply create a container from the Jessie template and install nfs-common)?

Maxmilian_Picmaus · December 29, 2016, 5:50pm

I just replicated your issue on my new ‘Jessie’ container. Same result, same log messages.
The subsequent package removal completed fine.

white · December 29, 2016, 5:56pm

I don’t have Debian, but I suspect there is a problem in the startup scripts: they may not mount /run/rpc_pipefs/nfs for rpc.idmapd. Based on CentOS 7, there should be a mount like this:

# grep rpc_pipefs /proc/mounts
sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0

In CentOS 7 it is provided by systemd: /usr/lib/systemd/system/var-lib-nfs-rpc_pipefs.mount

white · December 29, 2016, 6:00pm

And see also these: https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/791588 and https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/993231

woosting · December 29, 2016, 6:18pm

On my end the command: cat /proc/mounts | grep rpc_pipefs prints:

rpc_pipefs /run/rpc_pipefs rpc_pipefs rw,relatime 0 0

So, if I understand correctly, it is correctly mounted, right @white?

I did see those bug reports (but I only skimmed them, as they seemed fairly old); I’m going to investigate them in detail now.

In the meantime, if anyone has any additional thoughts, please share them with us (I’m getting desperate here).

white · December 29, 2016, 6:23pm

What does the directory contain? For example, in CentOS 7:

ls -la /var/lib/nfs/rpc_pipefs
total 4
dr-xr-xr-x. 11 root root 0 Nov 18 00:49 .
drwxr-xr-x. 6 root root 4096 Dec 22 01:03 ..
dr-xr-xr-x. 2 root root 0 Jul 23 14:21 cache
dr-xr-xr-x. 3 root root 0 Jul 23 14:21 gssd
dr-xr-xr-x. 2 root root 0 Jul 23 14:21 lockd
dr-xr-xr-x. 2 root root 0 Jul 23 14:21 mount
dr-xr-xr-x. 2 root root 0 Jul 23 14:21 nfs
dr-xr-xr-x. 2 root root 0 Jul 23 14:21 nfsd
dr-xr-xr-x. 3 root root 0 Dec 26 21:36 nfsd4_cb
dr-xr-xr-x. 2 root root 0 Jul 23 14:21 portmap
dr-xr-xr-x. 2 root root 0 Jul 23 14:21 statd

woosting · December 29, 2016, 6:30pm

Yeah, same (for my location); the command ls -la /run/rpc_pipefs prints:

total 0
dr-xr-xr-x 11 root root   0 Dec 29 18:06 .
drwxr-xr-x 17 root root 560 Dec 29 18:06 ..
dr-xr-xr-x  2 root root   0 Dec 29 18:06 cache
dr-xr-xr-x  3 root root   0 Dec 29 18:06 gssd
dr-xr-xr-x  2 root root   0 Dec 29 18:06 lockd
dr-xr-xr-x  2 root root   0 Dec 29 18:06 mount
dr-xr-xr-x  4 root root   0 Dec 29 18:06 nfs
dr-xr-xr-x  2 root root   0 Dec 29 18:06 nfsd
dr-xr-xr-x  2 root root   0 Dec 29 18:06 nfsd4_cb
dr-xr-xr-x  2 root root   0 Dec 29 18:06 portmap
dr-xr-xr-x  2 root root   0 Dec 29 18:06 statd

woosting · December 29, 2016, 6:30pm

The command: rpc.idmapd -fv (from the bug-report) prints:

rpc.idmapd: libnfsidmap: using (default) domain: lan
rpc.idmapd: libnfsidmap: Realms list: 'LAN'
rpc.idmapd: libnfsidmap: loaded plugin /lib/arm-linux-gnueabihf/libnfsidmap/nsswitch.so for method nsswitch

rpc.idmapd: Expiration time is 600 seconds.
rpc.idmapd: nfsdopenone: Opening /proc/net/rpc/nfs4.nametoid/channel failed: errno 2 (No such file or directory)
rpc.idmapd: main: fcntl(/run/rpc_pipefs/nfs): Invalid argument

This (taken from the output above):

Opening /proc/net/rpc/nfs4.nametoid/channel failed: errno 2 (No such file or directory)

This seems to be related, as that path indeed does not exist: /proc/net/rpc/nfs4.nametoid/channel, while:

/proc/net/rpc/nfsd.export/channel
/proc/net/rpc/nfsd.export/channel

Do exist though…
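
For anyone who wants to repeat this check, here is a minimal sketch that lists which channel files are actually present under /proc/net/rpc. The function name list_rpc_channels and its optional directory argument are made up for illustration; nothing here is specific to the Omnia or the Debian template.

```shell
#!/bin/sh
# Sketch: print every <cache>/channel file below an RPC proc directory.
# list_rpc_channels and its optional argument are illustrative names only.
list_rpc_channels() {
    base="${1:-/proc/net/rpc}"
    for f in "$base"/*/channel; do
        # An unmatched glob stays literal, so check that the file exists.
        if [ -e "$f" ]; then
            echo "$f"
        fi
    done
    return 0
}

# Read-only check against the real /proc tree.
list_rpc_channels
```

If nfs4.nametoid does not appear in the output, that matches the "No such file or directory" message from rpc.idmapd above.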

woosting · December 29, 2016, 10:40pm

I’m giving up (for today anyhow)…

Feels like a bug in the template (or something in that vein), as it does not occur in any other Debian instances I have running, nor does it occur when I use the Omnia’s supplied Ubuntu template. My best guess is that it has to do with the (initd) ‘upstart’ order in the container.

Very sad (for me) as the Omnia was supposed to act like a backup-server (through Debian LXC).

woosting · December 30, 2016, 1:14pm

Does anyone know the person “responsible” for the Debian template? (I think/hope it should/could be fixed there.)

@Miska, @AdminX, @Tomnia, @Etz, @Bernstein, You guys have helped me with (LXC) related stuff before; care to shine your light over this?

Should I wait until the template is adapted so that nfs-common works out of the box, should I keep digging deeper for a solution (I fear that it would be in vain), or should I simply switch to an Ubuntu container (that atm. does not seem to include this issue)?

Edit: am setting up an Ubuntu version as we “speak”.

woosting · December 30, 2016, 3:46pm

Workarounds (for those interested…):

Use another container template (tested to work with ‘Ubuntu Yakkety’).
Simply use NFS3 (on server):

Turn off idmapd loading after installing ‘nfs-common’ (on the client):
- apt-get install nfs-common (will semi-fail)
- vim /etc/default/nfs-common
- change NEED_IDMAPD= into NEED_IDMAPD=no
- apt-get upgrade -y
Change UIDs to match the server (if needed), as root (the user being changed cannot be logged in):
- usermod -u <NEW_UID> <USERNAME>
- find / -uid <OLD_UID> -exec chown -h <NEW_UID> {} +
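
The idmapd step above can also be done without opening an editor. This is a minimal sketch, assuming GNU sed (as shipped in Debian) and the NEED_IDMAPD= line from /etc/default/nfs-common; the function name disable_idmapd and its optional path argument are hypothetical and only there so it can be tried on a copy of the file first.

```shell
#!/bin/sh
# Sketch: set NEED_IDMAPD=no in the nfs-common defaults file.
# disable_idmapd and its optional path argument are illustrative only.
disable_idmapd() {
    conf="${1:-/etc/default/nfs-common}"
    if grep -q '^NEED_IDMAPD=' "$conf"; then
        # Rewrite the existing assignment, whatever its current value is.
        sed -i 's/^NEED_IDMAPD=.*/NEED_IDMAPD=no/' "$conf"
    else
        # No assignment present yet: append one.
        echo 'NEED_IDMAPD=no' >> "$conf"
    fi
}
```

Run disable_idmapd as root after the failed install, then apt-get upgrade -y as in the steps above. This is still the same workaround, not a fix: idmapd simply is not started.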

I’m shooting for the latter… as I am hoping that the Debian template owner will pick this up (the fix should be pretty straightforward, I think). Albeit people are being el-noncommunicado about this one.

Edit: NFS3 seems to work for me now (but I’m still curious what is wrong here).

Rene_Malmgren · March 11, 2017, 4:54pm

I ran into the same problem. For some reason, as yet unknown to me, I have managed to get past it. I will write an update when I have the time.

palo_m · September 23, 2017, 7:46pm

The problem still exists with the newest Turris OS 3.8.1 and a Debian Stretch LXC container.

More precisely, in server-only mode (-S option) idmapd can be started, while in client mode it crashes with the infamous message:

rpc.idmapd: main: fcntl(/run/rpc_pipefs/nfs): Invalid argument

Without idmapd, NFS4 mounts do not work at all (they need idmapd running).

A web search shows that the same happened to other people on various distributions when they did not have dnotify support in the kernel. The container uses the kernel from TurrisOS, of course… and it seems that dnotify support is not turned on there (I am only guessing, based on the missing /proc/sys/fs/dir-notify-enable in both TurrisOS and the container).
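
The check behind that guess is easy to repeat on the host or inside a container. This is a minimal sketch that only tests for the sysctl file named above. The function name dnotify_status and its optional root-prefix argument are made up for illustration, and a missing file is a hint rather than proof that the kernel was built without dnotify.

```shell
#!/bin/sh
# Sketch: report whether /proc/sys/fs/dir-notify-enable is visible.
# dnotify_status and its optional root prefix are illustrative names only.
dnotify_status() {
    root="${1:-}"
    if [ -e "$root/proc/sys/fs/dir-notify-enable" ]; then
        echo "present"
    else
        echo "missing"
    fi
}

# Read-only check against the running system.
dnotify_status
```

On a kernel built with CONFIG_IKCONFIG_PROC, zcat /proc/config.gz | grep DNOTIFY would give a more direct answer, but whether the TurrisOS kernel exposes /proc/config.gz is not something this thread confirms.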

Is someone from the Turris team going to check this?