FS#18073 - [initscripts] Add checks to detect if running within a cgroup

Attached to Project: Arch Linux
Opened by C Anthony Risinger (extofme) - Saturday, 30 January 2010, 01:34 GMT
Last edited by Tom Gundersen (tomegun) - Tuesday, 24 May 2011, 05:01 GMT
Task Type: Feature Request
Category: Arch Projects
Status: Closed
Assigned To: Tobias Powalowski (tpowa), Aaron Griffin (phrakture), Thomas Bächler (brain0), Tom Gundersen (tomegun)
Architecture: All
Severity: Low
Priority: Normal
Reported Version:
Due in Version: Undecided
Due Date: Undecided
Percent Complete: 100%
Votes: 1
Private: No

Details

i would like to have the initscripts modified to detect if they are being run inside a cgroup. this would allow arch to run unmodified inside LXC containers. i think this would be fairly _simple_... you can check proc to see if the current process is inside a cgroup:

HOST:
cr@ph1 ~ $ cat /proc/self/cgroup
1:net_cls,freezer,devices,memory,cpuacct,cpu,ns,cpuset:/

CONTAINER:
VPS[guest-personal-tony] extof ~ # cat /proc/self/cgroup
1:net_cls,freezer,devices,memory,cpuacct,cpu,ns,cpuset:/guest-personal-tony

notice the extra information after "/". if you're on the host you don't have a folder in the cgroup hierarchy. also if this file (/proc/self/cgroup) is missing, cgroups are not enabled, and initscripts can continue as normal.
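
The check described above could be sketched roughly as follows. This is only an illustration, not a patch against initscripts: `in_child_cgroup` is a made-up helper name, and the code assumes the `id:subsystems:path` line format shown in the two examples.

```shell
# in_child_cgroup is a hypothetical helper name, not part of initscripts.
# Returns 0 if any hierarchy lists a path other than "/", 1 otherwise.
in_child_cgroup() {
    cg_file=${1:-/proc/self/cgroup}
    # A missing file means cgroups are not enabled: continue as normal.
    [ -r "$cg_file" ] || return 1
    # Each line looks like "id:subsystems:path"; the path is the third field.
    while IFS=: read -r cg_id cg_subsys cg_path; do
        if [ -n "$cg_path" ] && [ "$cg_path" != / ]; then
            return 0
        fi
    done < "$cg_file"
    return 1
}

if in_child_cgroup; then
    echo "inside a cgroup"
else
    echo "at the cgroup root, or cgroups not enabled"
fi
```

The optional argument is there only so the function can be tried against a saved copy of the file instead of the live /proc entry.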

right now i use heavily stripped down versions of all the init scripts in order to run arch in a container. note, i don't really modify the scripts, i just remove all the stuff that doesn't make sense in a container, i.e. hardware clock, udev, etc.

Closed by Tom Gundersen (tomegun)
Tuesday, 24 May 2011, 05:01 GMT
Reason for closing: Won't implement
Additional comments about closing: Submitter lost interest, use systemd.
Comment by Glenn Matthys (RedShift) - Monday, 15 November 2010, 12:14 GMT
What's the status of this issue?
Comment by Dave Reisner (falconindy) - Thursday, 03 March 2011, 01:25 GMT
What's the proposed behavior if it's determined that the processes are running inside a cgroup container?
Comment by C Anthony Risinger (extofme) - Thursday, 03 March 2011, 03:24 GMT
mostly the initscripts need to just skip anything that doesn't make sense when inside a container, namely anything related to hardware (clock), udev, re-mounting / (likely ANY kernel-mounting at all), loopback, creating new interfaces, creating new nodes in /dev, etc ...

containers are pretty locked down, and they already have everything they need prepared for them. tbh, we shouldn't have to add anything new; this is pretty much 100% about identifying the things that must be _skipped_.
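
One way to picture "identify the things that must be skipped" is a small wrapper around each hardware-related step. This is a sketch under assumptions: `run_on_host_only` and the `IN_CONTAINER` variable are hypothetical names, and how `IN_CONTAINER` gets set is left to whatever detection is agreed on.

```shell
# run_on_host_only and IN_CONTAINER are hypothetical names for illustration.
# When IN_CONTAINER is 1 the command is reported and skipped, otherwise it runs.
run_on_host_only() {
    if [ "${IN_CONTAINER:-0}" = 1 ]; then
        printf 'skipped in container: %s\n' "$*"
        return 0
    fi
    "$@"
}

# Placeholder steps; a real script would call the hardware clock, udev,
# and mount commands here instead of echo.
run_on_host_only echo "set the hardware clock"
run_on_host_only echo "start udev"
```

Steps that are still fine inside a container would simply be called without the wrapper.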

most live with a custom bash-based /sbin/init that only does a few maintenance tasks, but it would lower the barrier of entry if arch containers Just Worked.
Comment by Dave Reisner (falconindy) - Thursday, 03 March 2011, 14:43 GMT
The presence of a process running inside a cgroup is by no means an indication that the OS is running inside a container.

$ cat /proc/self/cgroup
10:blkio:/
9:net_cls:/
8:freezer:/
7:devices:/
6:memory:/
5:cpuacct:/
4:cpu:/
3:ns:/
2:cpuset:/
1:name=systemd:/user/dreisner/1
Comment by C Anthony Risinger (extofme) - Thursday, 03 March 2011, 16:45 GMT
"notice the extra information after "/". if your on the host you don't have a folder in the cgroup hierarchy." ... normally.

:-) the `name` group doesn't matter; none of the other subsystems that do matter are in a cgroup. but that is a good point ... i forgot systemd did this (and i have considered placing the entire system in a real cgroup via some mkinitcpio magic)
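
To illustrate the point about the `name` group: a variant of the check that ignores named hierarchies such as `name=systemd` might look like the sketch below. `in_controller_cgroup` is a hypothetical name, and this only handles the file format shown in this thread. It is still not proof of a container, since a host can place processes in controller cgroups too.

```shell
# in_controller_cgroup is a hypothetical helper name.
# Returns 0 only if a hierarchy with real subsystems has a path other than "/".
in_controller_cgroup() {
    cg_file=${1:-/proc/self/cgroup}
    [ -r "$cg_file" ] || return 1
    while IFS=: read -r cg_id cg_subsys cg_path; do
        case $cg_subsys in
            # Named hierarchies (e.g. name=systemd) carry no controllers.
            name=*) continue ;;
        esac
        if [ -n "$cg_path" ] && [ "$cg_path" != / ]; then
            return 0
        fi
    done < "$cg_file"
    return 1
}
```

Fed the output from the previous comment, this returns 1, because every subsystem hierarchy is at "/" and only the `name=systemd` line has a deeper path.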

soooo ... a foolproof approach might be to add a parameter to the kernel cmdline, say `lxc.cgroup.host=/` or similar, to indicate to initscripts which cgroup is the host. this way, initscripts can simply compare:

[[ $(cut -d: -f3 /proc/self/cgroup) == "${lxc_cgroup_host}" ]] && is_host=true || is_host=false

something like that, except actually working.
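
A slightly fuller sketch of the same idea, for what it's worth. Everything here is an assumption built on the proposal: `lxc.cgroup.host` is the suggested (not existing) kernel parameter, `get_cmdline_param` and `is_host_cgroup` are hypothetical helper names, and falling back to "/" when the parameter is absent is a guess at sensible behavior.

```shell
# get_cmdline_param (hypothetical): print the value of KEY=value from the
# kernel command line, or nothing if the key is absent.
get_cmdline_param() {
    cp_file=${2:-/proc/cmdline}
    [ -r "$cp_file" ] || return 0
    tr ' ' '\n' < "$cp_file" | while IFS= read -r cp_word; do
        case $cp_word in
            "$1"=*) printf '%s\n' "${cp_word#*=}"; break ;;
        esac
    done
}

# is_host_cgroup (hypothetical): return 0 if every subsystem hierarchy sits
# at the given host path; named hierarchies (name=...) are ignored.
is_host_cgroup() {
    cg_file=${2:-/proc/self/cgroup}
    [ -r "$cg_file" ] || return 0   # no cgroups at all: treat as host
    while IFS=: read -r cg_id cg_subsys cg_path; do
        case $cg_subsys in
            name=*) continue ;;
        esac
        [ "$cg_path" = "$1" ] || return 1
    done < "$cg_file"
    return 0
}

# lxc.cgroup.host is the parameter proposed above; it does not exist today.
lxc_cgroup_host=$(get_cmdline_param lxc.cgroup.host)
if is_host_cgroup "${lxc_cgroup_host:-/}"; then
    is_host=true
else
    is_host=false
fi
echo "is_host=$is_host"
```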
Comment by Tom Gundersen (tomegun) - Wednesday, 13 April 2011, 09:17 GMT
Does this work in systemd? If so, how do they detect the container? It would be best not to reinvent something.
Comment by Tom Gundersen (tomegun) - Tuesday, 26 April 2011, 15:18 GMT
I would be opposed to lots of conditionals in rc.sysinit et al, but if I understand correctly we would just need one check at the top of rc.sysinit, and return immediately if we are in a container, so that would be ok.

Someone would have to provide a foolproof check which does not introduce any custom configuration though (preferably with a reference to how this is done elsewhere)... Any suggestions?
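
One candidate that needs no custom configuration is the `container` environment variable that container managers are expected to set for PID 1, a convention documented in systemd's Container Interface. The sketch below reads it from /proc/1/environ. `container_type` is a hypothetical helper name, and whether a given LXC version actually sets the variable is an assumption that would need testing.

```shell
# container_type (hypothetical): print the value of "container" from PID 1's
# environment, or nothing if it is unset or the file cannot be read.
container_type() {
    env_file=${1:-/proc/1/environ}
    [ -r "$env_file" ] || return 0
    # Entries in environ are NUL-separated; turn them into lines first.
    tr '\000' '\n' < "$env_file" | sed -n 's/^container=//p' | head -n 1
}

ctype=$(container_type)
if [ -n "$ctype" ]; then
    echo "running in a container of type: $ctype"
else
    echo "no container marker found"
fi
```

Reading /proc/1/environ needs root, which rc.sysinit has. A single test on the result at the top of rc.sysinit would then be enough to bail out early.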
Comment by C Anthony Risinger (extofme) - Wednesday, 18 May 2011, 15:26 GMT
oops, i didn't realize there was activity on this -- sorry for delay.

i don't know if this works on systemd (heh idk if systemd was around when i opened this) but that would be a great test. i've actually been meaning to test this anyway because i pretty much use systemd for everything now -- i'll test this out in the coming days/weeks and update.

as for conditionals, i don't *think* you'd need many, but it's been awhile since i've been in the initscripts. there are still some operations that are fine -- and desirable -- while inside a container (clear /tmp? some others ...) but pretty much anything related to udev or hardware is a no go, for now at least.

in the end maybe it's not worth modifying the stock initscripts; perhaps an alternate could be provided? sort of like initscripts-systemd, eg. initscripts-lxc
Comment by Tom Gundersen (tomegun) - Wednesday, 18 May 2011, 15:34 GMT
@extofme: let me know when you find out. If you are interested in pushing this into initscripts, I'd be open to patches, but if you won't be using it yourself (I guess you are using systemd?) and no one else steps up, then I think I'll close it as a "won't implement".

As to clearing up /tmp: I think it would be a reasonable assumption that people running LXC could put /tmp on tmpfs and symlink /var/{run,lock} to /run/{,lock}. In which case there is no cleaning up to be done :-)
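
As a rough sketch of that layout, assuming a container root filesystem prepared from the host: `link_runtime_dirs` is a hypothetical helper, and the fstab line in the comment is a typical tmpfs entry, not something taken from initscripts.

```shell
# A typical fstab entry for /tmp on tmpfs inside the container would be:
#   tmpfs  /tmp  tmpfs  nodev,nosuid  0  0

# link_runtime_dirs (hypothetical): inside the given root, point
# var/run at ../run and var/lock at ../run/lock.
link_runtime_dirs() {
    lr_root=$1
    mkdir -p "$lr_root/run/lock" "$lr_root/var" || return 1
    for lr_name in run lock; do
        lr_link=$lr_root/var/$lr_name
        # Leave real directories alone; only create or refresh symlinks.
        if [ -d "$lr_link" ] && [ ! -L "$lr_link" ]; then
            echo "skipping $lr_link: existing directory" >&2
            continue
        fi
        case $lr_name in
            run)  lr_target=../run ;;
            lock) lr_target=../run/lock ;;
        esac
        rm -f "$lr_link"
        ln -s "$lr_target" "$lr_link" || return 1
    done
}
```

It would be called with the path to the container's root filesystem as its only argument. An existing real /var/run or /var/lock is reported and left untouched, so it has to be moved out of the way by hand first.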
Comment by C Anthony Risinger (extofme) - Wednesday, 18 May 2011, 15:46 GMT
fair enough :-)

yeah i can say that i probably would not be using it myself, as i am indeed using systemd. i opened this when i was using a raw LXC setup heavily but things have changed a bit since then ... i am curious how systemd handles it though because if it cannot i'd still be interested in adding an initscripts-lxc or initscripts-systemd-lxc ... ultimately i just want _something_ that enables arch to run *unmodified* inside a system container. LXC-type technology is set to displace OpenVZ, and the init problem is a common theme in both arenas.
Comment by C Anthony Risinger (extofme) - Tuesday, 24 May 2011, 04:38 GMT
well you can go ahead and close this if you like -- systemd works just fine in a container, copes very well with mounts already existing/udev etc not working/etc, and even nicely exit()s at the end if it detects itself inside a namespace/container (vs _refusing_ to die or calling reboot() like sysvinit -- which means the cgroup handler won't be triggered) ... so, sysvinit, this is officially the last nail in the coffin my friend :-)

per discussions:

http://lists.freedesktop.org/archives/systemd-devel/2011-April/002066.html
http://0pointer.de/blog/projects/changing-roots

from the latter:

"systemd itself has been modified to work very well in such a container. For example, when shutting down and detecting that it is run in a container, it just calls exit(), instead of reboot() as last step."

so once again systemd proves itself as being 110% awesome soaked ... per the former thread, `systemd-nspawn` is damn near a bare bones replacement for LXC tools; a couple more verbs in the conf files related to namespaces + networking? (and eventually stuff like FD NS in .40+) and systemd could be handling application and OS containers with ease.

good stuff ... on to writing systemd units and templates for starting/stopping/handling LXC containers -- hooray!
Comment by Tom Gundersen (tomegun) - Tuesday, 24 May 2011, 05:01 GMT
Cool, cool.
