Bug #7141
puppetd runs fail in 'daemon' mode when stat'ing /proc files
| Status: | Closed | Start date: | 04/18/2011 |
|---|---|---|---|
| Priority: | High | Due date: | 07/05/2011 |
| Assignee: | - | % Done: | 0% |
| Category: | file | | |
| Target version: | 2.6.x | | |
| Affected Puppet version: | 2.6.5 | Branch: | |
| Keywords: | | | |
| Votes: | 2 | | |
Description
I accidentally had a tree that Puppet was watching (auditing) that contained a few symlinks pointing into the /proc filesystem. Manual puppet runs worked perfectly, but background ‘daemon’ runs would hang. After a bit of stracing, I found that the hang started as soon as the puppet process tried to look at these /proc symlinked files. Removing the symlinks solves the problem, but this is a bug of some kind; I’m just not sure where.
OS: CentOS 5.5 Puppet Ver: 2.6.5
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
lstat("/apps/kickstart/rhel55-x64-generic/test/etc/mtab", {st_mode=S_IFLNK|0777, st_size=12, ...}) = 0
readlink("/apps/kickstart/rhel55-x64-generic/test/etc/mtab", "/proc/mounts"..., 100) = 12
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
stat("/usr/lib/ruby/site_ruby/1.8/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib/ruby/site_ruby/1.8/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/site_ruby/1.8/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/site_ruby/1.8/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/site_ruby/1.8/x86_64-linux/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/site_ruby/1.8/x86_64-linux/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib/ruby/site_ruby/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib/ruby/site_ruby/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/site_ruby/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/site_ruby/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/site_ruby/1.8/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/site_ruby/1.8/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/site_ruby/1.8/x86_64-linux/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/site_ruby/1.8/x86_64-linux/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/site_ruby/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/site_ruby/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib/ruby/1.8/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib/ruby/1.8/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/1.8/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/1.8/digest/md5.so", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/1.8/x86_64-linux/digest/md5.rb", 0x7fffa4070a00) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/ruby/1.8/x86_64-linux/digest/md5.so", {st_mode=S_IFREG|0755, st_size=8776, ...}) = 0
open("/usr/lib64/ruby/1.8/x86_64-linux/digest/md5.so", O_RDONLY) = 8
close(8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
open("/apps/kickstart/rhel55-x64-generic/test/etc/mtab", O_RDONLY) = 8
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(9, [5 8], [], [], {0, 796826}) = 0 (Timeout)
select(9, [5 8], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
select(9, [8], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(9, [5 8], [], [], {2, 0}) = 0 (Timeout)
select(9, [5 8], [], [], {0, 0}) = 0 (Timeout)
select(9, [8], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(9, [5 8], [], [], {1, 999999}) = 0 (Timeout)
select(9, [5 8], [], [], {0, 0}) = 0 (Timeout)
select(9, [8], [], [], {0, 0}) = 0 (Timeout)
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
select(9, [5 8], [], [], {1, 999998}) = 0 (Timeout)
select(9, [5 8], [], [], {0, 0}) = 0 (Timeout)
select(9, [8], [], [], {0, 0}) = 0 (Timeout)
...
History
Updated by Ben Hughes about 1 year ago
Thank you for the report and the strace.
This is quite probably due to the way Ruby reads files and the way Linux’s /proc is accessed.
This has come up before: see http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/155744 and https://projects.puppetlabs.com/issues/4466.
There’s not really a lot we can do in Puppet, as it’s the underlying Ruby IO#read that is at fault (or Linux, depending on your side of the argument). That’s why we %x{ cat files} in Facter.
Updated by Ben Hughes about 1 year ago
- Status changed from Unreviewed to Investigating
Updated by Ben Hughes 11 months ago
- Due date set to 07/05/2011
- Assignee set to Ben Hughes
https://projects.puppetlabs.com/issues/4466 is the first mention of this bug, and it seems Ruby as a whole has a problem with reading Linux’s /proc/.
I shudder at the thought of having it check whether the file is “/proc/<something>” and then %x( cat #{file}), but it may come to that if files in /proc/ require auditing.
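For anyone wondering what that check might look like, here is a minimal sketch. It is not Puppet or Facter code, and the helper names `proc_path?` and `read_file_guarded` are made up for illustration. It assumes a Ruby new enough to have `File.realpath` and `Open3.capture2` (1.9.2 or later, so not the 1.8 interpreter in the strace) and a `cat` binary on the PATH. The symlink is resolved first because the file in this report was a symlink into /proc, not a literal /proc path.

```ruby
require 'open3'

# Hypothetical helper: true when an already-resolved path lives under /proc.
def proc_path?(resolved_path)
  resolved_path == '/proc' || resolved_path.start_with?('/proc/')
end

# Hypothetical helper: read a file, shelling out to `cat` for anything that
# resolves into /proc so the Ruby process never reads such a file itself.
def read_file_guarded(path)
  resolved = File.realpath(path) # follows symlinks such as etc/mtab -> /proc/mounts
  if proc_path?(resolved)
    output, status = Open3.capture2('cat', resolved)
    raise IOError, "cat failed for #{resolved}" unless status.success?
    output
  else
    File.read(resolved)
  end
end
```

Passing the path as a separate argument to `Open3.capture2` avoids the shell interpolation that `%x( cat #{file})` would involve, which matters if file names are not trusted.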
Updated by Daniel Pittman 8 months ago
- Category set to file
- Status changed from Investigating to Needs More Information
- Target version set to 2.7.x
- Affected Puppet version set to 2.6.5
CentOS 5 is a 2.6.18 kernel all the way through; the IO#read bug was only before 2.6.13, so it shouldn’t be the same problem. That said, I can’t trivially reproduce this. Matt, can you confirm which kernel version you are running on your system? Is this an upgraded CentOS 4 machine still using the older kernel?
Updated by Daniel Pittman 8 months ago
Matt Wise wrote:
definitely running 2.6.18 with CentOS 5.3-5.6…
Well, thanks for confirming. The Ruby bug was definitely reported fixed in 2.6.13, so this shouldn’t be the same thing. It would be awesome if you could confirm that absolutely by running the C code from here: https://gist.github.com/441278
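Separately from the C code in that gist (whose contents are not reproduced here), the symptom visible in the strace can be probed from Ruby. The trace shows `select` on the descriptor opened for the /proc-backed file returning `0 (Timeout)` over and over. The sketch below asks the same question: does `select` report the file as readable within a deadline? The name `readable_within?` is hypothetical, and this only probes the symptom; it does not show which layer is at fault.

```ruby
# Hypothetical probe: open the file and ask select(2), via IO.select, whether
# it becomes readable within `seconds`. IO.select returns nil on timeout.
def readable_within?(path, seconds)
  File.open(path, 'r') do |file|
    ready = IO.select([file], nil, nil, seconds)
    !ready.nil?
  end
end
```

A regular file on disk should be reported readable immediately, so a `false` result for something like /proc/mounts would match the repeated timeouts in the trace.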
That will confirm if it is a Ruby bug, or not, reasonably well, I would hope. Thanks.
Updated by James Turnbull 7 months ago
Ben – can you ping Matt please and get him to test the code or find out if we can get access to a box to fix please.
Updated by James Turnbull 7 months ago
- Priority changed from Normal to High
Updated by James Turnbull 6 months ago
Red Hat has a ticket raised also now – https://bugzilla.redhat.com/show_bug.cgi?id=751214
Updated by Daniel Pittman 6 months ago
James Turnbull wrote:
Red Hat has a ticket raised also now – https://bugzilla.redhat.com/show_bug.cgi?id=751214
That looks like it has appropriate C code demonstrating the problem, and that it is outside our hands. I think we should close this ticket, but am not sure what status is best to represent “bug exists outside our control”.
Updated by James Turnbull 6 months ago
- Status changed from Needs More Information to Closed
The fix for this requires kernel changes. A ticket has been logged with Red Hat to fix the kernel regression upstream at https://bugzilla.redhat.com/show_bug.cgi?id=751214.
Updated by Chip Schweiss 5 months ago
- Status changed from Closed to Re-opened
- Target version changed from 2.7.x to 2.6.x
Red Hat has released another kernel and the problem remains. Puppet needs to fix the problem of hanging indefinitely.
This problem could stay around for the remainder of Red Hat/CentOS 5. Not updating the kernel to keep puppet running is not a solution. Neither is shutting off ‘listen = true’.
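One way to picture "not hanging indefinitely" is to put a deadline around the read, as in the sketch below. This is an illustration only, not something Puppet does, and `read_with_deadline` is a made-up name. It uses the standard library's `Timeout` module. Whether the timeout would actually interrupt the stalled read on an affected kernel has not been tested here.

```ruby
require 'timeout'

# Hypothetical helper: return the file's contents, or nil if the read does
# not finish within `seconds`.
def read_with_deadline(path, seconds)
  Timeout.timeout(seconds) { File.read(path) }
rescue Timeout::Error
  nil
end
```

Returning `nil` lets the caller decide whether to skip the file or report an error, so that one unreadable file does not stall the whole run.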
Updated by James Turnbull 5 months ago
- Assignee deleted (Ben Hughes)
Chip – have you raised that with Red Hat? I am still pretty reluctant to fix this – this is really in Red Hat’s court.
Updated by XiangJun Wu 5 months ago
Does it exist in CentOS 6.1?
Updated by James Turnbull 5 months ago
- Status changed from Re-opened to Closed
If CentOS 6.1 has one of the affected kernels then the bug will be present.
We’re still not planning to address this in Puppet and have requested customers raise the issue with Red Hat.
Updated by Dominic Cleal 4 months ago
In case anybody else finds this issue, Red Hat released the following errata for RHEL 5.7 containing the fix: RHSA-2012:0007-1
The minimum kernel version for the fix is kernel-2.6.18-274.15.1.el5. I think the bug has only been reported in the EL5 kernels, not EL6.
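To check a host against that minimum, the release strings can be compared numerically. The sketch below assumes `uname -r`-style strings such as `2.6.18-274.15.1.el5`. The helper names are hypothetical, and the comparison only makes sense for EL5 kernels; it says nothing about other kernel series.

```ruby
# Hypothetical helper: turn "2.6.18-274.15.1.el5" into [2, 6, 18, 274, 15, 1].
# The ".el5" suffix (and anything after it) is dropped before the digits are read.
def kernel_release_parts(release)
  release.sub(/\.el\d+.*\z/, '').scan(/\d+/).map { |n| n.to_i }
end

# Hypothetical helper: true when an EL5 kernel release is at or above the
# minimum version quoted above. Arrays compare element by element.
def el5_kernel_has_fix?(release, minimum = '2.6.18-274.15.1.el5')
  (kernel_release_parts(release) <=> kernel_release_parts(minimum)) >= 0
end
```

A caller might pass in the stripped output of `uname -r`. A shorter release such as `2.6.18-274.el5` sorts below the minimum because it is a strict prefix of it.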