Bug #3365
100% CPU usage
| Status: | Needs More Information | Start date: | 03/11/2010 |
|---|---|---|---|
| Priority: | Normal | Due date: | |
| Assignee: | - | % Done: | 0% |
| Category: | - | | |
| Target version: | - | | |
| Affected Puppet version: | 0.25.4 | Branch: | |
| Keywords: | | | |
Description
Hi,
I’ve been experimenting with Puppet for a few days now, and overall I’m pretty impressed by how easy Puppet makes it to manage configurations. However, one thing has thoroughly ruined my enthusiasm, and that is the massive CPU consumption of Puppet.
At first I used Puppet to source in and manage a few hundred megabytes of data, so I presumed Puppet just wasn’t made to serve such large amounts of data. So I set up my own apt repository and created some custom packages as an alternative way to transfer data.
I also learned about the checksum file property, and that its default value of md5 can cause a lot of CPU consumption. So I turned checksumming off (checksum => undef).
But Puppet still happily eats 100% CPU for tens of minutes at a time, with nothing apparently happening. (I run puppetd -tv --trace --debug, but nothing appears in the console while Puppet is cooking the CPU.)
I believe the following resource is to blame:

```puppet
file { "/some/data/dir":
  owner    => "$username",
  group    => "$username",
  recurse  => "true",
  ensure   => "directory",
  checksum => undef
}
```
I just want this resource to make sure that all files in the directory are owned by user and group $username. /some/data/dir contains 300 MB in 6000+ files. The resource executes swiftly, but after the last file has been chown’d, puppetd hogs the CPU at 100% for a looong time. (Looong being 30+ minutes, after which I hit CTRL-C, impatient and frustrated at seeing nothing happen.)
Some top output:

```
9570 root 25 0 228m 151m 3664 R 99 29.7 14:31.27 puppetd
```
I don’t really understand why I’m getting this. Is Puppet unable to handle this request? What is happening?
I’m a bit disappointed to run into such an issue while just doing some trivial tests… If I can’t solve this, I can’t see how Puppet can be usable for me (and there aren’t that many alternatives…). I don’t know Ruby, and I’m not really a fan of the debug-before-use approach…
Some information about my setup: puppetd and puppetmasterd are both 0.25.4, both running on Xen Dom-U instances with Ubuntu Intrepid 8.10.

uname -a:

```
Linux hostname 2.6.18.8 #2 SMP Wed May 27 15:54:07 CEST 2009 x86_64 GNU/Linux
```

dpkg --list | grep ruby:

```
ii  ruby     4.2          An interpreter of object-oriented scripting
ii  ruby1.8  1.8.7.72-1   Interpreter of object-oriented scripting lan
```
Not really any logging to show, since nothing is logged…
I’m aware this isn’t much to go on, but I’ll try to provide you with anything you may need if you just ask for it.
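For anyone trying to reproduce this, here is a minimal sketch of a hypothetical Python helper (not part of Puppet or of the original report) that fills a directory with a comparable data set: 6000+ files totalling roughly 300 MB. The function names, the file naming scheme, and the split into subdirectories of 500 files are assumptions, since the report does not describe the layout of /some/data/dir. Create the files as a user other than `$username` so that the `file` resource in the description actually has ownership changes to make.

```python
import os
import tempfile


def make_test_tree(root: str, n_files: int, file_size: int, files_per_dir: int = 500) -> int:
    """Hypothetical helper: create n_files files of file_size bytes under root,
    spread over subdirectories holding at most files_per_dir files each.
    Returns the total number of bytes written."""
    payload = bytes(file_size)  # zero-filled; contents do not matter for ownership changes
    total = 0
    for i in range(n_files):
        subdir = os.path.join(root, f"d{i // files_per_dir:03d}")
        os.makedirs(subdir, exist_ok=True)
        with open(os.path.join(subdir, f"f{i:05d}.bin"), "wb") as fh:
            fh.write(payload)
        total += file_size
    return total


def make_scratch_tree(n_files: int, file_size: int, files_per_dir: int = 500) -> tuple[str, int]:
    """Hypothetical convenience wrapper: build the tree in a fresh temporary
    directory and return (root, bytes_written)."""
    root = tempfile.mkdtemp(prefix="puppet-3365-")
    return root, make_test_tree(root, n_files, file_size, files_per_dir)
```

For the reporter’s case, something like `make_test_tree("/some/data/dir", 6000, 50 * 1024)` would write about 300 MB (6000 files of 50 KiB each).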
History
#1
Updated by Dieter Van de Walle over 3 years ago
Well, it finished after all: notice: Finished catalog run in 2734.31 seconds :)
#2
Updated by Dieter Van de Walle over 3 years ago
Some more information from the --summarize option I just discovered (since it is undocumented):

```
Changes:
    Total: 4271
Resources:
    Applied: 4271
    Out of sync: 2138
    Scheduled: 4435
    Total: 115
Time:
    Config retrieval: 1.36
    Exec: 0.77
    File: 19.23
    Filebucket: 0.00
    Host: 0.00
    Package: 31.99
    Schedule: 0.00
    Service: 1.42
    User: 0.01
    Total: 54.78
warning: Value of 'preferred_serialization_format' (pson) is invalid for report, using default (marshal)
notice: Finished catalog run in 1877.06 seconds
```
It seems to me the cause of the delays is not recorded in the time overview?
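The numbers above bear this out. A short Python sketch, using only figures copied from the --summarize output, shows that the per-type times add up exactly to the reported Total of 54.78 s, leaving about 1822 s (roughly 97% of the 1877.06 s run) that the breakdown does not attribute to any resource type:

```python
# Per-type timings (seconds) copied from the --summarize output above.
timings = {
    "Config retrieval": 1.36,
    "Exec": 0.77,
    "File": 19.23,
    "Filebucket": 0.00,
    "Host": 0.00,
    "Package": 31.99,
    "Schedule": 0.00,
    "Service": 1.42,
    "User": 0.01,
}
catalog_run = 1877.06  # from "Finished catalog run in 1877.06 seconds"

accounted = round(sum(timings.values()), 2)       # should equal the reported Total
unaccounted = round(catalog_run - accounted, 2)   # time missing from the breakdown
share = unaccounted / catalog_run                 # fraction of the whole run

print(f"accounted:   {accounted:.2f} s")
print(f"unaccounted: {unaccounted:.2f} s ({share:.1%} of the run)")
```

This only quantifies the gap; it does not show where the missing time is spent.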
#3
Updated by Peter Meier over 3 years ago
Dieter Van de Walle wrote:
> Some more information from the --summarize option I just discovered (since it is undocumented):
summarize is documented with all the other configuration options: http://docs.reductivelabs.com/references/latest/configuration.html
However if you have any other points where it is missing, please raise a ticket to improve documentation.
#4
Updated by Dieter Van de Walle over 3 years ago
My apologies, seems like I missed that one.
Some more information: whatever Puppetd is doing during the 100% CPU usage, it seems to be unnecessary.
If I execute puppetd -tv and wait until all files have been chown’d, then hit CTRL-C and run puppetd again, the second run finishes in about a minute. Running puppetd twice takes about 4 minutes in total.
So by manipulating the execution of puppetd using a CTRL-C keystroke I can reach the intended end state in 4 minutes. If I leave puppetd to do it by itself, it takes 30+ minutes of 100% CPU usage…
Also please notice that puppetd stalling at 100% CPU usage happens AFTER all files have been chown’d to the correct state.
#5
Updated by Peter Meier over 3 years ago
some more information and discussion is found in this thread: http://groups.google.com/group/puppet-users/browse_thread/thread/84ab151c4935524f
#6
Updated by James Turnbull over 3 years ago
- Status changed from Unreviewed to Investigating
#7
Updated by Brice Figureau over 3 years ago
Peter Meier wrote:
> some more information and discussion is found in this thread: http://groups.google.com/group/puppet-users/browse_thread/thread/84ab151c4935524f
Some more information in this message too: http://groups.google.com/group/puppet-dev/msg/f3cced03d3747201
Looks like we generate tons of events (at least for the first run) and puppet spends a large time on propagating those events (or at least flowing those events is sub-optimal).
#8
Updated by Yannick Menager over 1 year ago
Agh!
This bug is really bad.
I just had my puppet completely freeze and lock up for ages (I gave up after half an hour) because it was trying to do a recursive file owner change on a directory with many files.
Doing such a thing is not rare in the system administration world; I am astonished that this bug has been around for 2 years and nothing has been done about it.
#9
Updated by James Turnbull over 1 year ago
- Status changed from Investigating to Needs Decision
- Assignee set to Nigel Kersten
Yannick – what version are you running? What platform? Can you provide an strace or log output?
Nigel – we should either kill this bug report or refactor it. Up to you.
#10
Updated by Nigel Kersten over 1 year ago
- Status changed from Needs Decision to Needs More Information
- Assignee deleted (Nigel Kersten)
This needs more info. My understanding is that we fixed the primary issue in 2.7.x, but I need more info from Yannick to find out what he’s running.
#11
Updated by Juan Pablo Daniel Borgna about 1 month ago
I see similar behavior. The problem occurs when my target directory already contains files: it seems that the whole tree is evaluated. In my case a catalog run took 1200 seconds; I was using /usr/src as the target to store 10 MB of files, but that directory already held the kernel sources and headers. Just by changing the target to an empty directory, the run time went down to 32 seconds.
HTH
Regards, Juan Pablo.