The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

This issue tracker is now in read-only archive mode and automatic ticket export has been disabled. Redmine users will need to create a new JIRA account to file tickets at https://tickets.puppetlabs.com.

Bug #12310

Significant slow down in 2.7.10 apply

Added by R.I. Pienaar over 4 years ago. Updated about 4 years ago.

Status: Closed
Start date: 01/30/2012
Priority: Normal
Due date:
Assignee: Patrick Carlisle
% Done: 0%
Category: -
Target version: 2.7.11
Affected Puppet version: 2.7.10
Branch: https://github.com/pcarlisle/puppet/tree/ticket/2.7.x/12310-puppet-apply-fact-loading
Keywords:

Description

I’ve been exploring approaches for running puppet masterless and at the same time trying to massage my manifests to be compatible with 2.7.x scoping etc.

I noticed a huge increase in run times between 2.6.9 and 2.7.10 with the same manifests; using envpuppet and a git clone, I gathered this information:

======== 2.6.9  
notice: Finished catalog run in 9.95 seconds
envpuppet puppet apply --pluginsync    14.45s user 4.99s system 92% cpu 21.098 total
======== 2.7.9  
notice: Finished catalog run in 16.52 seconds
envpuppet puppet apply --pluginsync    21.92s user 6.90s system 93% cpu 30.814 total
======== 2.7.10  
notice: Finished catalog run in 21.34 seconds
envpuppet puppet apply --pluginsync    23.58s user 9.42s system 73% cpu 44.662 total

2.7.0 through 2.7.9 all perform the same. While 2.7.0-9 is already a fair bit slower than 2.6 was, 2.7.10 adds another 14 seconds to the run using the same manifests on the same machine.

Comparing last_run_summary.yaml files I notice that the big change in run time info is service:

    2.6.9: service: 2.920294
    2.7.9: service: 8.238562
    2.7.10: service: 12.400952

But this does not on its own account for the time increase; there seems to be a hit somewhere else, like state.yaml handling or report handling, that isn’t reflected here.

Unsure how to gather more useful information to narrow this down.
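
For reference, a minimal sketch (not part of the original report) of pulling the per-type timings out of two last_run_summary.yaml files for comparison; the file paths and the diff format are placeholders:

    #!/usr/bin/env ruby
    # Sketch: diff the per-resource-type timings in the "time" section of two
    # last_run_summary.yaml files. Invoke with the old and new file paths.
    require 'yaml'

    old_run, new_run = ARGV[0, 2].map { |path| YAML.load_file(path) }

    old_times = old_run['time'] || {}
    new_times = new_run['time'] || {}

    (old_times.keys | new_times.keys).sort.each do |key|
      before = old_times[key].to_f
      after  = new_times[key].to_f
      printf("%-20s %10.6f -> %10.6f (%+.6f)\n", key, before, after, after - before)
    end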


Related issues

Related to Puppet - Bug #12181: Puppet 2.7.10 run stopps after loading facts. Closed 01/26/2012
Related to Puppet - Bug #12464: puppet apply looks up facts twice Closed 02/06/2012
Related to Puppet - Bug #8341: Loading facts repeatedly Closed 07/09/2011
Duplicated by Puppet - Bug #12588: Running Puppet in one-time mode from cron leaves hung pup... Duplicate 02/12/2012

History

#1 Updated by Anonymous over 4 years ago

R.I. Pienaar wrote:

I’ve been exploring approaches for running puppet masterless and at the same time trying to massage my manifests to be compatible with 2.7.x scoping etc.

I noticed a huge increase in run times between 2.6.9 and 2.7.10 with the same manifests; using envpuppet and a git clone, I gathered this information:

[…]

2.7.0 through 2.7.9 all perform the same. While 2.7.0-9 is already a fair bit slower than 2.6 was, 2.7.10 adds another 14 seconds to the run using the same manifests on the same machine.

Comparing last_run_summary.yaml files I notice that the big change in run time info is service:

[…]

But this does not on its own account for the time increase; there seems to be a hit somewhere else, like state.yaml handling or report handling, that isn’t reflected here.

Unsure how to gather more useful information to narrow this down.

The least worst profiler that I know of is the google perftools for Ruby: https://github.com/tmm1/perftools.rb
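
For anyone following along, a rough sketch of the perftools.rb block API from its README; the output path and the profiled block here are placeholders, not Puppet-specific code:

    # Minimal perftools.rb usage sketch; wrap whatever code is under
    # investigation in the block. The README also documents a CPUPROFILE
    # environment-variable mode for profiling a whole process, e.g.:
    #   CPUPROFILE=/tmp/puppet_profile RUBYOPT="-r`gem which perftools | tail -1`" \
    #     envpuppet puppet apply /etc/puppet/manifests/site.pp
    require 'perftools'

    PerfTools::CpuProfiler.start("/tmp/puppet_profile") do
      10_000.times { "a" * 100 }   # placeholder for the code being measured
    end
    # Report afterwards with: pprof.rb --text /tmp/puppet_profile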

Otherwise, having access to the manifests – or, ideally, the smallest subset that demonstrates the problem – is useful.

Finally, if you could write a script that ran puppet apply or whatever, measured the time, and reported “good” or “bad”, a git bisect would be wonderful. Er, or manually, of course.
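
For illustration, a minimal sketch of such a script, suitable for `git bisect run` (the 35-second threshold and the command line are assumptions):

    #!/usr/bin/env ruby
    # Sketch: time one apply and exit 0 (good) below a threshold, 1 (bad)
    # above it, or 125 so `git bisect run` skips commits where puppet breaks.
    THRESHOLD = 35.0

    cmd = ["envpuppet", "puppet", "apply",
           "--modulepath=/etc/puppet/manifests/common/modules",
           "/etc/puppet/manifests/site.pp"]

    start = Time.now
    ok = system(*cmd)
    elapsed = Time.now - start

    exit 125 unless ok
    puts "run took %.2f seconds" % elapsed
    exit(elapsed < THRESHOLD ? 0 : 1)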

#2 Updated by R.I. Pienaar over 4 years ago

Ran my manifests with various points in the timelines between 2.7.9 and .10:

                 2.7.10                                    42.580
after symb modes 2ac94f94b9af6d8025ee0e771c4c7141ca9705c6  42.236
b4 symb modes    48726b662d2fb5cc3532fca12eb6aa5a3c1622cd  42.197
after facts once c58bfbc64f9f22009cd57c5cc8a924ae21fe3a97  42.448 / 42.496 / 42.299
b4 facts once    134e826084aa0eec0d065d070288e984a077d21b  36.434 / 36.435 / 37.633
                 2ca9f9da9baa11e21c58c66820e8ba59e955e7e1  37.129
                 ca7328323ab9de76191601aef373d3d8486c023f  37.873
                 86a806f595f8b7bb280c8c445eef51dfd71bf39d  37.129
after probes     0e4acaf486377cb7741e8cb0cdf96227c449a800  36.192 
before probes    2fd94d2e912a1b0dcc08ff510658948644a7fc1c  33.438
                 2.7.9                                     30.997

There’s some slow down early on in the series that I will home in on,

but it seems I have identified the work in #8341 as a possible cause: across the 3 runs with the before and after hashes I can reliably see a significant slow down there. I will do some more tests tomorrow with the early commits to see what’s up there.

#3 Updated by Anonymous over 4 years ago

  • Status changed from Unreviewed to Investigating

Thanks for the work. Patrick, can you also look at this, and work with R.I. to figure out what is going on? That is enough of a slow-down that it would be nice to understand it and, ideally, eliminate it.

#4 Updated by Anonymous over 4 years ago

I forgot to ask: are you doing anything with pluginsync here? Do you have a remote file server?

#5 Updated by R.I. Pienaar over 4 years ago

Pluginsync or no pluginsync makes no performance difference.

In fact it seems pluginsync is just always on. Without enabling pluginsync it still picks up the facts in the modules anyway, and I don’t see a difference in speed; additionally, in apply mode pluginsync doesn’t actually copy the plugins to $libdir, which is a significant shortcoming.

Something to note between 2.7.9 and .10 is:

2.7.9:

# envpuppet work/puppet/bin/puppet apply --modulepath=/etc/puppet/manifests/common/modules /etc/puppet/manifests/site.pp --verbose 
info: Loading facts in puppet_enabled
info: Loading facts in etc_facts_loader
info: Loading facts in devco_kvm
info: Applying configuration version '1328009162'

2.7.10:

# envpuppet work/puppet/bin/puppet apply --modulepath=/etc/puppet/manifests/common/modules /etc/puppet/manifests/site.pp --verbose 
info: Loading facts in /etc/puppet/manifests/common/modules/puppet/lib/facter/puppet_enabled.rb
info: Loading facts in /etc/puppet/manifests/common/modules/puppet/lib/facter/etc_facts_loader.rb
info: Loading facts in /etc/puppet/manifests/common/modules/puppet/lib/facter/devco_kvm.rb
info: Loading facts in /etc/puppet/manifests/common/modules/puppet/lib/facter/puppet_enabled.rb
info: Loading facts in /etc/puppet/manifests/common/modules/puppet/lib/facter/etc_facts_loader.rb
info: Loading facts in /etc/puppet/manifests/common/modules/puppet/lib/facter/devco_kvm.rb
info: Applying configuration version '1328009297'

Which lines up with what I said on IRC last night – it’s back to loading facts twice a run.

I am running every merge commit between 2.7.9 and 2.7.10 now and gathering timings; I will update once I have that.

#6 Updated by R.I. Pienaar over 4 years ago

Ran every merge commit 3 times over and gathered the times, this is what I got:

b16bf84f3be962a68d98730df0ffc182cbf170f3 38.852555 38.276405 39.392357
d57c5730a94838a82792decf325378c237ba24f3 38.489429 41.893006 38.951914
3a57b0736690bc6a83b63b292a46f4f1e69c7618 38.484198 38.667243 38.812559
71062fb5b26e42cb91d1a79f7384023d45497974 39.523727 38.621266 40.209976
b25c97413ff991c7856ab3a1a448882d20463818 38.591770 38.924291 38.460350
c8f5cd6d1a43f07cabeba29940bb487c1c273953 39.356980 38.818056 38.387146
a0f865b69ee4345dada11ce2c9da6b21dabdcb7d 40.042191 38.077763 38.392729
cb823279bd7c6e277327e82e687bf123cf79c485 39.085861 38.500411 38.748656
96e79835e3724edf582b1081b01df2f5a6798303 38.610890 39.728374 39.356731
75d7cad19be95760bc08e2ce4a67208aec5c6af7 40.798864 38.817174 38.943789
8b3e440d5e9fcd23ace0789f33327d1c6a68b333 39.723660 39.234682 39.162569
89013a4517fe0441aeeddd684489fd3467531d80 42.757174 43.053200 43.351334
9348fe206289a871b08a831220279d51ddbf9964 41.855567 43.528620 43.197849
b79be3bc80a4e6a0d110df70e454066b1aeaf9df 43.248857 42.995667 42.197756
3aed4960c336a2662bcfe37a25d480cddd268c69 42.487904 43.989012 43.030842
f4f4158f673b911fb95d5dbb1a6f0eff8387bdbc 42.055357 44.709555 42.602036
2ac94f94b9af6d8025ee0e771c4c7141ca9705c6 41.991469 43.373342 41.853326
1eddaeb206f9463dbf11ade4b6c61e1aca9e2472 45.571136 45.532001 44.323364
e081c81001a44159ee7d93c0288dcc584e26be4f 44.804433 44.010178 46.732684
4033a587d5ccee2b5bdd0eec86e836bea579f1f0 43.740124 43.756394 43.315551
134e826084aa0eec0d065d070288e984a077d21b 40.060404 39.483845 39.200537
ab0242505c632b5c975cfacdbafe1b5031d3d77d 40.581767 39.538368 39.381114
2a99004fb94529b16b209498a72be2b57559ed03 18.824988 21.799284 22.163374
276d590b9b98953c2c5eb98a9df3d687f0f1e16b 39.867744 42.495896 41.533383
483a1d93a5e5c78f8f41f905d7aae896081c98f1 39.378438 33.855791 39.344782
c6667c50d3b49195685311575b46de1978c4dfd9 39.785773 39.599681 39.205374
bc6c6428c7aeb41676463e3ad9b0f1344790b4bf 34.415683 39.301617 35.627080
5ec5657a1e825bf26886ec9078966c79ae288581 39.572634 39.651599 33.946596
450da6afb2468810ac2de64cc97fb9ed67963a91 39.474409 39.566346 33.739620
1cb2b6a5a361ce71fb3a979597878f69185c8a2c 41.339828 33.572613 33.999695
fe795377f5472b6bd162719ac8a9950e795d6b7c 39.360120 39.348533 39.393848
f755dcb5e3ca62210328fca07729b52e33a4f1f5 33.765480 35.200514 39.891891
9751f900507bae936f16f0184fa13d2016aea7a3 39.573316 39.345956 33.880733
a7a10f6c23512e4364ad53bb6f4c41ec7214dad5 39.710078 39.461398 39.299581
5b90ade7a83569c80cd96ca76cd937f5d61d8704 18.792595 19.393444 20.764387
9b62a87f4bbd6af6ca3ec400bd1eec2bf3270ccf 39.549501 33.770279 39.687674
9ba85db04b1712e620a4acb410d23d52c109d4de 39.624505 34.401210 39.177884
63e31232c192d6d5ad86f2ea829d4605e069ec4e 35.532471 39.286486 39.364924
9fe75c18118d68233d3ce9f189429266e3823725 19.385550 18.746884 19.946212
07da208a3e580337ad0bc94019b90e0fb8270a0e 33.389141 33.507405 33.413872
c3aa97a44d61f92413126b5438395dac00e56af0 33.576208 34.818024 34.230643
f79bd374ce9c1132b60d06e9eddd1002c595800d 33.870875 40.245595 33.993165
ac8b408e18390ebf1bc31a3bab7df9d269b2629e 33.549252 33.220439 33.207003
86a806f595f8b7bb280c8c445eef51dfd71bf39d 33.570269 33.256519 35.024061
ac81771a9cc70b5fb58f86439ebb0585ce92d4e8 33.363360 33.339858 33.414208
399e923f790baec15dc3cd5a99713ee01287949c 33.371006 34.441135 33.217057
5e7da648b770b925997118bba0f2a652620bc904 34.923389 33.500287 33.161873
0e4acaf486377cb7741e8cb0cdf96227c449a800 33.483709 33.918367 33.231711
d4c80943c017c3d0ea5118f35b2ca32c9c26e800 0.395047 0.396598 0.398188
f842d0bd884182bbd75e6d3e061193da3f3f1d2d 30.235971 30.394854 32.090406
2fd94d2e912a1b0dcc08ff510658948644a7fc1c 30.029695 31.811432 30.267410
45f0855bd6be36a66acd0a28a8ed83e2d63af239 29.953422 29.956296 30.183331
788d9368cb33a14a9b27a61da2af57dbe38b1ac2 30.070656 29.925391 30.075092
e40439196af14f90d8a321999b782f24a3fa59f6 30.539101 29.979454 30.521616
4cc1142da35ecc3b052c67e550bba5d39c2b78f4 32.148519 29.965016 31.277051
a08370db73c2a9f8de4531d22565f94a3910ecfd 29.913568 30.043823 30.154282
933844567fbe4078cb76f2c089ced1cb6c4170b7 29.804537 32.472687 29.944656
2717c55076ea42cd38ea8b549ee7add31d6bb722 31.835056 30.054538 30.164638
7da15bfbcfd654a2e5104e838446ce1fabf05e42 30.266463 30.104694 30.025478
04b027b8d4a4e13b4b9f2ee7cd4c01715ffd68e8 32.124273 30.047929 30.160006
40313fa143ed97d9b4f83f150ba2d5c8811d40e3 29.947179 32.003117 30.245262
2eff2fab91aa060aa53bbd202d88fb6ffd1871eb 29.988889 30.014592 30.128594
8ae917aaeabc47fe950e0d580adde2d28f373de7 31.960892 30.089084 30.039127
b9dcf4b33aeb6b0b8bb3948138d657039c88e382 30.314822 29.983262 30.726098
f4bf4f46f0215ac54528687e04a32d6f64de2b3f 30.220985 30.121122 30.753789
e465598ab1b7dc8481e7fa461d0c5d38fc2cdfd5 30.162200 30.272077 30.147691
56fbe018bb7e317c8b5b19f69e95f4ee6acde2b9 30.983789 30.441670 30.477856
bfcb5084d85238ec120a921b4ecd8a90a59bc94a 33.022167 30.328809 30.033843
0be9f5c9e9afea23fddbcd556609bdc38d52ce1c 30.324325 30.485333 30.608825
4af4e79c3b2b86566ada589112f53515a47676f7 31.926673 30.710963 30.419682
2ae312f09bbb64a439b24afcf0b2e952026ce038 30.042341 31.219559 30.145261
66c3662baec3b0ca69a58c4cd11484d5ed209dad 30.046865 30.088771 30.467166
eae2c9df6f956f36ee3419997c12710e20b440ff 30.444196 30.335682 30.370801
7d38216238203d2cf7273af228a0b86b872c69da 30.872877 30.346635 32.080547
5da4875a50cf29079ea8c3c8bcb3ce50ad7f02f2 30.422187 30.373051 30.448484
64a3e145b3dfb6992e159a9729dfd2d05f9fdcdf 30.103224 30.259175 30.166145
5585afc591c10058dd0baa8cbb2f4a7564c945d9 31.706147 30.308371 30.104412
0f56154d3cdcb49106dd1a17687dd69492f02be4 30.423631 30.445816 30.146614
999355facb493509e3648845286d6fefe6f598fd 30.201958 30.134682 30.110047
cb753cdc562b078c2339ca00232038a66581c167 30.243477 30.677706 30.854727
f7f4ac206bfc231700ded1c4d528da755b32225b 30.068403 32.431411 30.163976
379fdafdcc1cc07582a9a9ebca1200a417aab4c8 32.416338 30.609840 30.722802
5e46f6ae171ad975b36c638c69cd72be5d8fafa4 31.120853 30.468941 30.360997
7328d8079a4e76988c26e294149bf680f0de7ebd 30.508015 30.432155 32.334312
593b23e01cffdcb9f6775ed500ef36ba1d2f991f 30.737827 30.614706 30.464248
d5a9db768dc4e2fc4b082c8954520833910dd740 30.617103 31.273978 30.621042
02cdbaffa55fe016a257ba61f2778037d19b48cf 32.317035 30.639787 30.548623
4ca51f8a4ff0969251d5f224e52d3d126e12fa95 30.637817 30.880277 30.632323
5e9cf138aaf8a1686889495aceab02c1aaf280b9 30.859767 30.645013 30.765645

Some immediate things to note:

  • Same machine, same manifests, and it’s all the machine did at the time (this is a VM, but everything else was idle)
  • Command line was “envpuppet work/puppet/bin/puppet apply --modulepath=/etc/puppet/manifests/common/modules /etc/puppet/manifests/site.pp --pluginsync --verbose”
  • d4c80943c017c3d0ea5118f35b2ca32c9c26e800 was a broken commit; puppet didn’t work at all

Really weird is that 9fe75c18118d68233d3ce9f189429266e3823725, 2a99004fb94529b16b209498a72be2b57559ed03 and 5b90ade7a83569c80cd96ca76cd937f5d61d8704 are consistently much faster than the commits before and after them; I can’t explain that. During those runs Puppet did not log any errors, failed resources or anything like that. Really weird.

We should automate this; it would make a great addition to our CI life cycle and a valuable artefact for metrics.
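
A rough sketch of how such a sweep could be automated (the commit range, command line and run count are taken from this thread or assumed):

    #!/usr/bin/env ruby
    # Sketch: check out each merge commit in a range and time a few apply runs,
    # printing one line per commit in the same shape as the table above.
    RUNS  = 3
    APPLY = ["envpuppet", "puppet", "apply",
             "--modulepath=/etc/puppet/manifests/common/modules",
             "/etc/puppet/manifests/site.pp", "--pluginsync"]

    commits = `git rev-list --merges --reverse 2.7.9..2.7.10`.split

    commits.each do |sha|
      system("git", "checkout", "-q", sha) or next
      times = (1..RUNS).map do
        start = Time.now
        system(*APPLY)
        Time.now - start
      end
      puts "#{sha} " + times.map { |t| "%.6f" % t }.join(" ")
    end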

#7 Updated by Patrick Carlisle over 4 years ago

  • Status changed from Investigating to In Topic Branch Pending Review
  • Assignee set to Patrick Carlisle
  • Target version set to 2.7.x
  • Affected Puppet version set to 2.7.10
  • Branch set to https://github.com/pcarlisle/puppet/tree/ticket/2.7.x/12310-puppet-apply-fact-loading

Does https://github.com/pcarlisle/puppet/tree/ticket/2.7.x/12310-puppet-apply-fact-loading fix this for you?

Unfortunately we don’t have a well defined lifecycle for facts, and in addressing #8341 we decided that the existing fact caching in the indirector was too aggressive. This works pretty well for the agent but looks like it causes a problem in apply. This patch tries to fix it for the most common case, and in the long term we’d like to solve the problem via autoloading in facter and possibly puppet.

#8 Updated by R.I. Pienaar over 4 years ago

That cuts the expected 2 seconds off the run, leaving a 10 second slow down overall, so this is definitely an improvement.

#9 Updated by Patrick Carlisle over 4 years ago

Does that mean we’re down to just the difference between 2.6.x and 2.7.x?

#10 Updated by R.I. Pienaar over 4 years ago

No, sorry for not being clear, we’re still 10 seconds slower than 2.7.9

#11 Updated by Patrick Carlisle over 4 years ago

Can you share a manifest that causes this? I don’t have enough set up here to really show a significant difference.

#12 Updated by R.I. Pienaar over 4 years ago

Unfortunately not; this is something like 25k LOC of manifests, not all applied to all machines, so I do not have a specific set of manifests I can demonstrate with that I would be willing to share.

The series of runs I posted for each merge commit (comment 6) gives some insight into where we are losing time, and I guess we should look there.

#13 Updated by Patrick Carlisle over 4 years ago

  • Status changed from In Topic Branch Pending Review to Code Insufficient

The only other thing that jumps out to me is the instrumentation framework (#9584). I don’t see any discussion on that ticket around overhead. Brice, did you guys measure the overhead of this change? The preliminary evidence here suggests an overhead of around 10% and I wonder if we could get it lower. I need to get set up to benchmark and profile this myself so I can dig deeper.

#14 Updated by Brice Figureau about 4 years ago

Patrick Carlisle wrote:

The only other thing that jumps out to me is the instrumentation framework (#9584). I don’t see any discussion on that ticket around overhead. Brice, did you guys measure the overhead of this change? The preliminary evidence here suggests an overhead of around 10% and I wonder if we could get it lower. I need to get set up to benchmark and profile this myself so I can dig deeper.

When not activated (which is the default), the PIF doesn’t do anything (it is not even called). And in its current state, only the indirector can be instrumented, which during the life of a catalog compilation plus agent run shouldn’t generate a lot of events.

You can try reverting the PIF commits locally and check how much you’ve gained (my gut feeling is that you won’t gain anything).

#15 Updated by R.I. Pienaar about 4 years ago

I’ve spent some time with Brice today and we narrowed the 10 second loss in my manifests to https://github.com/puppetlabs/puppet/blob/master/lib/puppet/util/instrumentation/listeners/process_name.rb#L31-34

This combined with the fact fixes from Patrick gives me back my 12 seconds.

Now trying with a few different rubies under rvm to see if it’s related; I am on ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux].

#16 Updated by Brice Figureau about 4 years ago

R.I. Pienaar wrote:

I’ve spent some time with Brice today and we narrowed the 10 second loss in my manifests to https://github.com/puppetlabs/puppet/blob/master/lib/puppet/util/instrumentation/listeners/process_name.rb#L31-34

To elaborate:

  • when a puppet app starts, the instrumentation layer initializes
  • it loads and registers all the instrumentation listeners, but those are never executed until activated
  • except that if a listener has a “subscribed” callback, that callback is called
  • the process_name listener has a “subscribed” callback
  • this callback creates a thread that loops forever, changes the current process name if told to, and sleeps for 1 second

We concluded with RI that having this thread doing nothing is what causes the general slow-down of the whole process. As RI mentioned, it’s possible that it is dependent on his ruby version or even compilation options (as was the case in lenny shipped ruby back a couple of years ago).

I’ll try to come up with a patch that defers the creation of the thread to when the listener is really used. A work-around is to completely remove the process_name.rb file, so that it won’t be loaded.
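
For illustration, a minimal sketch (hypothetical code, not the actual Puppet listener) of the pattern described above, where a “subscribed” callback spawns a once-per-second background thread:

    # Hypothetical illustration only: the "subscribed" callback starts a thread
    # that wakes up every second, whether or not instrumentation is activated.
    class ExampleProcessNameListener
      def subscribed
        @scroller = Thread.new do
          loop do
            rotate_process_name if @active   # real work only when activated
            sleep 1                          # but the thread wakes up regardless
          end
        end
      end

      def rotate_process_name
        # rotate $0 like a ticker so long process names scroll through `ps`
        $0 = $0[1..-1] + $0[0, 1] unless $0.empty?
      end
    end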

#17 Updated by Anonymous about 4 years ago

Brice Figureau wrote:

We concluded with RI that having this thread doing nothing is what causes the general slow-down of the whole process. As RI mentioned, it’s possible that it is dependent on his ruby version or even compilation options (as was the case in lenny shipped ruby back a couple of years ago).

I’ll try to come up with a patch that defers the creation of the thread to when the listener is really used. A work-around is to completely remove the process_name.rb file, so that it won’t be loaded.

This may be a question answered by reading the code, but call me lazy: is there an actual need for the extra thread? Changing $0 should be pretty fast, and doing it from the main “thread” should have the same or lower cost to doing it in another “thread” – because Ruby is all green threads, so you have a context switch cost, but no actual parallel operation going on.

If you did this to rate-limit changes or something, just inline that in the method that handles the “state change” notification and do nothing when updates are too frequent. :)
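
A minimal sketch of the inline rate-limiting being suggested (the method name and one-second threshold are hypothetical):

    # Hypothetical: update $0 directly from the notification handler, skipping
    # updates that arrive less than a second after the previous one, so no
    # background thread is needed.
    def handle_state_change(new_name)
      now = Time.now
      return if @last_rename && (now - @last_rename) < 1.0
      @last_rename = now
      $0 = new_name
    end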

#18 Updated by Brice Figureau about 4 years ago

Daniel Pittman wrote:

Brice Figureau wrote:

We concluded with RI that having this thread doing nothing is what causes the general slow-down of the whole process. As RI mentioned, it’s possible that it is dependent on his ruby version or even compilation options (as was the case in lenny shipped ruby back a couple of years ago).

I’ll try to come up with a patch that defers the creation of the thread to when the listener is really used. A work-around is to completely remove the process_name.rb file, so that it won’t be loaded.

This may be a question answered by reading the code, but call me lazy: is there an actual need for the extra thread? Changing $0 should be pretty fast, and doing it from the main “thread” should have the same or lower cost to doing it in another “thread” – because Ruby is all green threads, so you have a context switch cost, but no actual parallel operation going on.

If you did this to rate-limit changes or something, just inline that in the method that handles the “state change” notification and do nothing when updates are too frequent. :)

No this is a scheduled timer to rotate long process names (like a stock ticker). So rotating only when there’s a new event wouldn’t rotate it when nothing happens. But maybe there can be some other means to have such regular timer that wouldn’t consume a thread.

#19 Updated by Anonymous about 4 years ago

Brice Figureau wrote:

Daniel Pittman wrote:

Brice Figureau wrote:

We concluded with RI that having this thread doing nothing is what causes the general slow-down of the whole process. As RI mentioned, it’s possible that it is dependent on his ruby version or even compilation options (as was the case in lenny shipped ruby back a couple of years ago).

I’ll try to come up with a patch that defers the creation of the thread to when the listener is really used. A work-around is to completely remove the process_name.rb file, so that it won’t be loaded.

This may be a question answered by reading the code, but call me lazy: is there an actual need for the extra thread? Changing $0 should be pretty fast, and doing it from the main “thread” should have the same or lower cost to doing it in another “thread” – because Ruby is all green threads, so you have a context switch cost, but no actual parallel operation going on.

If you did this to rate-limit changes or something, just inline that in the method that handles the “state change” notification and do nothing when updates are too frequent. :)

No this is a scheduled timer to rotate long process names (like a stock ticker). So rotating only when there’s a new event wouldn’t rotate it when nothing happens. But maybe there can be some other means to have such regular timer that wouldn’t consume a thread.

Hrm. I don’t know that I consider that a compelling feature, personally, so losing the thread doesn’t feel to me like a big cost at all.

#20 Updated by Brice Figureau about 4 years ago

Daniel Pittman wrote:

Brice Figureau wrote:

Daniel Pittman wrote:

Brice Figureau wrote:

We concluded with RI that having this thread doing nothing is what causes the general slow-down of the whole process. As RI mentioned, it’s possible that it is dependent on his ruby version or even compilation options (as was the case in lenny shipped ruby back a couple of years ago).

I’ll try to come up with a patch that defers the creation of the thread to when the listener is really used. A work-around is to completely remove the process_name.rb file, so that it won’t be loaded.

This may be a question answered by reading the code, but call me lazy: is there an actual need for the extra thread? Changing $0 should be pretty fast, and doing it from the main “thread” should have the same or lower cost to doing it in another “thread” – because Ruby is all green threads, so you have a context switch cost, but no actual parallel operation going on.

If you did this to rate-limit changes or something, just inline that in the method that handles the “state change” notification and do nothing when updates are too frequent. :)

No this is a scheduled timer to rotate long process names (like a stock ticker). So rotating only when there’s a new event wouldn’t rotate it when nothing happens. But maybe there can be some other means to have such regular timer that wouldn’t consume a thread.

Hrm. I don’t know that I consider that a compelling feature, personally, so losing the thread doesn’t feel to me like a big cost at all.

Note that this listener was just an example listener for other people to create their own, it shouldn’t have even been used normally.

Exactly. But we can do better, puppet embeds the event-loop which contains exactly what is needed but without using any threads.

#21 Updated by Anonymous about 4 years ago

Brice Figureau wrote:

Daniel Pittman wrote:

Brice Figureau wrote:

No this is a scheduled timer to rotate long process names (like a stock ticker). So rotating only when there’s a new event wouldn’t rotate it when nothing happens. But maybe there can be some other means to have such regular timer that wouldn’t consume a thread.

Hrm. I don’t know that I consider that a compelling feature, personally, so losing the thread doesn’t feel to me like a big cost at all.

Note that this listener was just an example listener for other people to create their own, it shouldn’t have even been used normally.

Huh. I would think the process name was the most likely listener to benefit our users – like PostgreSQL, being able to get a summary of state with ps -ef or ps axwww is a big win for knowing what is going on.

Exactly. But we can do better, puppet embeds the event-loop which contains exactly what is needed but without using any threads.

Given the one use of that, we were looking to strip that out, actually. Almost everything (like listen mode) just uses a background thread, and it is a heavy bit of code to carry just to check if the config changed every 15 seconds, and to wake up when configured in daemon mode. Adding more consumers isn’t really high on the agenda, and I don’t think we return to the event loop regularly except when the daemonized agent is idle.

#22 Updated by Brice Figureau about 4 years ago

Daniel Pittman wrote:

Brice Figureau wrote:

Daniel Pittman wrote:

Brice Figureau wrote:

No this is a scheduled timer to rotate long process names (like a stock ticker). So rotating only when there’s a new event wouldn’t rotate it when nothing happens. But maybe there can be some other means to have such regular timer that wouldn’t consume a thread.

Hrm. I don’t know that I consider that a compelling feature, personally, so losing the thread doesn’t feel to me like a big cost at all.

Note that this listener was just an example listener for other people to create their own, it shouldn’t have even been used normally.

Huh. I would think the process name was the most likely listener to benefit our users – like PostgreSQL, being able to get a summary of state with ps -ef or ps axwww is a big win for knowing what is going on.

Yes, that’s true.

Exactly. But we can do better, puppet embeds the event-loop which contains exactly what is needed but without using any threads.

Given the one use of that, we were looking to strip that out, actually. Almost everything (like listen mode) just uses a background thread, and it is a heavy bit of code to carry just to check if the config changed every 15 seconds, and to wake up when configured in daemon mode. Adding more consumers isn’t really high on the agenda, and I don’t think we return to the event loop regularly except when the daemonized agent is idle.

I can see plenty of advantages of having a real timer. But I understand your point of view.

I’ll see how I can fix the original bug later on (too bad for me as I had the EL one almost ready :)

#23 Updated by Anonymous about 4 years ago

Brice Figureau wrote:

Daniel Pittman wrote:

Given the one use of that, we were looking to strip that out, actually. Almost everything (like listen mode) just uses a background thread, and it is a heavy bit of code to carry just to check if the config changed every 15 seconds, and to wake up when configured in daemon mode. Adding more consumers isn’t really high on the agenda, and I don’t think we return to the event loop regularly except when the daemonized agent is idle.

I can see plenty of advantages of having a real timer. But I understand your point of view.

Well, you can also find a real use for it, which would be enough to reconsider. …or, at least, to implement some sort of scheduled event facility that worked with the product, anyhow. :)

#24 Updated by Anonymous about 4 years ago

There’s a bit of chatter about this on the Puppet Users list; in that thread, Grant confirmed that moving the process_name listener improved performance.

#25 Updated by Patrick Carlisle about 4 years ago

We’re looking to make a quick release of 2.7.11 to fix this and a couple of other regressions. Brice, would it be reasonable for us to just remove process_name.rb for now and then we can put it back with a deeper fix for 2.7.12? My understanding so far is we can do that without breaking the rest of the instrumentation framework. I’d also be happy with a code fix that could go into an rc today or Monday.

#26 Updated by Brice Figureau about 4 years ago

Patrick Carlisle wrote:

We’re looking to make a quick release of 2.7.11 to fix this and a couple of other regressions. Brice, would it be reasonable for us to just remove process_name.rb for now and then we can put it back with a deeper fix for 2.7.12? My understanding so far is we can do that without breaking the rest of the instrumentation framework. I’d also be happy with a code fix that could go into an rc today or Monday.

I’m abroad for the weekend (FOSDEM), and the only fix I have is using the event loop, which Daniel wasn’t keen to merge, so our only option right now is to remove the offending file. Then we’ll see how to put the missing functionality back in the next version. So I’m fine with your plan.

#27 Updated by Brice Figureau about 4 years ago

Patrick Carlisle wrote:

We’re looking to make a quick release of 2.7.11 to fix this and a couple of other regressions. Brice, would it be reasonable for us to just remove process_name.rb for now and then we can put it back with a deeper fix for 2.7.12? My understanding so far is we can do that without breaking the rest of the instrumentation framework. I’d also be happy with a code fix that could go into an rc today or Monday.

Here is the event-loop based fix, in case you prefer it over removing the process name listener altogether: https://github.com/masterzen/puppet/tree/tickets/2.7.x/12310

Sorry for the inconvenience.

#28 Updated by Patrick Carlisle about 4 years ago

  • Status changed from Code Insufficient to Merged - Pending Release
  • Target version changed from 2.7.x to 2.7.11

#29 Updated by Matthaus Owens about 4 years ago

  • Status changed from Merged - Pending Release to Closed

Released in Puppet 2.7.11.
