Terrible exported resources performance
Assignee: Ken Barber
PuppetDB requires a large amount of memory when a few thousand exported resources are collected on a single host from about 100 nodes, or it crashes.
The Puppet agent run is extremely slow (over an hour) even with 4-5 GB of RAM for PuppetDB.
A Postgres backend for PuppetDB does reduce puppetmaster load, but it only slightly reduces PuppetDB memory requirements and does nothing to shorten the entire process.
The exact same setup (same modules, same node configuration) with 30 nodes and 1 GB for an HSQLDB-based PuppetDB runs in 2-2.5 minutes.
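For reference, the PuppetDB JVM heap sizes mentioned in this ticket (-Xmx1g, -Xmx4g) are normally set via JAVA_ARGS in the service defaults file; a minimal sketch, assuming the Debian/Ubuntu packaging path (/etc/sysconfig/puppetdb on RHEL-style systems):

# /etc/default/puppetdb -- path and value are illustrative
# Raise the PuppetDB heap; runs in this ticket used between 1 GB and 4-5 GB.
JAVA_ARGS="-Xmx4g"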
#2 Updated by Daniel Siechniewicz over 3 years ago
Successful run in the large environment: puppetmaster VM with 16 GB of memory and -Xmx4g for PuppetDB, Postgres 8.4 backend:
Notice: Finished catalog run in 219.55 seconds
Changes:
   Total: 6
Events:
   Success: 6
   Total: 6
Resources:
   Restarted: 1
   Skipped: 6
   Changed: 6
   Out of sync: 6
   Total: 7315
Time:
   Nagios servicegroup: 0.00
   Filebucket: 0.00
   Nagios contactgroup: 0.00
   Group: 0.00
   Nagios contact: 0.00
   Mount: 0.00
   User: 0.00
   Package: 0.01
   Nagios host: 0.12
   Nagios command: 0.45
   Exec: 1.27
   Nagios service: 1.36
   Service: 1.81
   Last run: 1358955947
   Config retrieval: 4677.32
   Total: 4690.90
   File: 8.55
Version:
   Config: 1358950951
   Puppet: 3.0.2

real 83m35.775s
user 5m28.960s
sys 0m19.217s
#3 Updated by Daniel Siechniewicz over 3 years ago
Successful run in the small environment: puppetmaster VM with 8 GB of memory and -Xmx1g for PuppetDB, no Postgres:
Notice: Finished catalog run in 22.73 seconds
Changes:
   Total: 2
Events:
   Total: 2
   Success: 2
Resources:
   Changed: 2
   Out of sync: 2
   Total: 2044
   Skipped: 6
Time:
   Filebucket: 0.00
   Nagios contactgroup: 0.00
   Nagios servicegroup: 0.00
   Group: 0.00
   Nagios contact: 0.00
   User: 0.00
   Package: 0.00
   Nagios host: 0.02
   Nagios command: 0.12
   Nagios service: 0.27
   Exec: 0.95
   Service: 1.67
   Config retrieval: 115.92
   Total: 121.86
   Last run: 1358963632
   File: 2.90
Version:
   Config: 1358902058
   Puppet: 3.0.2

real 2m32.889s
user 0m33.635s
sys 0m4.749s
#4 Updated by Ken Barber over 3 years ago
I'm going to run some tests locally to see if I can replicate your issue; here is the module I'm using as a base:
That's 10,000 resources, with 23 parameters per resource, that should be exported and collected. I've started with a 1 GB heap (-Xmx1g) and I'm going to see if this crashes, and if so increase it until it does not.
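A minimal sketch of a manifest that generates a comparable workload (class and resource names here are hypothetical, and range() assumes puppetlabs-stdlib is installed):

# Export side: generate 10,000 exported nagios_service resources.
class exporttest::export {
  # range() from puppetlabs-stdlib expands to ['1', '2', ..., '10000'].
  $ids = range('1', '10000')
  exporttest::dummy_service { $ids: }
}

define exporttest::dummy_service {
  @@nagios_service { "dummy_${name}_${::fqdn}":
    use                 => 'generic-service',
    host_name           => $::fqdn,
    service_description => "dummy check ${name}",
    check_command       => 'check_dummy!0',
    tag                 => 'exporttest',
    # ...more parameters would be added here to approach 23 per resource
  }
}

# Collection side: a single node realizes everything exported above.
class exporttest::collect {
  Nagios_service <<| tag == 'exporttest' |>>
}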
So far the export was fine; I just had to tweak:
resource-query-limit = 2000000
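For context, in PuppetDB 1.x this setting belongs in the [database] section of the configuration; the file path below is the typical packaged layout and is an assumption:

# /etc/puppetdb/conf.d/database.ini -- adjust the path for your install
[database]
# Maximum number of resources a single resource query may return;
# raised here so collecting the full export does not hit the limit.
resource-query-limit = 2000000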
The results of the initial export on my client machine were:
root@puppetdbclient1:~# puppet agent -t --summarize
info: Caching catalog for puppetdbclient1.vm
info: Applying configuration version '1358866454'
notice: Finished catalog run in 0.14 seconds
Changes:
Events:
Resources:
   Skipped: 6
   Total: 7
Time:
   Filebucket: 0.00
   Last run: 1358923841
   Config retrieval: 86.82
   Total: 86.82
Version:
   Config: 1358866454
   Puppet: 2.7.18
It did peg the CPU of the PuppetDB and PostgreSQL processes for a while (during insert and GC), but there have been no crashes so far.
The collection run looks like this:
notice: Finished catalog run in 41974.77 seconds
Changes:
   Total: 10000
Events:
   Success: 10000
   Total: 10000
Resources:
   Out of sync: 10000
   Changed: 10000
   Total: 10009
   Skipped: 6
Time:
   Filebucket: 0.00
   Resources: 0.00
   File: 0.00
   Last run: 1358909822
   Config retrieval: 399.85
   Nagios service: 41944.84
   Total: 42344.70
Version:
   Config: 1358866454
   Puppet: 2.7.18

real 709m43.886s
user 689m55.799s
sys 4m17.744s
With no OOM crashes either. The config retrieval is only 400 seconds in this case, which is nothing like what you are seeing, Daniel. However, the overall time is still enormous, but this is actually due to the Nagios_service resource in the Puppet agent taking forever to insert the 10,000 entries one by one into nagios_server.cfg: each time it reads in the file, inserts an entry, stores the file, and so on. This is more or less what I expected, actually.
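One common way around that repeated rewriting is to point each exported resource at its own target file, so the agent reads and rewrites many small per-node files instead of one huge configuration file for every entry; a sketch with hypothetical paths and check names:

# Exported on each monitored node (illustrative only).
@@nagios_service { "check_load_${::fqdn}":
  host_name           => $::fqdn,
  service_description => 'load',
  check_command       => 'check_nrpe!check_load',
  target              => "/etc/nagios3/conf.d/${::fqdn}_services.cfg",
  tag                 => 'monitoring',
}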
#5 Updated by Daniel Siechniewicz over 3 years ago
OK, I had an idea last night that I've just tested, and it speeds up the Puppet agent run to something acceptable:
Notice: Finished catalog run in 243.53 seconds
Changes:
   Total: 6859
Events:
   Success: 6859
   Total: 6859
Resources:
   Failed to restart: 1
   Skipped: 6
   Out of sync: 6859
   Changed: 6859
   Total: 7200
Time:
   Filebucket: 0.00
   Nagios servicegroup: 0.00
   Nagios contactgroup: 0.00
   Group: 0.00
   Nagios contact: 0.00
   Nagios host: 0.00
   Mount: 0.00
   User: 0.00
   Package: 0.01
   Service: 0.89
   Exec: 1.47
   Config retrieval: 115.41
   Nagios service: 13.94
   Last run: 1359015259
   Total: 184.03
   File: 45.53
   Nagios command: 6.78
Version:
   Config: 1359014423
   Puppet: 3.0.2

real 7m0.163s
user 4m13.841s
sys 0m17.317s
The culprits turned out to be these two lines in two manifest files:
./nsca/server.pp: #File <<| tag == $get_tag |>> -> Nagios_host <<| tag == $get_tag |>>
./nrpe/server.pp: #File <<| tag == $get_tag |>> -> Nagios_host <<| tag == $get_tag |>>
Replacing them with unchained collectors:
File <<| tag == $get_tag |>>
Nagios_host <<| tag == $get_tag |>>
causes it to run in under 10 minutes even with 1 GB for PuppetDB (still a 16 GB VM), which is acceptable.
It seems that chaining collectors of exported resources is not very efficient and produces a lot of data, which could be the reason for PuppetDB crashing.
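The chained form creates an ordering edge between every collected File and every collected Nagios_host (effectively a cross product), while the unchained form only realizes the resources. If per-resource ordering is still needed, a sketch (hypothetical titles and template) of declaring it on the exporting node instead:

# On each monitored node: each nagios_host requires only its own exported
# file, so one edge per host is created rather than File-to-Nagios_host
# edges for every combination.
@@file { "/etc/nagios3/conf.d/${::fqdn}_host.cfg":
  content => template('nagios/host.cfg.erb'),   # hypothetical template
  tag     => $get_tag,
}

@@nagios_host { $::fqdn:
  address => $::ipaddress,
  tag     => $get_tag,
  require => File["/etc/nagios3/conf.d/${::fqdn}_host.cfg"],
}

# On the Nagios server, the collectors stay unchained:
File <<| tag == $get_tag |>>
Nagios_host <<| tag == $get_tag |>>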