The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

This issue tracker is now in read-only archive mode and automatic ticket export has been disabled. Redmine users will need to create a new JIRA account to file tickets at https://tickets.puppetlabs.com.

Bug #18804

Terrible exported resources performance

Added by Daniel Siechniewicz over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee: Ken Barber
% Done: 0%
Category: -
Target version: -
Start date:
Due date:
Keywords:
Affected PuppetDB version:
Branch:

Description

PuppetDB requires a large amount of memory when a few thousand exported resources are collected on a single host from about 100 nodes, or it crashes.

The Puppet agent run is extremely slow (over an hour) even with 4-5 GB of RAM for PuppetDB.

A Postgres backend for PuppetDB does reduce puppetmaster load, but it only slightly reduces PuppetDB's memory requirements and does nothing to shorten the overall run.

The exact same setup (same modules, same node configuration) with 30 nodes and 1 GB for an HSQLDB-based PuppetDB runs in 2-2.5 minutes.
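
For context, the pattern involved is roughly the following export/collect setup (a sketch only, not the actual manifests; the resource type, tag and parameters are assumptions). Each monitored node exports its Nagios resources, and the single monitoring host collects all of them:

# On each of the ~100 monitored nodes: exported, not applied locally (sketch).
@@nagios_host { $::fqdn:
  ensure  => present,
  address => $::ipaddress,
  use     => 'generic-host',
  tag     => 'monitoring',
}

# On the single monitoring host: collect everything exported with that tag.
Nagios_host <<| tag == 'monitoring' |>>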


Related issues

Related to PuppetDB - Feature #18881: Handle large catalog sizes gracefully Accepted
Related to Puppet - Feature #18882: Report # of relationships in --summarize Accepted

History

#1 Updated by Ken Barber over 3 years ago

  • Status changed from Unreviewed to Accepted
  • Assignee set to Ken Barber

#2 Updated by Daniel Siechniewicz over 3 years ago

Successful run in the large environment: puppetmaster VM with 16 GB of memory and -Xmx4g for PuppetDB, Postgres 8.4 backend:

Notice: Finished catalog run in 219.55 seconds
Changes:
            Total: 6
Events:
          Success: 6
            Total: 6
Resources:
        Restarted: 1
          Skipped: 6
          Changed: 6
      Out of sync: 6
            Total: 7315
Time:
   Nagios servicegroup: 0.00
       Filebucket: 0.00
   Nagios contactgroup: 0.00
            Group: 0.00
   Nagios contact: 0.00
            Mount: 0.00
             User: 0.00
          Package: 0.01
      Nagios host: 0.12
   Nagios command: 0.45
             Exec: 1.27
   Nagios service: 1.36
          Service: 1.81
         Last run: 1358955947
   Config retrieval: 4677.32
            Total: 4690.90
             File: 8.55
Version:
           Config: 1358950951
           Puppet: 3.0.2

real    83m35.775s
user    5m28.960s
sys     0m19.217s

#3 Updated by Daniel Siechniewicz over 3 years ago

Successful run in the small environment: puppetmaster VM with 8 GB of memory and -Xmx1g for PuppetDB, no Postgres:

Notice: Finished catalog run in 22.73 seconds
Changes:
            Total: 2
Events:
            Total: 2
          Success: 2
Resources:
          Changed: 2
      Out of sync: 2
            Total: 2044
          Skipped: 6
Time:
       Filebucket: 0.00
   Nagios contactgroup: 0.00
   Nagios servicegroup: 0.00
            Group: 0.00
   Nagios contact: 0.00
             User: 0.00
          Package: 0.00
      Nagios host: 0.02
   Nagios command: 0.12
   Nagios service: 0.27
             Exec: 0.95
          Service: 1.67
   Config retrieval: 115.92
            Total: 121.86
         Last run: 1358963632
             File: 2.90
Version:
           Config: 1358902058
           Puppet: 3.0.2

real    2m32.889s
user    0m33.635s
sys     0m4.749s

#4 Updated by Ken Barber over 3 years ago

I’m going to run some tests locally to see if I can replicate your issue. Here is the module I’m using as a base:

https://github.com/kbarber/puppet-module-nagios_tests

That’s 10000 resources, with 23 parameters per resource, to be exported and imported. I’ve started with a 1 GB heap (-Xmx1g) and I’m going to see if this crashes, and if so, increase it until it does not.

So far the export was fine; I just had to tweak the following:

In puppetdb:

resource-query-limit = 2000000

In puppet.conf:

configtimeout=7200
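
For reference, a sketch of where those two settings would typically live (file paths and section names are assumptions based on PuppetDB 1.x and Puppet 3.x defaults):

# /etc/puppetdb/conf.d/database.ini
[database]
resource-query-limit = 2000000

# /etc/puppet/puppet.conf, on the agents doing the collection
[agent]
configtimeout = 7200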

The results of the initial export on my client machine were:

root@puppetdbclient1:~# puppet agent -t --summarize
info: Caching catalog for puppetdbclient1.vm
info: Applying configuration version '1358866454'
notice: Finished catalog run in 0.14 seconds
Changes:
Events:
Resources:
          Skipped: 6
            Total: 7
Time:
       Filebucket: 0.00
         Last run: 1358923841
   Config retrieval: 86.82
            Total: 86.82
Version:
           Config: 1358866454
           Puppet: 2.7.18

It did peg the CPU of the PuppetDB and PostgreSQL processes for a while (during insert and GC), but there have been no crashes so far.

The collection run looks like this:

notice: Finished catalog run in 41974.77 seconds
Changes:
            Total: 10000
Events:
          Success: 10000
            Total: 10000
Resources:
      Out of sync: 10000
          Changed: 10000
            Total: 10009
          Skipped: 6
Time:
       Filebucket: 0.00
        Resources: 0.00
             File: 0.00
         Last run: 1358909822
   Config retrieval: 399.85
   Nagios service: 41944.84
            Total: 42344.70
Version:
           Config: 1358866454
           Puppet: 2.7.18

real    709m43.886s
user    689m55.799s
sys 4m17.744s

With no OOM crashes either. The config retrieval is only 400 seconds in this case, which is nothing like what you are seeing, Daniel. However, the overall time is still enormous, but this is actually due to the Nagios_service resource in the Puppet agent taking forever to insert the 10000 entries one by one into nagios_server.cfg: each time it reads in the file, inserts an entry, stores the file, and so on. This is more or less what I expected, actually.
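
To illustrate that mechanism, a minimal sketch of such a collection (not the actual test manifest; the target path is an assumption). Every collected Nagios_service ends up in a single target file managed by the built-in naginator provider, so applying 10000 of them means 10000 read-modify-write passes over the same file, which is where the agent spends almost all of its time:

# Collecting node (sketch). All collected services are written into one file;
# each resource application re-reads and re-writes it, so the apply phase is
# roughly quadratic in the number of collected services.
Nagios_service <<| |>> {
  target => '/etc/nagios3/conf.d/nagios_service.cfg',
}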

#5 Updated by Daniel Siechniewicz over 3 years ago

OK, I had an idea last night that I have just tested, and it speeds the Puppet agent run up to something acceptable:

Notice: Finished catalog run in 243.53 seconds
Changes:
            Total: 6859
Events:
          Success: 6859
            Total: 6859
Resources:
   Failed to restart: 1
          Skipped: 6
      Out of sync: 6859
          Changed: 6859
            Total: 7200
Time:
       Filebucket: 0.00
   Nagios servicegroup: 0.00
   Nagios contactgroup: 0.00
            Group: 0.00
   Nagios contact: 0.00
      Nagios host: 0.00
            Mount: 0.00
             User: 0.00
          Package: 0.01
          Service: 0.89
             Exec: 1.47
   Config retrieval: 115.41
   Nagios service: 13.94
         Last run: 1359015259
            Total: 184.03
             File: 45.53
   Nagios command: 6.78
Version:
           Config: 1359014423
           Puppet: 3.0.2

real    7m0.163s
user    4m13.841s
sys     0m17.317s

The culprits are these two lines in two manifest files:

./nsca/server.pp:  #File <<| tag == $get_tag |>> -> Nagios_host <<| tag == $get_tag |>>
./nrpe/server.pp:  #File <<| tag == $get_tag |>> -> Nagios_host <<| tag == $get_tag |>> 

Replacing them with unchained collectors:

File <<| tag == $get_tag |>>
Nagios_host <<| tag == $get_tag |>>

causes it to run in under 10 minutes even with 1 GB for PuppetDB (still a 16 GB VM), which is acceptable.

It seems that chaining collected exported resources might not be very efficient and produces a lot of data, which could be the reason for PuppetDB crashing.
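
To make the difference concrete, here is my understanding of why the chained form blows up (a sketch; the tag is an assumption). Chaining two collectors creates a dependency edge between every resource matched by the first and every resource matched by the second, so with N collected files and M collected hosts the catalog gains roughly N × M extra edges, whereas the unchained form adds none:

# Chained: every collected File gets an edge to every collected Nagios_host,
# so the catalog carries N × M relationship edges on top of the resources.
File <<| tag == 'monitoring' |>> -> Nagios_host <<| tag == 'monitoring' |>>

# Unchained: the same resources are collected, but no cross-product of edges
# is generated, so the catalog stays close to N + M resources in size.
File <<| tag == 'monitoring' |>>
Nagios_host <<| tag == 'monitoring' |>>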

#6 Updated by Ken Barber over 3 years ago

It seems that chaining collected exported resources might not be very efficient and produces a lot of data, which could be the reason for PuppetDB crashing.

Hmm … that is interesting.

#7 Updated by Ken Barber over 3 years ago

Daniel,

I think it would be great if you could supply the Puppet manifest code you mentioned, both before and after the change, so I can replicate this.

ken.

#8 Updated by Ken Barber over 3 years ago

  • Status changed from Accepted to Closed

Opened #18881 to deal with large catalog sizes, and #18882 to get better edge reporting in Puppet. Closing this one for now in favour of those feature tickets. Thanks, Daniel.
