The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

Bug #11143

Puppet 2.6.12 fails to cope with corrupt yaml files in {facts, nodes} and hangs

Added by Jo Rhett almost 3 years ago. Updated over 1 year ago.

Status:AcceptedStart date:12/02/2011
Priority:HighDue date:
Assignee:-% Done:

0%

Category:parser
Target version:-
Affected Puppet version:2.6.12 Branch:
Keywords:

We've Moved!

Ticket tracking is now hosted in JIRA: https://tickets.puppetlabs.com

This ticket may be automatically exported to the PUP project on JIRA using the button below:


Description

So we were going through all the systems and finding ones which had corked up due to bug #10418. As we enabled each one and brought them online, I found two systems which could not get a catalog from the system. Debug and trace got to evaluating the server certificate, and then called a timeout some 5 minutes later. It never got a catalog.

I could delete the client certificate (in fact all of /var/lib/puppet) and clean the cert on the master, create a new cert and sign it. The certificate stuff worked fine, but the catalog wouldn’t work. The puppet master never reports compiling the catalog.

There is no inconsistency on this systeme. The affected system was axax009, and 1-8 and 10 which are identical OS, software, same subnet, everything all work fine. In fact, two of those are on the same VMware host as axax009.

# puppet agent --test --debug --trace
debug: Puppet::Type::User::ProviderUser_role_add: file roleadd does not exist
debug: Puppet::Type::User::ProviderDirectoryservice: file /usr/bin/dscl does not exist
debug: Puppet::Type::User::ProviderPw: file pw does not exist
debug: Failed to load library 'ldap' for feature 'ldap'
debug: Puppet::Type::User::ProviderLdap: feature ldap is missing
debug: Failed to load library 'rubygems' for feature 'rubygems'
debug: Puppet::Type::File::ProviderMicrosoft_windows: feature microsoft_windows is missing
debug: /File[/etc/puppet/puppet.conf]: Autorequiring File[/etc/puppet]
debug: /File[/var/lib/puppet/state/graphs]: Autorequiring File[/var/lib/puppet/state]
debug: /File[/var/lib/puppet/clientbucket]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/state/last_run_report.yaml]: Autorequiring File[/var/lib/puppet/state]
debug: /File[/var/lib/puppet/ssl]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/lib]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/state]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/client_data]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/client_yaml]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/private_keys/axax009.company.com.pem]: Autorequiring File[/var/lib/puppet/ssl/private_keys]
debug: /File[/var/lib/puppet/log/http.log]: Autorequiring File[/var/lib/puppet/log]
debug: /File[/var/lib/puppet/ssl/certs/ca.pem]: Autorequiring File[/var/lib/puppet/ssl/certs]
debug: /File[/var/run/puppet/agent.pid]: Autorequiring File[/var/run/puppet]
debug: /File[/var/lib/puppet/ssl/public_keys]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/state/last_run_summary.yaml]: Autorequiring File[/var/lib/puppet/state]
debug: /File[/var/lib/puppet/ssl/private]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/facts]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/certificate_requests]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/etc/puppet/namespaceauth.conf]: Autorequiring File[/etc/puppet]
debug: /File[/var/lib/puppet/ssl/certs/axax009.company.com.pem]: Autorequiring File[/var/lib/puppet/ssl/certs]
debug: /File[/var/lib/puppet/log]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/public_keys/axax009.company.com.pem]: Autorequiring File[/var/lib/puppet/ssl/public_keys]
debug: /File[/var/lib/puppet/ssl/crl.pem]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/ssl/private_keys]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/ssl/certs]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/state/state.yaml]: Autorequiring File[/var/lib/puppet/state]
debug: /File[/var/lib/puppet/classes.txt]: Autorequiring File[/var/lib/puppet]
debug: Finishing transaction 23772566926920
debug: /File[/etc/puppet/namespaceauth.conf]: Autorequiring File[/etc/puppet]
debug: /File[/var/lib/puppet/log]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/state]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/private_keys]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/ssl/crl.pem]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/facts]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/public_keys]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/ssl/private]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/ssl/certs/ca.pem]: Autorequiring File[/var/lib/puppet/ssl/certs]
debug: /File[/var/lib/puppet/ssl/certs/axax009.company.com.pem]: Autorequiring File[/var/lib/puppet/ssl/certs]
debug: /File[/var/lib/puppet/ssl/certs]: Autorequiring File[/var/lib/puppet/ssl]
debug: /File[/var/lib/puppet/ssl/private_keys/axax009.company.com.pem]: Autorequiring File[/var/lib/puppet/ssl/private_keys]
debug: /File[/var/lib/puppet/lib]: Autorequiring File[/var/lib/puppet]
debug: /File[/var/lib/puppet/ssl/public_keys/axax009.company.com.pem]: Autorequiring File[/var/lib/puppet/ssl/public_keys]
debug: /File[/var/lib/puppet/log/http.log]: Autorequiring File[/var/lib/puppet/log]
debug: /File[/var/lib/puppet/ssl/certificate_requests]: Autorequiring File[/var/lib/puppet/ssl]
debug: Finishing transaction 23772565943600
debug: Using cached certificate for ca
debug: Using cached certificate for axax009.company.com
debug: Finishing transaction 23772565460720
debug: Loaded state in 0.06 seconds
debug: Using cached certificate for ca
debug: Using cached certificate for axax009.company.com
debug: Using cached certificate_revocation_list for ca
debug: catalog supports formats: b64_zlib_yaml dot marshal pson raw yaml; using pson

/usr/lib/ruby/1.8/timeout.rb:54:in `rbuf_fill'
/usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
/usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
/usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
/usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
/usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
/usr/lib/ruby/1.8/net/http.rb:2020:in `read_status_line'
/usr/lib/ruby/1.8/net/http.rb:2009:in `read_new'
/usr/lib/ruby/1.8/net/http.rb:1050:in `request'
/usr/lib/ruby/1.8/net/http.rb:1037:in `request'
/usr/lib/ruby/1.8/net/http.rb:543:in `start'
/usr/lib/ruby/1.8/net/http.rb:1035:in `request'
/usr/lib/ruby/1.8/net/http.rb:772:in `get'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector/rest.rb:75:in `find'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector/indirection.rb:188:in `find'
/usr/lib/ruby/site_ruby/1.8/puppet/indirector.rb:50:in `find'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:240:in `retrieve_new_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:429:in `thinmark'
/usr/lib/ruby/1.8/benchmark.rb:293:in `measure'
/usr/lib/ruby/1.8/benchmark.rb:307:in `realtime'
/usr/lib/ruby/site_ruby/1.8/puppet/util.rb:428:in `thinmark'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:239:in `retrieve_new_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:86:in `retrieve_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:111:in `retrieve_and_apply_catalog'
/usr/lib/ruby/site_ruby/1.8/puppet/configurer.rb:150:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/agent/locker.rb:21:in `lock'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/usr/lib/ruby/1.8/sync.rb:229:in `synchronize'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:103:in `with_client'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:37:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:172:in `call'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:172:in `controlled_run'
/usr/lib/ruby/site_ruby/1.8/puppet/agent.rb:35:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application/agent.rb:114:in `onetime'
/usr/lib/ruby/site_ruby/1.8/puppet/application/agent.rb:88:in `run_command'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:305:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:420:in `hook'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:305:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:411:in `exit_on_fail'
/usr/lib/ruby/site_ruby/1.8/puppet/application.rb:305:in `run'
/usr/lib/ruby/site_ruby/1.8/puppet/util/command_line.rb:62:in `execute'
/usr/bin/puppet:4
err: Could not retrieve catalog from remote server: execution expired
warning: Not using cache on failed catalog
err: Could not retrieve catalog; skipping run
debug: Using cached certificate for ca
debug: Using cached certificate for axax009.company.com
debug: Using cached certificate_revocation_list for ca
debug: Value of 'preferred_serialization_format' (pson) is invalid for report, using default (b64_zlib_yaml)
debug: report supports formats: b64_zlib_yaml marshal raw yaml; using b65_zlib_yaml

Related issues

Duplicated by Puppet - Bug #11140: Passenger servers lock up completely.... Duplicate 12/02/2011

History

#1 Updated by Jo Rhett almost 3 years ago

Oddly, it does succeed in reporting its failures via tagmail:

Begin forwarded message:

Subject: Puppet Report for axax009.company.com Date: December 2, 2011 11:14:09 AM PST To: ops-puppet-error ops-puppet-error@company.com

Fri Dec 02 19:14:09 +0000 2011 Puppet (err): Could not retrieve catalog from remote server: execution expired Fri Dec 02 19:14:09 +0000 2011 Puppet (err): Could not retrieve catalog; skipping run

#2 Updated by Jo Rhett almost 3 years ago

I tried removing the cert from puppetmaster and removing the entire /var/lib/puppet directory on the client but got the exact same response. After getting a new cert and signing it, the client accepted the cert and then hung waiting for catalog. The process on the puppetmaster hung.

#3 Updated by Nigel Kersten almost 3 years ago

  • Description updated (diff)

#4 Updated by Nigel Kersten almost 3 years ago

So this is the key:

err: Could not retrieve catalog from remote server: execution expired

You said this didn’t happen when you had a webrick server right? Did you test these same agents against the webrick master on a different port? or just the agents that aren’t hanging?

#5 Updated by Jo Rhett almost 3 years ago

I had actually just realized that when I reread your message. I am now testing with one of the clients which fails, and unfortunately seeing no useful debug. I’ll send directly to you in private e-mail.

#6 Updated by Jo Rhett almost 3 years ago

The problem was corrupted yaml files. I removed the yaml files $vardir/yaml/[facts,nodes]/$hostname and the problem completely disappeared.

This is a bit problematic. yaml file can get corrupted for a wide variety of reasons. The server should not fall down dead when it occurs.

I’m sending copies of these files directly to Nigel for analysis.

#7 Updated by Nigel Kersten almost 3 years ago

  • Subject changed from Some systems never get catalog, servers lock up completely.... to Puppet 2.6.12 fails to cope with corrupt yaml files in {facts, nodes} and hangs

#8 Updated by Nigel Kersten almost 3 years ago

Just to make it clear, when someone gets onto this ticket, I have examples of the corrupt YAML files.

A quick sanity checking with irb shows them just returning the boolean ‘false’ when trying to use YAML.load_file.

#9 Updated by James Turnbull almost 3 years ago

  • Category set to parser
  • Status changed from Unreviewed to Needs Decision
  • Assignee set to Nigel Kersten

Nigel – call on priority?

#10 Updated by Nigel Kersten almost 3 years ago

  • Status changed from Needs Decision to Accepted
  • Assignee deleted (Nigel Kersten)
  • Priority changed from Normal to High
  • Target version set to 2.7.x

It’s high, but came in too late to make it to the last 2.6.x release unfortunately, as we’re already in the RC process for 2.6.13, the last 2.6.x release.

We’ve had to draw a line somewhere, and this issue is going to be the first victim.

It’s a high priority to fix in the 2.7.x series however.

#11 Updated by Jo Rhett almost 3 years ago

It seems odd to leave such a significant issue in the stable branch of your code. Not everyone wants to run with the bleeding edge.

#12 Updated by James Turnbull almost 3 years ago

Jo – 2.6.x is not the stable branch. 2.6.13 is the last release and the branch is now legacy. It’ll only get updates for security issues from here on. The 2.7.x branch is also not bleeding edge. We’re up to the 2.7.8 release: it’s stable, functional and what we consider our stable release.

#13 Updated by Lewis Marshall about 2 years ago

I’m getting a very similar (same?) issue on puppet 2.7.9 (client and server). Unfortunately deleting the yaml for this server hasn’t worked.

See the stack trace below for more details:

info: Loading facts in shadow_hostname
info: Loading facts in shadow_hostname
debug: catalog supports formats: b64_zlib_yaml dot marshal pson raw yaml; using pson
/opt/puppet/lib/ruby/1.8/timeout.rb:64:in `rbuf_fill'
/opt/puppet/lib/ruby/1.8/net/protocol.rb:134:in `rbuf_fill'
/opt/puppet/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
/opt/puppet/lib/ruby/1.8/net/protocol.rb:126:in `readline'
/opt/puppet/lib/ruby/1.8/net/http.rb:2028:in `read_status_line'
/opt/puppet/lib/ruby/1.8/net/http.rb:2017:in `read_new'
/opt/puppet/lib/ruby/1.8/net/http.rb:1051:in `request'
/opt/puppet/lib/ruby/1.8/net/http.rb:1037:in `request'
/opt/puppet/lib/ruby/1.8/net/http.rb:543:in `start'
/opt/puppet/lib/ruby/1.8/net/http.rb:1035:in `request'
/opt/puppet/lib/ruby/1.8/net/http.rb:845:in `post'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/indirector/rest.rb:94:in `send'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/indirector/rest.rb:94:in `http_request'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/indirector/rest.rb:76:in `http_post'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/indirector/rest.rb:116:in `find'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/indirector/indirection.rb:189:in `find'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/configurer.rb:237:in `retrieve_new_catalog'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/util.rb:476:in `thinmark'
/opt/puppet/lib/ruby/1.8/benchmark.rb:308:in `realtime'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/util.rb:475:in `thinmark'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/configurer.rb:236:in `retrieve_new_catalog'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/configurer.rb:86:in `retrieve_catalog'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/configurer.rb:112:in `retrieve_and_apply_catalog'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/configurer.rb:151:in `run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/agent/locker.rb:21:in `lock'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/opt/puppet/lib/ruby/1.8/sync.rb:230:in `synchronize'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/agent.rb:39:in `run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/agent.rb:103:in `with_client'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/agent.rb:37:in `run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application.rb:172:in `call'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application.rb:172:in `controlled_run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/agent.rb:35:in `run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application/agent.rb:336:in `onetime'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application/agent.rb:310:in `run_command'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application.rb:306:in `run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application.rb:410:in `hook'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application.rb:306:in `run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application.rb:401:in `exit_on_fail'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/application.rb:306:in `run'
/opt/puppet/lib/ruby/site_ruby/1.8/puppet/util/command_line.rb:69:in `execute'
/usr/local/bin/puppet:4
err: Could not retrieve catalog from remote server: execution expired
warning: Not using cache on failed catalog
err: Could not retrieve catalog; skipping run
debug: Value of 'preferred_serialization_format' (pson) is invalid for report, using default (b64_zlib_yaml)
debug: report supports formats: b64_zlib_yaml marshal raw yaml; using b64_zlib_yaml

#14 Updated by Andrew Parker over 1 year ago

  • Target version deleted (2.7.x)

Also available in: Atom PDF