Bug #4462

Uncaught timeout exception

Added by Peter Meier over 1 year ago. Updated over 1 year ago.

Status:Closed Start date:08/03/2010
Priority:Normal Due date:
Assignee:- % Done:0%

Category:-
Target version:2.6.1
Affected Puppet version:2.6.1rc1 Branch:http://github.com/jes5199/puppet/tree/ticket/2.6.x/4462
Keywords:
Votes: 1

Description

I encounter from time to time an uncaught timeout exception:

/usr/lib/ruby/1.8/timeout.rb:60:in `open': execution expired (Timeout::Error)
    from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
    from /usr/lib/ruby/1.8/net/http.rb:560:in `connect'
    from /usr/lib/ruby/1.8/net/http.rb:553:in `do_start'
    from /usr/lib/ruby/1.8/net/http.rb:542:in `start'
    from /usr/lib/ruby/1.8/net/http.rb:1035:in `request'
    from /usr/lib/ruby/1.8/net/http.rb:772:in `get'
    from /usr/lib/ruby/site_ruby/1.8/puppet/indirector/rest.rb:71:in `find'
    from /usr/lib/ruby/site_ruby/1.8/puppet/indirector/indirection.rb:193:in `find'
     ... 42 levels...
    from /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:300:in `run'
    from /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:397:in `exit_on_fail'
    from /usr/lib/ruby/site_ruby/1.8/puppet/application.rb:300:in `run'
    from /usr/sbin/puppetd:4

I think this one should be caught and translated into a puppet error?!
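
The translation asked for here could be sketched roughly as follows. This is only a minimal illustration, not the patch from the ticket's branch: `AgentError` and `translate_timeout` are hypothetical names invented for the example, and the raised `Timeout::Error` stands in for the failing HTTP request. The point is that `Timeout::Error` is named explicitly in the `rescue` clause instead of relying on a bare `rescue`.

```ruby
require 'timeout'

# Hypothetical error class standing in for a Puppet-level error.
class AgentError < StandardError; end

# Run the given block and translate a timeout into AgentError.
# Timeout::Error is rescued by name rather than via a bare rescue.
def translate_timeout(description)
  yield
rescue Timeout::Error => e
  raise AgentError, "#{description}: #{e.message}"
end

begin
  translate_timeout("Could not retrieve file metadata") do
    # Stand-in for the Net::HTTP request that timed out.
    raise Timeout::Error, "execution expired"
  end
rescue AgentError => e
  puts "err: #{e.message}"
end
```

With this shape the caller sees an ordinary, describable error and can log it instead of dumping a stack trace.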


Related issues

related to Puppet - Bug #4704: puppet run fails completely if source of a file is missing Rejected 09/03/2010

History

Updated by Jesse Wolfe over 1 year ago

  • Status changed from Unreviewed to In Topic Branch Pending Review
  • Target version set to 2.6.1
  • Branch set to http://github.com/jes5199/puppet/tree/ticket/2.6.x/4462

Updated by Markus Roberts over 1 year ago

  • Status changed from In Topic Branch Pending Review to Ready For Checkin

Updated by Peter Meier over 1 year ago

I tested it and puppet no longer exits with a stack trace, but it still exits directly after printing the error. Wouldn’t it be better to just fail the resource on which the timeout occurs, as is done in some other places?

Updated by Peter Meier over 1 year ago

Peter Meier wrote:

I tested it and puppet no longer exits with a stack trace, but it still exits directly after printing the error. Wouldn’t it be better to just fail the resource on which the timeout occurs, as is done in some other places?

Another note on that: this seems to happen mainly during the creation of the internal types, that is, when the catalog is being parsed, the individual resource type objects are being created, and puppet is, for example, asking the master for metadata about a file.

This means that it is definitely not the same as a usual timeout during fetching the source of a file resource, so failing the resource and anything else depending on it doesn’t yet seem to be possible.

However, puppet still usually seems to stall for a very long time when that error happens, and it then prints, for example:

[...]
info: Caching catalog for foo.bar.ch
err: Could not run Puppet configuration client: Connection timed out Could not retrieve file metadata for puppet:///modules/apache/vhosts.d/CentOS.Final/0-default.conf: Connection timed out at /srv/puppet/development/modules/public/apache/manifests/vhost/file.pp:72
#

If I hook tcpdump on the master in front of nginx, I can see that nginx sent something to the client. However, the client remains silent about that, and using strace I can see that the client is waiting for an answer and then runs into the HTTP timeout after a very long time (longer than the usual configtimeout and the usual 60s).

It seems to me that this behavior was somehow introduced in 2.6, as I only remember resource timeouts (like fetching a file’s source) before 2.6. Furthermore, this is getting a bit annoying, as I have a lot of clients on rather high-latency networks, and when they hit that timeout the whole run is aborted. :/

But since I also upgraded other things (such as the client’s and master’s ruby version and various gems on the master) when upgrading puppet to 2.6, the problem might also lie somewhere else, which makes debugging a bit difficult. Anyway, I will dig into the problem a bit further; my next step is to dump the traffic on both sides, so I can see whether a packet is really getting lost or whether ruby/puppet-client is stalled somewhere else. If there are any other ideas on how to track this down, I would be happy for further suggestions.

So besides looking for the cause of that timeout, I think puppet should not abort its whole run when such a timeout happens.
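
The "fail the resource, keep the run going" behavior could look roughly like the sketch below. It assumes the timeout surfaces while an individual resource is being handled, which, as noted above, is not where it currently occurs. `Resource` and `apply_all` are hypothetical stand-ins for illustration, not Puppet's actual classes or methods.

```ruby
require 'timeout'

# Hypothetical stand-in for a catalog resource; not Puppet's actual class.
Resource = Struct.new(:name, :action)

# Apply each resource in order. A timeout fails only the resource it
# occurred on; the remaining resources are still applied.
def apply_all(resources)
  failed = []
  resources.each do |res|
    begin
      res.action.call
    rescue Timeout::Error => e
      warn "err: #{res.name}: #{e.message}"
      failed << res.name
    end
  end
  failed
end

resources = [
  Resource.new("File[/etc/motd]", lambda { :ok }),
  # Stand-in for a metadata request that times out.
  Resource.new("File[0-default.conf]", lambda { raise Timeout::Error, "execution expired" }),
  Resource.new("Service[httpd]", lambda { :ok }),
]

failed = apply_all(resources)
puts "failed resources: #{failed.inspect}"
```

A real implementation would also have to skip anything that depends on a failed resource; the sketch leaves that out.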

Updated by James Turnbull over 1 year ago

  • Status changed from Ready For Checkin to Code Insufficient

This needs a rethink based on Peter’s comments.

Updated by Peter Meier over 1 year ago

James Turnbull wrote:

This needs a rethink based on Peter’s comments.

I could get rid of most of the timeout symptoms by introducing the fair proxy scheduler as described in nginx – mongrel – fair proxy balancer. However, I still think that the run shouldn’t be aborted.

Updated by Markus Roberts over 1 year ago

  • Status changed from Code Insufficient to Needs Decision

My understanding is that this isn’t a timeout on a resource, it’s a timeout on communicating with the puppetmaster (e.g. getting the catalog).

Updated by Peter Meier over 1 year ago

Markus Roberts wrote:

My understanding is that this isn’t a timeout on a resource, it’s a timeout on communicating with the puppetmaster (e.g. getting the catalog).

No, it’s also a timeout when getting metadata such as file checksums in the pre-apply stage. This is related to the questions I asked in this thread: http://groups.google.com/group/puppet-dev/browse_thread/thread/1c8ac2c2d6fab46

Updated by Peter Meier over 1 year ago

Btw: I also filed #4704 and I think this one is related to it. But I filed it as a separate bug, as it has a different cause (timeout vs. nonexistent source). If you think both have the same root cause and should be fixed in the same way, you can close one as a duplicate.

Updated by James Turnbull over 1 year ago

  • Subject changed from uncaught timeout excpetion to Uncaught timeout exception

Updated by Markus Roberts over 1 year ago

  • Status changed from Needs Decision to Ready For Checkin

The patch on this ticket appears to fix the reported problem but, as Peter notes, that just reveals #4704, which also needs to be fixed.

Consequently, I’m setting this ticket back to “Ready for Checkin” and #4704 to Accepted.

Updated by James Turnbull over 1 year ago

  • Status changed from Ready For Checkin to Closed

Pushed in commit:e91a8cc975216501f764f5f2dea40d72154dc426 in branch 2.6.x
