The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

This issue tracker is now in read-only archive mode and automatic ticket export has been disabled. Redmine users will need to create a new JIRA account to file tickets using https://tickets.puppetlabs.com. See the following page for information on filing tickets with JIRA:

Feature #19708

puppet agent retry failed http requests

Added by Patrick Hemmer about 3 years ago. Updated almost 2 years ago.

Status: Needs Decision
Start date:
Priority: Low
Due date:
Assignee: Charlie Sharpsteen
% Done: 0%
Category: -
Target version: -
Affected Puppet version: 3.1.0
Branch:
Keywords: customer


This ticket is now tracked at: https://tickets.puppetlabs.com/browse/PUP-2526


Description

It would be nice if puppet agent had the ability to retry failed HTTP requests to the puppet master.

I have multiple puppet masters sitting behind a load balancer in AWS (ELB). Whenever we update a puppet master, the node is removed from the load balancer while the update occurs. Unfortunately the AWS ELB does not allow quiescent removal of nodes (such that existing connections are allowed to close gracefully). Instead it just severs the connections immediately. This causes errors for agents which are in the middle of making requests to that master.
Another related scenario is when you’re updating multiple puppet masters. While the update is in progress, some masters have newer code than others. A puppet agent gets a catalog from one master which says a certain file should exist, but when the agent goes to fetch that file, the request fails because the master it hit hasn’t been updated yet. Retrying isn’t an ideal solution for this scenario, since a retry could just hit the same out-of-date master again, but it could work. Yes, the ideal solution here is session persistence, but the AWS ELB does not support it.

It might even be useful to allow a configurable backoff (failure; sleep 2; failure; sleep 5; failure; abort…), though a single retry would be sufficient for the first scenario above. If a backoff is implemented, I think it should only be applied once in the whole puppet run, so that if you have 100 different HTTP requests to make to the puppet master, you don’t go through the backoff wait 100+ times.
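A minimal sketch of the "backoff only once per run" idea, written in Python purely for illustration (the real agent is Ruby, and nothing here is Puppet's API). It assumes one hypothetical `RunRetryBudget` object is shared by every request in a run. Each retry consumes the next delay from the schedule. Once the schedule is used up, later failures in the same run abort immediately, so the total time spent waiting is bounded by the sum of the delays. It also assumes a severed connection surfaces as a `ConnectionError` (for example `ConnectionResetError`).

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RunRetryBudget:
    """Hypothetical backoff schedule shared by all requests in one agent run.

    Each retry pops the next delay; once the list is empty, any further
    failure in the same run is re-raised without waiting.
    """

    def __init__(self, delays: List[float],
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._remaining = list(delays)  # e.g. [2, 5] -> sleep 2, then sleep 5
        self._sleep = sleep             # injectable so tests don't really sleep

    def call(self, request: Callable[[], T]) -> T:
        while True:
            try:
                return request()
            except ConnectionError:
                if not self._remaining:
                    raise  # schedule exhausted for this run: abort
                self._sleep(self._remaining.pop(0))
```

With `RunRetryBudget([2, 5])`, a request that fails twice and then succeeds waits 2 and then 5 seconds. Every later request in that run gets no retries at all. A single immediate retry, which would be enough for the ELB case, corresponds to `RunRetryBudget([0])`.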

History

#1 Updated by Celia Cottle almost 3 years ago

  • Keywords set to customer

#3 Updated by Dan Achim almost 3 years ago

I have also observed this behaviour and would like to add a bit more information. I opened a bug and Lee Lowder mentioned that there isn’t a way (currently) to do what I want but there is this feature request.

I would like to add that it would be really useful, in a multi-master setup where DNS round-robin (or SRV records) is used to load balance agents across the masters, if agents moved on to the next master when the current one accepts the TCP connection but returns a non-200 response code. Currently, in this scenario, agents keep trying the same master even if it returns 5xx, because they only consider a master unusable if it is completely unresponsive (TCP port not accessible).

Could the agent not only retry a failed HTTP request but also try the next master in the list when DNS returns multiple IPs?
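The failover behaviour requested in this comment could look roughly like the Python sketch below. It is illustrative only and is not how puppet agent is implemented. `resolve_masters` uses the standard `socket.getaddrinfo` to collect every address DNS returns. `fetch_with_failover` walks that list and moves on both when the TCP connection fails and when a master answers with a non-2xx status, as the comment suggests. The `fetch` callable, its `(status, body)` return shape, and `AllMastersFailed` are hypothetical stand-ins for the agent's HTTP layer.

```python
import socket
from typing import Callable, List, Tuple

Response = Tuple[int, bytes]  # (HTTP status, body): hypothetical shape


class AllMastersFailed(Exception):
    """Raised when no master returned a 2xx response (hypothetical)."""


def resolve_masters(host: str, port: int) -> List[str]:
    """Return every address DNS gives for host, de-duplicated, in order."""
    addresses: List[str] = []
    for *_, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def fetch_with_failover(masters: List[str],
                        fetch: Callable[[str], Response]) -> Response:
    """Try each master in turn; return the first 2xx response."""
    errors: List[str] = []
    for master in masters:
        try:
            status, body = fetch(master)
        except ConnectionError as exc:
            errors.append(f"{master}: {exc}")  # port unreachable: next master
            continue
        if 200 <= status < 300:
            return status, body
        # Port answered but the request failed (e.g. 5xx): next master.
        errors.append(f"{master}: HTTP {status}")
    raise AllMastersFailed("; ".join(errors))
```

Treating every non-2xx status as a reason to fail over follows the comment's wording. A real implementation might choose to fail over on 5xx only, because a 4xx can be a legitimate answer.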

#4 Updated by Charlie Sharpsteen almost 3 years ago

  • Assignee set to Charlie Sharpsteen

#5 Updated by Rob Nelson over 2 years ago

  • Status changed from Unreviewed to Accepted

Marking as accepted.

#6 Updated by Charlie Sharpsteen over 2 years ago

  • Status changed from Accepted to Needs Decision

This will need a decision. As noted, the root of the problem here is that the load balancer is not allowing the puppet masters to finish handling their requests before terminating the connections. In general, this sort of situation can indicate a serious problem, and sweeping it under the rug by retrying until success may not be the best approach.

Eric, any thoughts on this one? It reminds me of the discussion that was had over retrying yum installs:

https://github.com/puppetlabs/puppet/pull/1691

#7 Updated by Peter Drake almost 2 years ago

Redmine Issue #19708 has been migrated to JIRA:

https://tickets.puppetlabs.com/browse/PUP-2526
