The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

Bug #3088

Puppetd fails to stop after receiving SIGTERM

Added by Josh Anderson almost 5 years ago. Updated almost 5 years ago.

Status: Closed
Start date: 01/18/2010
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: plumbing
Estimated time: 1.00 hour
Target version: 0.25.4
Affected Puppet version: 0.25.2
Branch: http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/3088
Keywords:

Description

This is a weird issue that I encountered while lab testing 0.25.2.

My setup: ruby 1.8.7p160, puppet 0.25.2, Solaris 10 SPARC and x86

Here’s what happens:

  1. If puppetd receives a SIGTERM during a config run, it fails to stop completely. (Explanation below.)
  2. If puppetd receives a SIGTERM when it’s not in the middle of a run, it stops normally.

What does failing to stop mean? It logs that it’s stopping, closes all logfiles, and then hangs. Truss says that it’s poll()ing repeatedly.

To see exactly what happened, I inserted some not-very-clever tracing code into daemon.stop:

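    # Stop everything
    def stop(args = {:exit => true})
        if agent
            # Tracing added for debugging: append every interpreter event
            # (call, return, line, c-call, ...) to /var/tmp/trace.txt.
            set_trace_func Proc.new { |event, file, line, id, binding, classname|
                mesg = "[%8s] %30s %30s (%s:%-2d)\n" % [event, id, classname, file, line]
                File.open('/var/tmp/trace.txt', 'a') { |f| f.write(mesg) }
            }
        end
        # ... rest of the original stop method, unchanged and not shown ...
    end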

The rather lengthy trace output for a failed stop ends with:

    [  return]                     __signal__                  SignalEmitter (/opt/ruby/lib/ruby/site_ruby/1.8/puppet/external/event-loop/signal-system.rb:98)
    [  return]                         signal                  SignalEmitter ((eval):2 )
    [    line]                         select                      EventLoop (/opt/ruby/lib/ruby/site_ruby/1.8/puppet/external/event-loop/event-loop.rb:126)
    [    call]                      sleeping!                      EventLoop ((eval):1 )
    [    line]                      sleeping!                      EventLoop ((eval):2 )
    [  return]                      sleeping!                      EventLoop ((eval):2 )
    [    line]                         select                      EventLoop (/opt/ruby/lib/ruby/site_ruby/1.8/puppet/external/event-loop/event-loop.rb:127)
    [  c-call]                              +                          Array (/opt/ruby/lib/ruby/site_ruby/1.8/puppet/external/event-loop/event-loop.rb:127)
    [c-return]                              +                          Array (/opt/ruby/lib/ruby/site_ruby/1.8/puppet/external/event-loop/event-loop.rb:127)
    [  c-call]                         select                             IO (/opt/ruby/lib/ruby/site_ruby/1.8/puppet/external/event-loop/event-loop.rb:127)

I've attached log files and trace output for both successful and failed stops.

failed_stop_trace.txt (3.17 MB) Josh Anderson, 01/19/2010 08:36 pm

successful_stop_puppet.log (17.1 KB) Josh Anderson, 01/19/2010 08:36 pm

successful_stop_smf.log (2.87 KB) Josh Anderson, 01/19/2010 08:36 pm

successful_stop_trace.txt (55.7 KB) Josh Anderson, 01/19/2010 08:36 pm

failed_stop_puppet.log (16.8 KB) Josh Anderson, 01/19/2010 08:36 pm

failed_stop_smf.log (2.79 KB) Josh Anderson, 01/19/2010 08:36 pm

History

#1 Updated by Josh Anderson almost 5 years ago

  • Estimated time set to 1.00

This problem was caused by changes to agent.rb introduced to fix Issue #2661. See the diff here: http://projects.reductivelabs.com/projects/puppet/repository/revisions/adc0a4ed939a717e8735485d493bde28ceab5ac0/diff/lib/puppet/agent.rb

As far as I can tell, the modified rescue clause catches SystemExit, so the exception never makes its way up the stack as it should. As a result, the client stops running but doesn’t actually exit.

You should be able to duplicate this by sending puppetd a SIGTERM in the middle of a configuration run. This is potentially an issue for anyone who uses a package manager (or other automated process) to deploy new versions of Puppet.

Adding a second rescue clause to each affected method, one that catches and re-raises SystemExit, has fixed this for me.
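
For anyone following along, here is a minimal standalone sketch of the behavior described above. It is not the code from agent.rb: the method names are hypothetical, and it assumes the modified clause was broad enough to match SystemExit (for example `rescue Exception`). Kernel#exit works by raising SystemExit, so a clause like that can swallow a stop request.

```ruby
# Hypothetical stand-in for a run that is asked to stop partway through.
# Kernel#exit raises SystemExit; the process only exits if that exception
# reaches the top of the stack.
def run_swallowing_exit
  begin
    exit(0)
  rescue Exception => e
    # Too broad: SystemExit is caught here and goes no further.
    return "swallowed #{e.class}"
  end
end

# Same run, with an extra clause that re-raises SystemExit before the
# broad clause gets a chance to catch it.
def run_reraising_exit
  begin
    exit(0)
  rescue SystemExit
    raise
  rescue Exception => e
    return "logged #{e.class}"
  end
end

swallowed = run_swallowing_exit

# Catch SystemExit here only so the demonstration can report the result
# instead of actually exiting.
propagated = begin
  run_reraising_exit
  false
rescue SystemExit
  true
end

puts swallowed
puts propagated
```

In the first method the caller keeps running after the "exit", which matches the symptom of a client that logs that it is stopping and then hangs around. In the second, SystemExit propagates to the caller.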

#2 Updated by Markus Roberts almost 5 years ago

  • Status changed from Unreviewed to Investigating
  • Target version set to 0.25.4

#3 Updated by Mark Plaksin almost 5 years ago

We’re seeing something similar with 0.25.4rc2. We have a cron job that restarts Puppet once a day. On Linux where we’re using the init script included with Puppet itself, restart says this: “Could not create PID file: /var/puppet/run/puppetd.pid” This sometimes happens when I run ‘/etc/init.d/puppet restart’ on the command line.

Here’s some sample craziness:

    Starting puppet: [ OK ]
    sock_0:~ # /etc/init.d/puppet restart
    Stopping puppet: [FAILED]
    Starting puppet: Could not prepare for execution: Could not create PID file: /var/puppet/run/puppetd.pid
                                                           [  OK  ]

On HP-UX and Solaris we have our own init scripts. They have varying amounts of sleep time in them between stopping Puppet and trying to start it again. We added the sleeps way back when because puppetd often took a while to go away after we told it to stop. It looks like the same thing is happening on Linux now.

With 0.25.4rc2 on HP-UX, there are times when puppetd never exits after being sent a TERM signal.
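
As an aside on the fixed sleeps mentioned above, a restart wrapper could poll for the old process instead of sleeping for a guessed interval. This is only a sketch under assumptions: the helper names, the timeout, and the polling interval are hypothetical and are not taken from any of the init scripts discussed here. It relies on `Process.kill(0, pid)`, which sends no signal and only checks whether the process exists.

```ruby
# Returns true if a process with this PID currently exists.
def process_alive?(pid)
  Process.kill(0, pid)   # signal 0: existence/permission check only
  true
rescue Errno::ESRCH
  false                  # no such process
rescue Errno::EPERM
  true                   # exists, but owned by another user
end

# Polls until the process is gone or the timeout (in seconds) expires.
# Returns true if the process exited, false if it was still there.
def wait_for_exit(pid, timeout = 30, interval = 0.5)
  deadline = Time.now + timeout
  while process_alive?(pid)
    return false if Time.now >= deadline
    sleep interval
  end
  true
end
```

A wrapper would send TERM to the PID from the pidfile, call `wait_for_exit`, and only start the new daemon if it returns true. Note that this does not help with a puppetd that never exits; in that case the call simply returns false after the timeout.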

#4 Updated by Markus Roberts almost 5 years ago

  • Status changed from Investigating to In Topic Branch Pending Review
  • Branch set to http://github.com/MarkusQ/puppet/tree/ticket/0.25.x/3088

A patch is up for testing. It also covers a number of smaller-scope rescue blocks that could (as a race condition) exhibit the same behavior, and it applies the same treatment to a select subset of the Exceptions that are not StandardErrors (NoMemoryError, SignalException, and Interrupt).
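
To illustrate the kind of pattern described here (this is a sketch, not the contents of the branch), a broad rescue can let a chosen set of non-StandardError exceptions pass through untouched. The constant and method names below are hypothetical; the exception classes are the ones named in this ticket.

```ruby
# Exceptions that should never be swallowed by a broad rescue.
# None of these inherit from StandardError.
PASS_THROUGH = [SystemExit, NoMemoryError, SignalException, Interrupt]

# Runs the block, re-raising anything in PASS_THROUGH and handling
# everything else. "Handling" here just returns a description.
def guarded
  yield
rescue *PASS_THROUGH
  raise
rescue Exception => e
  "handled #{e.class}"
end

# An ordinary error is handled.
ordinary = guarded { raise RuntimeError, "boom" }

# Each pass-through exception escapes guarded and is caught out here,
# purely so the demonstration can record that it escaped.
samples = [SystemExit.new(0), NoMemoryError.new("simulated"),
           SignalException.new("TERM"), Interrupt.new("simulated")]
escaped = samples.map do |exc|
  begin
    guarded { raise exc }
    false
  rescue Exception => e
    e.equal?(exc)
  end
end

puts ordinary
puts escaped.inspect
```

Listing the pass-through clause first matters: Ruby tries rescue clauses in order, so the broad clause only sees exceptions that the first one did not match.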

#5 Updated by Mark Plaksin almost 5 years ago

Ignore my comments about Linux. I just needed to put PIDFILE=… into /etc/sysconfig/puppet. We’ll test your branch on Solaris and HP-UX today.

#6 Updated by Mark Plaksin almost 5 years ago

It works great for us on Solaris and HP-UX. All of the strangeness we saw on HP-UX is gone. Yay!

#7 Updated by James Turnbull almost 5 years ago

  • Category set to plumbing
  • Status changed from In Topic Branch Pending Review to Closed

Pushed in commit:a91c476387887baa5920f5539a7c4acfaf8cecd9 in branch 0.25.x
