The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

Bug #13097

Filebucket reads entire files into memory

Added by Evan Mezeske about 2 years ago. Updated 6 months ago.

Status: Accepted
Start date: 03/13/2012
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: filebucket
Target version: -
Affected Puppet version:
Branch:
Keywords:

Description

During backup/restore, Puppet::FileBucket::Dipper uses IO.binread() with no length parameter to read the entire file resource in question into memory at one time. This is very simple, but it causes problems when Puppet is used to manage large files.

To compound the problem, Puppet::FileBucketFile.verify_identical_file! also reads any backed-up copy of the file fully into memory. Thus, in the worst case, Puppet might read two full copies of a large file into memory at one time.

This problem can result in Puppet daemon processes taking up large amounts of RAM. Even though this RAM may be reused by the Ruby interpreter, it does not seem to be released back to the OS.

I propose operating on the files in small chunks instead, reading only a few kilobytes into memory at a time. This is common practice. For instance, Puppet::Util::Checksums.checksum_file already reads the file in small chunks when computing the MD5 digest.
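
To make the proposal concrete, here is a minimal sketch of a chunked backup in plain Ruby. It is not Puppet's code: the method name chunked_backup, the constant CHUNK_SIZE, and the 4096-byte default are assumptions chosen for illustration. It copies the file and computes the MD5 in a single pass, holding at most one chunk in memory at a time.

```ruby
require "digest"

# Hypothetical chunk size; the ticket only asks for "a couple kilobytes".
CHUNK_SIZE = 4096

# Hypothetical helper (not part of Puppet): copy src_path to dest_path in
# fixed-size chunks and return the MD5 hex digest of the copied data.
def chunked_backup(src_path, dest_path, chunk_size = CHUNK_SIZE)
  digest = Digest::MD5.new
  File.open(src_path, "rb") do |src|
    File.open(dest_path, "wb") do |dest|
      # IO#read(length) returns nil at end of file, which ends the loop.
      while (chunk = src.read(chunk_size))
        digest << chunk
        dest.write(chunk)
      end
    end
  end
  digest.hexdigest
end
```

With this shape, peak memory is bounded by the chunk size rather than by the size of the managed file.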


Related issues

Related to Puppet - Bug #22375: (#8229) File bucket and Puppet File resource: fails with ... Closed 07/04/2011
Related to Puppet - Feature #3371: FileBucket should not keep files in memory Accepted 03/15/2010
Duplicated by Puppet - Bug #18114: Recursive filebucket backup consumes way too much memory Duplicate

History

#1 Updated by Evan Mezeske about 2 years ago

An example of IO.binread() being used to read a whole file into memory:

https://github.com/puppetlabs/puppet/blob/master/lib/puppet/file_bucket/dipper.rb#L34

#2 Updated by Patrick Carlisle about 2 years ago

  • Status changed from Unreviewed to Accepted
  • Assignee set to Daniel Pittman

#3 Updated by Patrick Carlisle about 2 years ago

Thanks, this is definitely something we’d like to fix.

#4 Updated by Daniel Pittman 11 months ago

  • Assignee deleted (Daniel Pittman)

#5 Updated by Andrew Parker 6 months ago

A portion of this was fixed as part of the work on #22918. The verify_identical_file! method now uses streams to compare the submitted contents with the current contents, which minimizes the memory needed for the comparison (only the submitted contents are kept in memory). The next step is to allow streams or temp files to be used all the way from the HTTP request to the filebucket code, to reduce memory use further.
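
For readers unfamiliar with the approach, a stream comparison of this kind might look roughly like the sketch below. This is not the code from #22918: the names streams_identical? and contents_match_file? and the 4096-byte chunk size are assumptions for illustration. The submitted contents stay in memory as a string, while the file on disk is read one chunk at a time.

```ruby
require "stringio"

# Hypothetical helper (not Puppet's implementation): compare two IO-like
# objects chunk by chunk, returning true only if they hold the same bytes.
def streams_identical?(a, b, chunk_size = 4096)
  loop do
    chunk_a = a.read(chunk_size)
    chunk_b = b.read(chunk_size)
    # Both streams reached end of file together: they are identical.
    return true if chunk_a.nil? && chunk_b.nil?
    # A differing chunk, or one stream ending early, means a mismatch.
    return false unless chunk_a == chunk_b
  end
end

# Hypothetical wrapper: compare in-memory contents with a file on disk
# without loading the file fully into memory.
def contents_match_file?(contents, path)
  File.open(path, "rb") do |file|
    streams_identical?(StringIO.new(contents), file)
  end
end
```

Removing the remaining in-memory copy would mean replacing the StringIO with a stream or temp file fed from the HTTP request, as described above.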
