The Puppet Labs Issue Tracker has Moved:

This issue tracker is now in read-only archive mode, and automatic ticket export has been disabled. Redmine users will need to create a new JIRA account to file tickets. See the following page for information on filing tickets with JIRA:

Bug #13097

Filebucket reads entire files into memory

Added by Evan Mezeske about 4 years ago. Updated over 2 years ago.

Status: Accepted
Start date: 03/13/2012
Priority: Normal
Due date:
Assignee: -
% Done:
Target version: -
Affected Puppet version:
Branch:

During backup/restore, Puppet::FileBucket::Dipper uses IO.binread() with no length parameter, which reads the entire file into memory at once. This is simple, but it causes problems when Puppet is used to manage large files.
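For contrast, here is a minimal sketch of a backup-style copy that never holds the whole file, assuming a plain file-to-file copy is what's needed. The `chunked_copy` helper and the 16 KiB chunk size are hypothetical illustrations, not part of Puppet's API. Data moves through one reusable buffer, so peak memory stays near the chunk size no matter how large the file is.

```ruby
# Hypothetical helper (not Puppet code): copy +src+ to +dest+ in fixed-size
# chunks so that only one buffer of +chunk_size+ bytes is live at a time.
CHUNK_SIZE = 16 * 1024 # assumption: 16 KiB; any small size works

def chunked_copy(src, dest, chunk_size = CHUNK_SIZE)
  buffer = String.new(capacity: chunk_size) # reused for every read
  File.open(src, "rb") do |input|
    File.open(dest, "wb") do |output|
      # IO#read(length, outbuf) refills +buffer+ in place; it returns nil at EOF.
      while input.read(chunk_size, buffer)
        output.write(buffer)
      end
    end
  end
  dest
end
```

Usage: `chunked_copy("/etc/large.iso", "/tmp/large.iso.bak")` copies the file while allocating a single buffer.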

To compound the problem, Puppet::FileBucketFile.verify_identical_file! also reads any backed-up copy of the file fully into memory. Thus, in the worst case, Puppet might read two full copies of a large file into memory at one time.
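A streaming comparison only needs two small buffers. The sketch below uses a hypothetical `files_identical?` helper; it is not Puppet's `verify_identical_file!`. It checks sizes first, then reads both files in lockstep and stops at the first chunk that differs.

```ruby
# Hypothetical helper (not Puppet code): true when both files have the same
# bytes, compared chunk by chunk so neither file is loaded whole.
def files_identical?(path_a, path_b, chunk_size = 16 * 1024)
  # Cheap early exit: files of different sizes cannot match.
  return false unless File.size(path_a) == File.size(path_b)

  File.open(path_a, "rb") do |a|
    File.open(path_b, "rb") do |b|
      loop do
        # For regular files, IO#read(n) returns exactly n bytes until the
        # final chunk, so the two streams stay aligned.
        chunk_a = a.read(chunk_size)
        chunk_b = b.read(chunk_size)
        return false unless chunk_a == chunk_b
        return true if chunk_a.nil? # both reached EOF together
      end
    end
  end
end
```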

This problem can result in Puppet daemon processes taking up large amounts of RAM. Even though this RAM may be reused by the Ruby interpreter, it does not seem to be released back to the OS.

I propose operating on these files in small chunks, reading only a few kilobytes into memory at a time. This is common practice; for instance, Puppet::Util::Checksums.checksum_file already reads the file in small chunks when computing the MD5.
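A sketch of the chunked approach applied to checksumming, assuming MD5 via Ruby's standard `digest` library. The helper name `md5_file_chunked` is hypothetical; this is not the actual source of `Puppet::Util::Checksums.checksum_file`, only the same idea.

```ruby
require "digest"

# Hypothetical helper (not Puppet code): MD5 of a file, fed to the digest in
# chunks so memory use is bounded by +chunk_size+ rather than file size.
def md5_file_chunked(path, chunk_size = 16 * 1024)
  digest = Digest::MD5.new
  File.open(path, "rb") do |f|
    while (chunk = f.read(chunk_size))
      digest.update(chunk) # incremental update; earlier chunks can be freed
    end
  end
  digest.hexdigest
end
```

Ruby's standard library also provides `Digest::MD5.file(path).hexdigest`, which streams the file in the same way.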

Related issues

Related to Puppet - Bug #22375: (#8229) File bucket and Puppet File resource: fails with ... Closed 07/04/2011
Related to Puppet - Feature #3371: FileBucket should not keep files in memory Accepted 03/15/2010
Duplicated by Puppet - Bug #18114: Recursive filebucket backup consumes way too much memory Duplicate


#1 Updated by Evan Mezeske about 4 years ago

An example of IO.binread() being used to read a whole file into memory:

#2 Updated by Patrick Carlisle about 4 years ago

  • Status changed from Unreviewed to Accepted
  • Assignee set to Anonymous

#3 Updated by Patrick Carlisle about 4 years ago

Thanks, this is definitely something we’d like to fix.

#4 Updated by Anonymous almost 3 years ago

  • Assignee deleted (Anonymous)

#5 Updated by Anonymous over 2 years ago

A portion of this was fixed as part of the work on #22918. The verify_identical_file! method now compares the submitted contents with the current contents using streams to minimize the memory needed (it keeps only the submitted contents in memory). The next step is to use streams or temp files all the way from the HTTP request to the filebucket code, reducing memory use further.
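One way to keep a request body out of memory is to spool it to a temp file as it arrives. The sketch below assumes the body is available as an IO-like object (a `StringIO` stands in for it in testing); `spool_to_tempfile` is a hypothetical helper, not part of Puppet or its HTTP layer.

```ruby
require "tempfile"

# Hypothetical helper (not Puppet code): copy an IO-like body (anything that
# responds to readpartial or read) into a binary temp file instead of a String.
def spool_to_tempfile(body_io)
  tmp = Tempfile.new("filebucket")
  tmp.binmode
  # IO.copy_stream copies through an internal buffer rather than
  # materializing the whole body as one String.
  IO.copy_stream(body_io, tmp)
  tmp.rewind # position at the start so callers can read it back
  tmp
end
```

The returned temp file can then be checksummed or compared in chunks, as proposed in the original report; callers should `close!` it when finished so it is deleted.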
