The Puppet Labs Issue Tracker has moved. Ticket tracking is now hosted in JIRA: https://tickets.puppetlabs.com
Filebucket reads entire files into memory
During backup/restore, Puppet::FileBucket::Dipper uses IO.binread() with no length parameter to read the entire file resource in question into memory at one time. This is very simple, but it causes problems when Puppet is used to manage large files.
To compound the problem, Puppet::FileBucketFile.verify_identical_file! also reads any backed-up copy of the file fully into memory. Thus, in the worst case, Puppet might read two full copies of a large file into memory at one time.
This problem can result in Puppet daemon processes taking up large amounts of RAM. Even though this RAM may be reused by the Ruby interpreter, it does not seem to be released back to the OS.
I propose that it would be better to operate on the files in question in small chunks, reading only a few kilobytes into memory at a time. This is a common practice; for instance, Puppet::Util::Checksums.checksum_file already reads the file in small chunks when computing the MD5 checksum.
#5 Updated by Anonymous almost 2 years ago
A portion of this was fixed as part of the work on #22918. The verify_identical_file! method now compares the submitted contents with the current contents using streams, which minimizes the memory needed for the comparison (only the submitted contents are kept in memory). The next step is to allow streams or temp files to be used all the way from the HTTP request down to the filebucket code, reducing memory use further.