The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

Bug #15062

puppet fails if template contains invalid utf-8

Added by Chris Price about 2 years ago. Updated 7 months ago.

Status: Needs Decision
Start date: 06/15/2012
Priority: High
Due date:
Assignee: eric sorenson
% Done: 0%
Category: templates
Target version: -
Affected Puppet version: 2.7.16
Branch:
Keywords: character encoding binary utf8

This ticket is now tracked at: https://tickets.puppetlabs.com/browse/PUP-1038


Description

If you attempt to use a file resource with a ‘content’ parameter pointing at a template, and the template contains binary content, you may get an error like this:

Error: Failed to apply catalog: Parameter content failed: Munging failed for value ...
invalid byte sequence in UTF-8

I’ve reproduced the failure in 2.7.16 and 3.x, though the error messages differ slightly between the two (and also depending on whether you repro via ‘apply’ or via master/agent run).

I’m attaching the binary file that I’m using to repro. Save it into a directory structure like this:

modules/mymod/templates/mytemplate.erb

Add the “modules” directory to your module path and then you can repro with the following manifest:

file { "/tmp/myfile":
    mode    => '0755',
    content => template("mymod/mytemplate.erb"),
}

Note that if you use the ‘source’ parameter rather than the ‘content’ parameter (and avoid calling the template function), the manifest can be applied successfully; so the issue is when bringing in binary data as a string.

mytemplate.erb (46.6 KB) Chris Price, 06/15/2012 10:31 am

initial_monmap (328 Bytes) Chris Price, 08/22/2012 01:20 pm


Related issues

Related to PuppetDB - Bug #14873: puppetdb-0.9 - checksums don't match Closed 06/07/2012
Related to Puppet - Bug #16061: Deprecate non-UTF-8 encodings for textual content Accepted 08/21/2012
Related to PuppetDB - Bug #15903: Binary fact data causes PuppetDB 400 checksums don't matc... Closed 08/09/2012
Related to Puppet - Bug #20522: Improve Puppet's handling of non-ASCII character encodings Accepted

History

#1 Updated by eric sorenson almost 2 years ago

  • Status changed from Unreviewed to Accepted
  • Assignee set to Deepak Giridharagopal

Deepak, we had a UTF-8 discussion last week but I don’t remember the outcome. Can you update this with what you feel the right thing should be?

#2 Updated by Deepak Giridharagopal almost 2 years ago

I think there are a few options available, though we can basically only do these things in Telly (due to compat):

  • Just state that all template content and .pp files must be UTF-8, and fail if they don’t appear to be. We’ve got code in the PuppetDB terminus that could help with that

  • In addition to the above statement, provide a version of the template function (and perhaps other functions, like generate) that takes the character encoding as an argument. So you could do template("foo.erb", "latin-1") or something to tell us what the encoding is, and we use that to convert to UTF-8 internally

I’d be okay with just the first, personally, though that pushes the burden of character set conversion to the user (they’d have to use iconv on the command line or something).

Relevant mailing list thread about this, where I propose just enforcing UTF-8:

https://groups.google.com/forum/?fromgroups#!topic/puppet-users/-OhWwhdq2-U%5B1-25%5D
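
For illustration, the two options above can be sketched in plain Ruby. This is only a sketch under assumptions: the helper names assert_utf8! and to_utf8 are hypothetical and are not part of Puppet or the PuppetDB terminus.

```ruby
# Hypothetical helpers for illustration only; not Puppet code.

# Option 1: treat the bytes as UTF-8 and fail if they are not well-formed.
def assert_utf8!(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "content is not valid UTF-8" unless text.valid_encoding?
  text
end

# Option 2: the caller names the source encoding and we convert to UTF-8.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

latin1 = "caf\xE9".b                  # "café" encoded as ISO-8859-1
puts to_utf8(latin1, "ISO-8859-1")    # converted copy, now UTF-8
```

With only the first helper, the user would have to convert files beforehand (for example with iconv); the second moves that conversion into the function call.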

#3 Updated by Deepak Giridharagopal almost 2 years ago

  • Status changed from Accepted to Needs Decision
  • Assignee changed from Deepak Giridharagopal to eric sorenson

#4 Updated by Mariusz Gronczewski almost 2 years ago

Enforcing UTF-8 would break any binary data that is fed to the “content” parameter, and it’s not only templates. For example, we use content for passing crypto keys so that no other node can read them; using source doesn’t work very well because each client would need an ACL for its content, or else all clients could read anything from the fileserver. Also, some config files require a specific encoding (or even binary commands) in some parts, while the rest (comments etc.) is UTF-8, and it’s not always feasible to fix that (e.g. in some non-free software).

IMO it should pass data “as is” and add an iconv function with an optional “ignore invalid characters” option.
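
A rough idea of what such an iconv-style function could look like in Ruby, as a sketch only: convert_encoding and its ignore_invalid keyword are hypothetical names, and Puppet does not ship a function like this.

```ruby
# Hypothetical iconv-like helper; an assumption for illustration, not a Puppet function.
def convert_encoding(bytes, from, to, ignore_invalid: false)
  text = bytes.dup.force_encoding(from)
  if ignore_invalid
    # Drop bytes that are malformed in the source encoding...
    text = text.scrub("")
    # ...and drop characters the target encoding cannot represent.
    text.encode(to, undef: :replace, replace: "")
  else
    # Strict mode: String#encode raises on malformed or unmappable input.
    text.encode(to)
  end
end

# 0xB1 is "ą" in ISO-8859-2.
puts convert_encoding("\xB1".b, "ISO-8859-2", "UTF-8")
```

The scrub step is there because String#encode does nothing when source and target encodings are the same, so it cannot be relied on to strip invalid bytes in that case.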

#5 Updated by Chris Price almost 2 years ago

I’ve linked this to ticket #15903, which is related. In that case the issue was triggered by trying to submit a Facter fact that contained binary data, rather than a resource property. Also, the binary data that triggered that issue was of a slightly different form than the data previously reported on this ticket, so I’ve attached the file “initial_monmap” from that ticket. It contains the binary data in question. In particular, the byte sequence [0xC0 0xA8] in that file caused the problem.
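
As a quick check outside Puppet, plain Ruby confirms that this byte pair is not well-formed UTF-8 (the byte 0xC0 never appears in valid UTF-8), while the same bytes are fine when treated as binary. This sketch only uses core String methods and does not involve the attached file.

```ruby
# The two bytes named above, built as a binary (ASCII-8BIT) string.
bytes = [0xC0, 0xA8].pack("C*")

# Label the same bytes as UTF-8 without changing them.
as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)

puts as_utf8.valid_encoding?   # false: not well-formed UTF-8
puts bytes.valid_encoding?     # true: any byte sequence is valid as binary
```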

#6 Updated by Andreas Knifh over 1 year ago

I’m running into this issue with puppet 3.1.0-1puppetlabs1 on Debian wheezy, using the generate function to install Kerberos keytabs on my hosts. It’s a bit weird though: my puppet masters run Debian squeeze, also with puppet 3.1.0-1puppetlabs1, and hosts running Debian squeeze have no problems generating the keytabs. Only Debian wheezy and Ubuntu 12.10 are affected.

#7 Updated by Mathieu Arnold about 1 year ago

Hi,

I’ve been testing ruby 1.9.3 on a test server, and I’m getting this issue too.

Saying that “everything has to be UTF-8” is a nice thing, but in the real world, it’s not happening any time soon. I need to be able to distribute non-UTF-8 files; I even have a couple of files containing delimited but unescaped binary values. Enforcing UTF-8 for .pp files should not, I think, harm anyone, even if I don’t see a really good reason for it. Enforcing it for template files is IMHO a bad idea: puppet should see those files as a bunch of bytes whose presence it has to ensure, and anything more is a bad idea.

Regards,

#8 Updated by Josh Cooper about 1 year ago

Mathieu Arnold wrote:

puppet should see those files as a bunch of bytes

That won’t work for templates, because Puppet needs to read the bytes, convert to a ruby string, and invoke ERB.new(str,...). So an encoding needs to be specified somewhere. We could assume UTF-8, and/or provide a mechanism for the caller to specify the encoding, as Deepak suggested.

This isn’t an issue for the source parameter, because in that case we just perform a binary copy.
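
To make the steps above concrete, here is a minimal Ruby sketch of reading bytes, attaching an encoding, and handing the string to ERB. The render_template helper and its encoding argument are assumptions for illustration; this is not how Puppet's template function is actually implemented.

```ruby
require "erb"

# Hypothetical helper, not Puppet's implementation.
# bytes:    raw template content, e.g. from File.binread(path)
# encoding: what the caller says the bytes are; UTF-8 is assumed by default
def render_template(bytes, encoding = "UTF-8")
  source = bytes.dup.force_encoding(encoding)
  unless source.valid_encoding?
    raise ArgumentError, "template is not valid #{encoding}"
  end
  # Convert to UTF-8 internally, then let ERB evaluate the template.
  ERB.new(source.encode(Encoding::UTF_8)).result(binding)
end

# A template stored as ISO-8859-1 ("café" with a single 0xE9 byte).
puts render_template("caf\xE9 <%= 1 + 1 %>".b, "ISO-8859-1")
```

Calling the helper without an encoding on the same bytes raises, which mirrors the "assume UTF-8 and fail" behaviour discussed earlier in the thread.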

#9 Updated by Mariusz Gronczewski 10 months ago

At this point even having a sensible default would help, because if for some reason the environment isn’t “right”, puppet just fails with a message that is cryptic for a new user.

Could the encoding just be made a parameter to template, like template('some/file.erb', 'ISO-8859-2')? Then the default could be UTF-8, but the few files needing a specific encoding could be easily fixed.

#10 Updated by Mathieu Arnold 9 months ago

Any news on that front?

#11 Updated by Mathieu Arnold 9 months ago

  • Priority changed from Normal to High

#12 Updated by Erik Dalén 7 months ago

Redmine Issue #15062 has been migrated to JIRA:

https://tickets.puppetlabs.com/browse/PUP-1038
