
Feature #5783

support HTTP(S) URL as the file 'source'

Added by Anonymous over 5 years ago. Updated over 2 years ago.

Status: Accepted
Start date: 01/05/2011
Priority: Normal
Due date: -
Assignee: -
% Done: 0%
Category: fileserving
Estimated time: 8.00 hours
Target version: -
Affected Puppet version: development
Branch: -
Keywords: -

We've Moved!

Ticket tracking is now hosted in JIRA: https://tickets.puppetlabs.com

This ticket is now tracked at: https://tickets.puppetlabs.com/browse/PUP-1072


Description

Lots and lots of folks want to be able to do this:

file { "/tmp/example.txt": source => 'http://example.com/example.txt' }

This would be good to support; obviously the metadata for the HTTP URI is much less available than via puppet file serving, but this would make a lot of people very, very happy.

History

#1 Updated by eric sorenson over 5 years ago

Daniel Pittman wrote:

This would be good to support; obviously the metadata for the HTTP URI is much less available than via puppet file serving, but this would make a lot of people very, very happy.

Actually, HTTP semantics can largely support the metadata, and it would be great if puppet implemented them. For instance:

  • “If-Modified-Since” can provide checksum => timestamp
  • “ETag” can provide checksum => md5 (http://en.wikipedia.org/wiki/HTTP_ETag)
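For illustration, a conditional GET along these lines is only a few lines of Ruby with the standard library (a minimal sketch; the URL and the remembered header values are placeholders):

require 'net/http'
require 'uri'

uri = URI('http://example.com/example.txt')

Net::HTTP.start(uri.host, uri.port) do |http|
  req = Net::HTTP::Get.new(uri.request_uri)
  # Validators remembered from the previous fetch of this URL
  # (both values here are placeholders):
  req['If-None-Match']     = '"686897696a7c876b7e"'
  req['If-Modified-Since'] = 'Wed, 05 Jan 2011 00:00:00 GMT'

  res = http.request(req)
  case res
  when Net::HTTPNotModified   # 304 -- local copy is still current
    puts 'unchanged, nothing to fetch'
  when Net::HTTPSuccess       # 200 -- server sent new content
    File.write('/tmp/example.txt', res.body)
  end
end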

#2 Updated by James Turnbull over 5 years ago

  • Status changed from Unreviewed to Needs Decision

#3 Updated by donavan m over 5 years ago

eric sorenson wrote:

Actually, HTTP semantics can largely support the metadata, and it would be great if puppet implemented them. For instance:

  • “If-Modified-Since” can provide checksum => timestamp
  • “ETag” can provide checksum => md5 (http://en.wikipedia.org/wiki/HTTP_ETag)

Yes. And the little-used Content-MD5. And of course there’s plenty of room in the X- header space.
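A sketch of what using Content-MD5 could look like (the URL and local path are placeholders; per RFC 1864 the header carries the base64 of the body’s binary MD5):

require 'net/http'
require 'uri'
require 'digest/md5'

uri   = URI('http://example.com/example.txt')
local = '/tmp/example.txt'

head = Net::HTTP.start(uri.host, uri.port) { |http| http.head(uri.path) }

# Content-MD5 (RFC 1864) carries the base64 of the body's binary MD5.
if (remote = head['Content-MD5']) && File.exist?(local)
  puts(Digest::MD5.file(local).base64digest == remote ? 'in sync' : 'stale')
else
  puts 'no Content-MD5 header (or no local copy) -- fall back to a full fetch'
end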

#4 Updated by Anonymous over 5 years ago

eric sorenson wrote:

Daniel Pittman wrote:

This would be good to support; obviously the metadata for the HTTP URI is much less available than via puppet file serving, but this would make a lot of people very, very happy.

Actually, HTTP semantics can largely support the metadata, and it would be great if puppet implemented them. For instance:

  • “If-Modified-Since” can provide checksum => timestamp
  • “ETag” can provide checksum => md5 (http://en.wikipedia.org/wiki/HTTP_ETag)

ETags are not MD5s, just opaque identifiers, but they could provide a ‘checksum => etag’ facility.

I was assuming that the HTTP server was not the regular puppet server, but an unmodified Apache or whatever, so you couldn’t just assume that we can add custom metadata to the response. I think this will be true for many users of this, even if puppet natively supports “rich HTTP” requests and the like.

#5 Updated by James Turnbull over 5 years ago

  • Assignee set to Nigel Kersten

#6 Updated by Nigel Kersten over 5 years ago

I don’t believe we should implement this and have it not work well with a vanilla HTTP server.

I’ve talked to a fair few people about this in IRC over the years, and most have been satisfied with a simple function that accepts ftp://, http://, etc. URIs for use with the content parameter. This does mean you’re shipping the file contents every time, but for a lot of people this seems to be an acceptable tradeoff.

How would we make this work with a vanilla third-party HTTP server?

#7 Updated by Anonymous over 5 years ago

Nigel Kersten wrote:

I don’t believe we should implement this and have it not work well with a vanilla HTTP server.

I’ve talked to a fair few people about this in IRC over the years, and most have been satisfied with a simple function that accepts ftp://, http://, etc. URIs for use with the content parameter. This does mean you’re shipping the file contents every time, but for a lot of people this seems to be an acceptable tradeoff.

How would we make this work with a vanilla third-party HTTP server?

I was planning on poking at this as my next “fun” project as I work through learning the codebase; my plan was to implement “If-Modified-Since” and “ETag” support, use the local state store to track that, and see about just using very sparse metadata – default to the default user, group, and umask locally.

I am inclined to think we are best doing that, then letting feature requests drive extensions like metadata on non-puppet servers.
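A minimal sketch of the local state store part of that plan, assuming a hypothetical YAML file for persistence between runs:

require 'yaml'

STATE_FILE = '/var/lib/puppet/http_source_state.yaml' # hypothetical location

# Remember the validators a response carried, keyed by URL, so the next
# run can send them back as If-None-Match / If-Modified-Since.
def remember(url, etag, last_modified)
  state = File.exist?(STATE_FILE) ? YAML.load_file(STATE_FILE) : {}
  state[url] = { 'etag' => etag, 'last_modified' => last_modified }
  File.write(STATE_FILE, state.to_yaml)
end

def validators_for(url)
  state = File.exist?(STATE_FILE) ? YAML.load_file(STATE_FILE) : {}
  state.fetch(url, {})
end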

#8 Updated by Nigel Kersten over 5 years ago

I’m not seeing what the point of doing this work would be over improving the performance of the fileserver, Daniel.

#9 Updated by Anonymous over 5 years ago

Nigel Kersten wrote:

I’m not seeing what the point of doing this work would be over improving the performance of the fileserver, Daniel.

The main attraction is that getting an HTTP service in place is lower overhead than getting a full-blown puppet system in place; a secondary consideration is that it is easier to add dynamic content to it than by extending our file server.

Using content is … OK for reasonably sized files, but pretty nasty if you want to ship down a 1.2GB installer package for Oracle – as one person wanted to do with this. It would be very valuable, IMO, to have the extra network efficiency here.

Finally, at least some users (including me) have wanted this because the network between the puppet master and client is thin, but the data is available elsewhere – in our case, at least some of it was inside a database system that was HTTP-accessible from the client, so the ‘content’ proxy would require copying that from the DC to our master, then back to the DC.

On the other hand, perhaps the use of content is an acceptable solution for now, as it can be extended to transparently perform the network optimization in future without needing to do extra things…

#10 Updated by Felix Frank over 5 years ago

Nigel Kersten wrote:

I’m not seeing what the point of doing this work would be over improving the performance of the fileserver, Daniel.

There are tons of use cases, seeing how ubiquitous HTTP is these days (e.g., most Linux package repositories).

Daniel’s bulk-files argument is a good one. To add to it: many people probably have their puppetmaster fileserver trees under version control. While I’m not keen on puppet serving 1GB installers, I don’t want even 40MB tarballs to be part of my git history. Looking at Java, Tomcat, etc., it would be most reasonable to be able to have puppet fetch their respective sources via HTTP.

#11 Updated by Brian Gallew over 5 years ago

Another (similar) motivation is that right now I’ve got a lot of files in my Puppet SVN repository that I’ve pulled in from other repositories, simply because Puppet can’t deploy files from an HTTPS SVN location.

#12 Updated by Nigel Kersten over 5 years ago

I obviously haven’t been clear.

I’m not against having something that works with vanilla HTTP, precisely because it is so ubiquitous.

That doesn’t seem to be what we’re talking about here though.

“but an unmodified Apache or whatever, so you couldn’t just assume that we can add custom metadata to the response.”

If we can make it work with an unmodified “Apache or whatever”, then I think it’s worth doing. If we can’t…

#13 Updated by Anonymous over 5 years ago

Nigel Kersten wrote:

I’m not against having something that works with vanilla HTTP, precisely because it is so ubiquitous. That doesn’t seem to be what we’re talking about here though.

Oh. I see no problems doing vanilla HTTP; any “value add” custom metadata would be something that I didn’t see as part of the core. So, you and I are in violent agreement.

#14 Updated by Nigel Kersten over 4 years ago

  • Status changed from Needs Decision to Accepted
  • Assignee deleted (Nigel Kersten)

#15 Updated by James Loope over 4 years ago

It would be ideal if this had the capacity to check the ETag against an MD5 of the existing file on disk.
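A minimal sketch of that comparison; note it only works when the server’s ETag happens to be the content’s MD5, which HTTP does not guarantee (the path and ETag value are placeholders):

require 'digest/md5'

# ETags arrive quoted, sometimes with a weak prefix, e.g. W/"abc123";
# strip both before comparing against the local file's MD5.
def etag_matches_md5?(etag, path)
  etag.sub(/\AW\//, '').delete('"') == Digest::MD5.file(path).hexdigest
end

puts etag_matches_md5?('"d41d8cd98f00b204e9800998ecf8427e"', '/tmp/example.txt')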

#16 Updated by Steve Shipway over 4 years ago

I’d be happy if puppet simply used sensible defaults for http: URLs where no explicit action was defined and the metadata was not available.

For example:

  • No metadata for file owner/group/mode? Default to the puppet process owner/group and mode 0644.
  • Need to identify whether the file has changed? Do a HEAD request first and use the ETag or Content-MD5 headers, if the HTTP server provides them; if not, use Content-Length or Last-Modified.
  • If the server is so dumb as to not provide any metadata at all, download the file and compare locally.

Most people will have apache 2.x or a server of similar ability, so this is a fair assumption.
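A rough sketch of that fallback order in Ruby (the URL, local path, and cached ETag argument are all hypothetical):

require 'net/http'
require 'uri'
require 'digest/md5'
require 'time'

# Walk the fallback order above: ETag/Content-MD5 if available, then
# Content-Length/Last-Modified, else signal a full download-and-compare.
def in_sync?(uri, local, cached_etag = nil)
  return false unless File.exist?(local)

  head = Net::HTTP.start(uri.host, uri.port) { |h| h.head(uri.path) }

  if head['ETag'] && cached_etag
    head['ETag'] == cached_etag
  elsif head['Content-MD5']
    head['Content-MD5'] == Digest::MD5.file(local).base64digest
  elsif head['Content-Length']
    head['Content-Length'].to_i == File.size(local)
  elsif head['Last-Modified']
    Time.httpdate(head['Last-Modified']) <= File.mtime(local)
  else
    false # no usable metadata: download and compare locally
  end
end

puts in_sync?(URI('http://example.com/example.txt'), '/tmp/example.txt')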

#17 Updated by Srikrishna Das over 4 years ago

Any updates on this? :)

We would love to have our puppet-using customers download scripts/code snippets from our downloads page directly, using the file resource type.

#18 Updated by Bill Fehring over 4 years ago

If the HTTP server doesn’t support any kind of metadata, one additional potential safety mechanism might be to allow the expected hash to be specified in the file declaration, so that puppet can at least compare the file it already has with something before trying to get it again.

file { "/tmp/example.txt": source => 'http://example.com/example.txt', sha1 => '938aa7a9b80408cc1a61a0ecfe36b5633885ec21' }

Of course if the file doesn’t match it will just keep getting downloaded over and over again until it does, but that’s a price I’d personally be willing to pay.
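On the agent side, the check this implies is just a local digest comparison; a minimal sketch (the sha1 parameter above is proposed syntax, not something puppet currently supports):

require 'digest/sha1'

# Skip the download when the local copy already matches the declared hash.
def needs_fetch?(path, expected_sha1)
  !File.exist?(path) || Digest::SHA1.file(path).hexdigest != expected_sha1
end

puts needs_fetch?('/tmp/example.txt',
                  '938aa7a9b80408cc1a61a0ecfe36b5633885ec21')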

One of the key things that this feature enables is for me to place some larger files on a CDN.

#19 Updated by Ben Hughes over 4 years ago

  • Description updated (diff)
  • Status changed from Accepted to Unreviewed

#20 Updated by Michael Stahnke over 4 years ago

  • Description updated (diff)
  • Status changed from Unreviewed to Accepted
  • Assignee set to Anonymous

Not sure exactly why this dropped into unreviewed.

#21 Updated by Anonymous over 4 years ago

Michael Stahnke wrote:

Not sure exactly why this dropped into unreviewed.

Me either. For the audience at home, we would likely accept a patch to this, but don’t have any immediate plans to work on it. If you are interested, please discuss here the exact implementation you are thinking of, so we can be sure that you build something we can absorb.

#22 Updated by Rhys Morgan over 4 years ago

I am working on something similar for a project of mine, using an external download management tool. If anyone is interested in this I will happily share.

#23 Updated by Josh Cooper over 4 years ago

It would be helpful to reuse the existing HTTP client code (Puppet::Network::HttpPool.http_instance) so that the SSL context is set up correctly, HTTP proxy settings are applied, timeouts are honoured, etc. See also #8465
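A rough sketch of what that reuse could look like; the http_instance(host, port) signature is an assumption based on the codebase of this era, not a committed interface, and the host is hypothetical:

require 'net/http'
require 'uri'
require 'puppet/network/http_pool'

# Assumed: http_instance(host, port) returns a Net::HTTP-style connection
# with the agent's SSL context and proxy settings already applied.
uri  = URI('https://files.example.com/example.txt') # hypothetical host
http = Puppet::Network::HttpPool.http_instance(uri.host, uri.port)
res  = http.request(Net::HTTP::Get.new(uri.request_uri))
puts res.code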

#24 Updated by Anonymous over 4 years ago

Rhys Morgan wrote:

I am working on something similar for a project I am working on using an external download management tool. If anyone is interested in this I will happily share.

We would absolutely take patches for this. Ultimately, the hard part isn’t the HTTP side – as Josh notes, we already have some good setup stuff for HTTP, and Ruby makes getting content that way pretty easy.

The hard part is actually integrating this into the file type, making decisions about how to deal with “has this changed”, and that sort of thing. So, absolutely, if you have answered some of those questions and want to contribute code, or notes on how to map the Puppet and HTTP semantics, that would be awesome.

#25 Updated by Rhys Morgan about 4 years ago

The implementation I am using makes it relatively easy to handle that behaviour. We are using the aria2 download manager, which has an XML-RPC interface and allows us to pause/resume, throttle, and download from multiple and external sources. We are working in a >100k managed endpoint environment.

So far what we have done is:

  • Added a terminus for our custom prefix to make an XML-RPC call to aria2 to add a URI and download destination

  • Rather than modify the file type, we have inserted some conditional checking to see whether the file resource in question has already been actioned on the endpoint. This modifies the default puppet behaviour, since in our scenario a download would more than likely span multiple puppet runs.

  • Created a custom fact which queries the status of the files that are downloading and reports status/progress/ETA using the XML-RPC interface.

  • We will also modify our reports parser to allow this return to be displayed at the visualisation layer as a pending status rather than a failed run.

We are still in a design/PoC phase of this but once it becomes more mature in direction I will happily share the code changes.

Happy to hear anyone’s views on how this could be done in a better way.
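For reference, the XML-RPC calls involved might look roughly like this from Ruby’s bundled xmlrpc client (the endpoint, paths, and filenames are hypothetical; aria2.addUri and aria2.tellStatus are part of aria2’s documented RPC interface):

require 'xmlrpc/client'

# aria2.addUri takes a list of URIs plus an options hash and returns a
# GID that later status queries refer to.
client = XMLRPC::Client.new2('http://localhost:6800/rpc')
gid = client.call('aria2.addUri',
                  ['http://example.com/big-installer.bin'],
                  { 'dir' => '/var/cache/downloads',
                    'out' => 'big-installer.bin' })

# The custom fact can poll progress with aria2.tellStatus:
status = client.call('aria2.tellStatus', gid,
                     ['status', 'completedLength', 'totalLength'])
puts status.inspect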

#26 Updated by Juan Pablo Daniel Borgna about 3 years ago

Please refer to my update on this other bug; it seems related: http://projects.puppetlabs.com/issues/3365

Regards!

#27 Updated by eric sorenson about 3 years ago

@Juan Pablo Daniel Borgna – I do not understand how this is related to the other update you made.

#28 Updated by Anonymous almost 3 years ago

  • Assignee deleted (Anonymous)

#29 Updated by Ian Ward over 2 years ago

What about something like zsync? http://zsync.moria.org.uk/paper/index.html

It looks like it requires no special configuration of the HTTP server, but it does need a .zsync file on the server.

Ubuntu apparently provides/provided this as a method to update installation files: http://manpages.ubuntu.com/manpages/precise/man1/zsync.1.html http://lifehacker.com/5393555/use-zsync-to-upgrade-an-ubuntu-installation-image

The end user would just need to run zsyncmake on the files when they’re changed (or have their revision control system do it for them in a post-commit hook).

#30 Updated by Jason Antman over 2 years ago

Redmine Issue #5783 has been migrated to JIRA:

https://tickets.puppetlabs.com/browse/PUP-1072
