The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

This issue tracker is now in read-only archive mode and automatic ticket export has been disabled. Redmine users will need to create a new JIRA account to file tickets using https://tickets.puppetlabs.com.

Feature #16187

Relationships should work between hosts

Added by Luke Kanies over 3 years ago. Updated over 3 years ago.

Status: Investigating
Start date: 08/30/2012
Priority: Normal
Due date:
Assignee: eric sorenson
% Done: 0%
Category: -
Target version: -
Affected Puppet version:
Branch:
Keywords: backlog

Description

Currently, one can only specify a relationship within a given host’s graph. This means that the system can easily figure out, using dependencies, the order of work for a given node, but it cannot figure out the order of work across nodes. If a web service running on one host depends on a db service running on another, you can use exported resources to share configuration (albeit in non-ideal ways), but you cannot specify this dependency.

This means you cannot get a view of the true service dependencies in the system, and you always have a thin view of the world.

We need a way to specify that a given service or resource depends on a service or resource on another host.

This feature could be done in a very simple way, but could also get very sophisticated. For instance, I’ve got a prototype that blocks a host’s transaction until some remote resource is in the state you want: https://github.com/puppetlabs/puppetlabs-external_resource

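# Block the transaction until the check command succeeds.
external_resource { "file exists":
  frequency => 1,
  timeout   => 30,
  check     => "/bin/ls /tmp/myfile",
}

# The reference must use the capitalized name of the type declared above.
notify { "File exists": require => External_resource["file exists"] }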

This allows you to cause one service to wait until dependent (and potentially remote) services are available. A very similar feature should exist for remote relationships: We need some way of ordering operations so that we do not try to bring the web server up before its database server is up.
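
To make the blocking behaviour concrete, here is a minimal Python sketch of a poll-until-ready loop. It is not taken from the puppetlabs-external_resource module: the names `wait_for` and `command_succeeds` are hypothetical, and it assumes that `frequency` means seconds between checks and `timeout` means the total number of seconds to wait, which the ticket does not state.

```python
import subprocess
import time


def wait_for(check, frequency=1.0, timeout=30.0):
    """Poll check() until it returns True or `timeout` seconds elapse.

    Assumption: `frequency` is the pause between checks in seconds and
    `timeout` is the total wait in seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        # Give up if the next check would land after the deadline.
        if time.monotonic() + frequency > deadline:
            return False
        time.sleep(frequency)


def command_succeeds(argv):
    """Return True if the command runs and exits with status 0."""
    try:
        result = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except OSError:
        # The command could not be started at all.
        return False
    return result.returncode == 0
```

Under those assumptions, the check in the example above would correspond to `wait_for(lambda: command_succeeds(["/bin/ls", "/tmp/myfile"]), frequency=1, timeout=30)`.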


Related issues

Related to Puppet - Feature #6603: Multi-node Application Management (Status: Needs More Information, 03/04/2011)

History

#1 Updated by Luke Kanies over 3 years ago

  • Description updated (diff)

#2 Updated by Luke Kanies over 3 years ago

  • Description updated (diff)

#3 Updated by eric sorenson over 3 years ago

  • Keywords set to backlog

#4 Updated by eric sorenson over 3 years ago

  • Tracker changed from High Level Feature to Feature
  • Project changed from Product Roadmap to Puppet

#5 Updated by Trevor Vaughan over 3 years ago

Since we now have a fun, high powered database, can I resurrect this thread to add to this?

http://www.mail-archive.com/puppet-users@googlegroups.com/msg08890.html

#6 Updated by Luke Kanies over 3 years ago

I think that kind of central state is a reasonable solution, but I think we can do much better. In particular, I think it’s important to do everything we can to support cycle-detection and avoid locking the system because of those cycles. This kind of state doesn’t help with that.

I instead want to build a single graph with the whole infrastructure in it, and then trigger hosts in the appropriate order based on that graph. Each individual system would then have the ability to directly test whether its dependencies are working, so that it could either go, hold, or fail depending on the state of required services, not the state as declared in a database.

I think your solution is a good start, though, and is clearly better than what we have now.
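
A rough illustration of the single-graph idea, as a Python sketch rather than anything Puppet implements: build one dependency graph across hosts, reject it if it contains a cycle, and trigger hosts in batches that respect the ordering. The function name `run_batches` and the host names are hypothetical; the sketch uses the standard library's `graphlib` module (Python 3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def run_batches(depends_on):
    """Group hosts into batches that can be triggered in order.

    `depends_on` maps each host to the set of hosts it depends on.
    Hosts within one batch have no ordering constraint between them.
    Raises CycleError if the graph contains a dependency cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # cycle detection happens here
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock dependents
    return batches


if __name__ == "__main__":
    graph = {
        "web01": {"db01"},
        "web02": {"db01"},
        "lb01": {"web01", "web02"},
    }
    print(run_batches(graph))
    try:
        run_batches({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        print("cycle detected:", err.args[1])
```

Because the cycle is found before any host is triggered, nothing ends up waiting on a dependency that can never be satisfied.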

#7 Updated by Trevor Vaughan over 3 years ago

I’m not so sure about the mega-graph. It feels like you’d have some serious memory/performance issues with that, which would take a LOT longer to solve than using methods that are known to work for parallel problems.

I still look at this as a parallel programming problem.

Client == CPU
Catalog == Thread
Thus => semaphore, waitlock, mutex, etc…

As a bonus, if we start with this, it should be pretty darn easy with PuppetDB. I suppose you could also do something with a message bus but that’s more infrastructure for no real gain unless you wanted to maintain global state on all nodes at the same time based on the queues/topics that they were subscribed to.

#8 Updated by Luke Kanies over 3 years ago

The system-wide graph doesn’t have to be that large, because you don’t have every resource in it.

The major problem I have with full parallel processing is that it’s not actually parallel – you have some things (e.g., start db then web) that have to be serial, and some that don’t. How do you sort out the differences? How do you avoid lock-up?

I agree that some people would do fine with the semaphores, and I’m ok with the idea of supporting that in the beginning, but it’s not the long-term solution.

#9 Updated by Trevor Vaughan over 3 years ago

I’d be more than happy to try the semaphore version and then see where it goes. Kind of like extlookup grew into hiera.

I’m not quite sure how you’re going to do the cycle detection without compiling the catalog for all dependent nodes at the same time, especially since logic forks can be based on facts, and what could be a cycle in one case wouldn’t be in another.

When I first thought about it, I envisioned code blocks that just wouldn’t run until a particular case was enacted on the semaphore state. However, this does mean that it could take several hours for your environment to stabilize since it may take multiple runs on multiple servers.

I’m not really sure what a good solution for this is if hosts can’t be triggered to re-run by an external service in a transport agnostic manner.
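
The multiple-runs concern can be shown with a toy Python simulation. It is only a sketch under stated assumptions: an in-memory set stands in for the shared semaphore state (it is not a PuppetDB API), every agent runs once per round in a fixed order, and the name `runs_until_converged` and the host names are hypothetical.

```python
def runs_until_converged(depends_on, run_order, max_runs=10):
    """Count the rounds of agent runs needed until every node is ready.

    `depends_on` maps a node to the set of nodes it waits on.
    `run_order` is the order in which agents happen to run in each round.
    Returns None if the nodes have not converged after `max_runs` rounds.
    """
    ready = set()  # stand-in for the shared semaphore state
    for run in range(1, max_runs + 1):
        for node in run_order:
            # A node applies its guarded code only once all of its
            # dependencies have already flagged themselves ready.
            if node not in ready and depends_on.get(node, set()) <= ready:
                ready.add(node)
        if len(ready) == len(run_order):
            return run
    return None


if __name__ == "__main__":
    deps = {"lb01": {"web01"}, "web01": {"db01"}, "db01": set()}
    print(runs_until_converged(deps, ["lb01", "web01", "db01"]))
    print(runs_until_converged(deps, ["db01", "web01", "lb01"]))
    print(runs_until_converged({"a": {"b"}, "b": {"a"}}, ["a", "b"]))
```

In this model a chain of three nodes that happen to run in the worst order needs three rounds, the same chain in dependency order needs one, and a cycle never converges.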

#10 Updated by Luke Kanies over 3 years ago

You’d definitely have to compile the catalog for all dependent hosts, but you basically need to do that anyway. Either you’ve compiled it, and you know there’s a dependency and you can actually build your service, or you haven’t compiled it, so you don’t know the dependency exists or the service can’t build. You don’t need to do it all at the same time, but you at least have to have them all be recent enough.

It’s that “wait an infinite amount of time for the system to converge” that I can’t tolerate.

Nothing is perfect, but I think the graph has the best combination of function, simplicity and actually being possible.

#11 Updated by Anonymous over 3 years ago

  • Status changed from Unreviewed to Investigating
  • Assignee set to eric sorenson

Marking this as investigating since it looks like there is some discussion happening around how something like this might shake out. I’ll assign it to eric for now so that an eye can be kept on it.
