role prefetching needs to be implemented to be more scalable
Status: Merged - Pending Release
Start date: 05/10/2012
The roles prefetching does a cartesian product of users × tenants to fetch roles, so it quickly becomes very slow as the number of users and tenants increases.
In : https://github.com/puppetlabs/puppetlabs-keystone/blob/9a74a4dbb983544bf17a6ab35dd9188c47178224/lib/puppet/provider/keystone_user_role/keystone.rb#L109-127
The current loop is :
- get all users
- get all tenants
- get the role for the user/tenant
But users are rarely in more than one tenant.
A first optimization would be to :
- get all tenants
- get users in the tenant (keystone user-list
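The two loops above can be compared with a minimal sketch. It is not the provider's code: `FakeKeystone` is a hypothetical in-memory stand-in that only counts how many lookups each strategy issues, and all of its method names are assumptions made for illustration.

```python
from itertools import product


class FakeKeystone:
    """Hypothetical in-memory stand-in for keystone; counts every lookup."""

    def __init__(self, roles):
        # roles maps (tenant, user) -> list of role names
        self.roles = roles
        self.calls = 0

    def list_users(self):
        self.calls += 1
        return sorted({user for _, user in self.roles})

    def list_tenants(self):
        self.calls += 1
        return sorted({tenant for tenant, _ in self.roles})

    def list_users_in_tenant(self, tenant):
        self.calls += 1
        return sorted(u for t, u in self.roles if t == tenant)

    def list_roles(self, tenant, user):
        self.calls += 1
        return self.roles.get((tenant, user), [])


def prefetch_cartesian(ks):
    """Current loop: one role lookup per (user, tenant) pair."""
    found = {}
    for user, tenant in product(ks.list_users(), ks.list_tenants()):
        roles = ks.list_roles(tenant, user)
        if roles:
            found[(tenant, user)] = roles
    return found


def prefetch_per_tenant(ks):
    """First optimization: only look up users that belong to each tenant."""
    found = {}
    for tenant in ks.list_tenants():
        for user in ks.list_users_in_tenant(tenant):
            roles = ks.list_roles(tenant, user)
            if roles:
                found[(tenant, user)] = roles
    return found


# Four tenants with one user each (users rarely belong to several tenants).
DATA = {("t%d" % i, "u%d" % i): ["Member"] for i in range(4)}

slow, fast = FakeKeystone(DATA), FakeKeystone(DATA)
assert prefetch_cartesian(slow) == prefetch_per_tenant(fast)
print(slow.calls, fast.calls)  # 18 9
```

Both strategies return the same roles, but the cartesian loop grows with users × tenants while the per-tenant loop grows with the number of tenants plus the number of actual memberships.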
#2 Updated by François Charlier almost 3 years ago
Proposed fix in https://github.com/puppetlabs/puppetlabs-keystone/pull/49
On my dev env with :
- 188 users
- 173 tenants
- 1 user per tenant in general, less than 10 tenants with two users, 3 tenants with 3-8 users
Before the change, I stopped the puppet agent process after 2 hours; it was still running …
After the change, the puppet agent takes only 7 minutes to run.
7 minutes is still slow, but it’s better.
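A rough count of keystone lookups is consistent with that gap. The sketch below uses the user and tenant counts reported above. The membership total is an assumption, not a measured figure: an upper bound read off the distribution above, taking 10 tenants with two users and 3 tenants with 8 users.

```python
users, tenants = 188, 173

# Loop before the change: list users, list tenants, then one role lookup per pair.
before = 2 + users * tenants

# Assumed upper bound on (tenant, user) memberships:
# 160 tenants with 1 user, 10 with 2 users, 3 with 8 users.
memberships = 160 * 1 + 10 * 2 + 3 * 8

# Per-tenant loop: list tenants, one user list per tenant,
# then one role lookup per membership.
after = 1 + tenants + memberships

print(before, after)  # 32526 378
```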
The fact is that for each user/tenant/role/endpoint/… requested from keystone, the keystone process is run once and makes two requests to the keystone server (one to authenticate, the second to get the data). It could (IMHO) be improved further by using the REST API directly from Ruby.
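To illustrate the idea of authenticating once and reusing the token, here is a minimal sketch in Python rather than Ruby. It is not the module's code. `KeystoneSession` and `http_json` are hypothetical names, and the paths and JSON keys follow my understanding of the Keystone v2.0 API (`POST /tokens`, `GET /tenants`, `GET /tenants/{id}/users`, `GET /tenants/{id}/users/{id}/roles`), so they should be checked against the actual deployment.

```python
import json
import urllib.request


def http_json(method, url, headers=None, body=None):
    """Send one HTTP request and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Content-Type", "application/json")
    for name, value in (headers or {}).items():
        request.add_header(name, value)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


class KeystoneSession:
    """Authenticates once, then reuses the token for every later request."""

    def __init__(self, endpoint, username, password, tenant,
                 transport=http_json):
        # endpoint is assumed to be the v2.0 admin URL of the keystone server
        self.endpoint = endpoint.rstrip("/")
        self.transport = transport
        auth = {"auth": {"tenantName": tenant,
                         "passwordCredentials": {"username": username,
                                                 "password": password}}}
        reply = transport("POST", self.endpoint + "/tokens", body=auth)
        self.token = reply["access"]["token"]["id"]

    def get(self, path):
        return self.transport("GET", self.endpoint + path,
                              headers={"X-Auth-Token": self.token})

    def all_roles(self):
        """Return {(tenant name, user name): [role names]}."""
        found = {}
        for tenant in self.get("/tenants")["tenants"]:
            users = self.get("/tenants/%s/users" % tenant["id"])["users"]
            for user in users:
                path = "/tenants/%s/users/%s/roles" % (tenant["id"],
                                                       user["id"])
                roles = [r["name"] for r in self.get(path)["roles"]]
                found[(tenant["name"], user["name"])] = roles
        return found
```

With this shape, N lookups cost N + 1 HTTP requests instead of the 2 × N requests (plus N process launches) of the per-command approach. The `transport` parameter exists only so the session can be exercised against a stub without a running server.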
I’ll try to crunch some numbers to evaluate if it would be interesting to do so.
#3 Updated by François Charlier almost 3 years ago
To get an idea of the amount of optimization left, I wrote two example scripts to list all tenants/users/roles (the scripts: https://gist.github.com/2918442).
The shell script is roughly how the current first optimization of the module behaves. The python script is an idea of what could be achieved using the Keystone REST API directly.
With the same number of tenants/users as said above:
The shell script runs in ~ 3m20s on my machine (1m15s on the prod server)
The python script runs in ~ 30s on my machine (20s on our prod server), ~ 3.5 to 6.5 times faster.
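The quoted speedup can be checked from the timings above. This is plain arithmetic on approximate figures, so the result is only as precise as they are.

```python
def seconds(minutes, secs):
    """Convert a minutes/seconds timing to seconds."""
    return minutes * 60 + secs


# Shell script time divided by python script time, per machine.
dev = seconds(3, 20) / seconds(0, 30)
prod = seconds(1, 15) / seconds(0, 20)

print(round(dev, 2), round(prod, 2))  # 6.67 3.75
```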