The Puppet Labs Issue Tracker has Moved: https://tickets.puppetlabs.com

This issue tracker is now in read-only archive mode and automatic ticket export has been disabled. Redmine users will need to create a new JIRA account to file tickets using https://tickets.puppetlabs.com. See the following page for information on filing tickets with JIRA:

Bug #19766

insert or update on table "certname_catalogs" violates foreign key constraint "certname_catalogs_catalog_fkey"

Added by Ken Barber about 3 years ago. Updated over 2 years ago.

Status:Needs More InformationStart date:
Priority:NormalDue date:
Assignee:-% Done:

0%

Category:-
Target version:-
Keywords: Affected PuppetDB version:1.1.1
Branch:

We've Moved!

Ticket tracking is now hosted in JIRA: https://tickets.puppetlabs.com

This ticket is now tracked at: https://tickets.puppetlabs.com/browse/PDB-176


Description

The user ‘ak0ska’ has been seeing the following errors in his postgresql logs:

[2013-03-12 05:51:19 CET]ERROR:  insert or update on table "certname_catalogs" violates foreign key constraint "certname_catalogs_catalog_fkey"
[2013-03-12 05:51:19 CET]DETAIL:  Key (catalog)=(c93825c7157cba8c10641b54d9dce17eb0ce618b) is not present in table "catalogs".
[2013-03-12 05:51:19 CET]STATEMENT:  INSERT INTO certname_catalogs (certname,catalog,timestamp) VALUES ($1,$2,$3) RETURNING *
[2013-03-12 05:51:19 CET]ERROR:  insert or update on table "certname_catalogs" violates foreign key constraint "certname_catalogs_catalog_fkey"
[2013-03-12 05:51:19 CET]DETAIL:  Key (catalog)=(dd1ad1b27c063beab9beaded80f449a050fe11a6) is not present in table "catalogs".
[2013-03-12 05:51:19 CET]STATEMENT:  INSERT INTO certname_catalogs (certname,catalog,timestamp) VALUES ($1,$2,$3) RETURNING *

Although up until now we can’t reproduce it directly with PuppetDB yet, we do believe the following SQL if for some reason was able to be reproduced in production would recreate this scenario:

The test:
 
# create the precondition, while it is difficult to catch it is the only pre-condition yet that reproduces this issue.
insert into catalogs values ('foo', 1, 1111);
 
# on psql shell 1 replicate a database gc:
begin;
DELETE FROM catalogs WHERE NOT EXISTS (SELECT * FROM certname_catalogs cc WHERE cc.catalog=catalogs.hash);
# While this query below is part of the full problem, it is not the thing creating the lock ... if ran in isolation this problem isn't reproduced. Also - if it is ommitted - the problem still happens.
DELETE FROM resource_params WHERE NOT EXISTS (SELECT * FROM catalog_resources cr WHERE cr.resource=resource_params.resource);
 
# then on psql shell 2 replicate the update of a catalog to new revision (cut and paste it in):
begin;
SELECT 1 FROM certnames WHERE name='puppetdb1.vm' LIMIT 1;
UPDATE certnames SET deactivated=NULL WHERE name='puppetdb1.vm' AND (deactivated<'2013-03-13 12:59:53.762' OR deactivated IS NULL);
SELECT timestamp FROM certname_catalogs WHERE certname='puppetdb1.vm' ORDER BY timestamp DESC LIMIT 1;
SELECT 1 FROM catalogs WHERE hash='foo';
UPDATE catalogs SET api_version=1, catalog_version='1363208780' WHERE hash='foo';
# it will block here ... so just skip ahead to the psql1 part
DELETE FROM certname_catalogs WHERE certname='puppetdb1.vm';
INSERT INTO certname_catalogs (certname,catalog) VALUES ('puppetdb1.vm','foo');
commit;
 
# on psql shell 1 finish the original delete transaction
commit;
 
should get error "ERROR:  insert or update on table "certname_catalogs" violates foreign key constraint "certname_catalogs_catalog_fkey""

So the net result after the constraint error is that the replace catalog request goes back into the queue, tries again and we see no data loss. The concern is that those requests block until the GC is complete, so ideally we should try to minimize this.

The original google groups discussion is here: https://groups.google.com/forum/?fromgroups=#!topic/puppet-users/yRMI1J_y6Ps

History

#1 Updated by Ken Barber about 3 years ago

  • Description updated (diff)

#2 Updated by Ken Barber about 3 years ago

  • Status changed from Unreviewed to Needs More Information

I’ve asked the community to seek out other instances of this problem: https://groups.google.com/forum/?fromgroups=#!topic/puppet-users/KSBexFMLyWM.

Also, I’m waiting to see if I can get statement logs from ak0ska to further understand the issue. As yet, even with benchmark pumping 2500 nodes with a 1 minute setting into my local PuppetDB I haven’t seen the exact same issue so its hard to determine exactly how to reproduce the precondition that causes this problem.

#3 Updated by Ken Barber about 3 years ago

  • Assignee deleted (Ken Barber)

#4 Updated by Charlie Sharpsteen over 2 years ago

Redmine Issue #19766 has been migrated to JIRA:

https://tickets.puppetlabs.com/browse/PDB-176

Also available in: Atom PDF