Avoid deadlock when creating ready execution #229
Thanks @jk-es335, and sorry for the delay! Looking into this one now.
@@ -37,7 +37,7 @@ def lock_candidates(job_ids, process_id)
           return [] if job_ids.none?

           SolidQueue::ClaimedExecution.claiming(job_ids, process_id) do |claimed|
-            where(job_id: claimed.pluck(:job_id)).delete_all
+            where(id: where(job_id: claimed.pluck(:job_id)).pluck(:id)).delete_all
The only problem with this one is that it introduces another query in the hot path... it's probably ok, but I think we could save it because we've already done a query to `ready_executions` before, where we select only the `job_id`. We could select `id, job_id`, and then filter in memory for the ones that were actually claimed. Let me see how it'd look.
This is what I had in mind:
diff --git a/app/models/solid_queue/claimed_execution.rb b/app/models/solid_queue/claimed_execution.rb
index f2ed58d..c67c62a 100644
--- a/app/models/solid_queue/claimed_execution.rb
+++ b/app/models/solid_queue/claimed_execution.rb
@@ -15,7 +15,7 @@ class SolidQueue::ClaimedExecution < SolidQueue::Execution
       insert_all!(job_data)
       where(job_id: job_ids, process_id: process_id).load.tap do |claimed|
-        block.call(claimed)
+        block.call(claimed.map(&:job_id))
       end
     end
diff --git a/app/models/solid_queue/ready_execution.rb b/app/models/solid_queue/ready_execution.rb
index 8eeaddc..2c68cf9 100644
--- a/app/models/solid_queue/ready_execution.rb
+++ b/app/models/solid_queue/ready_execution.rb
@@ -24,20 +24,21 @@ module SolidQueue
           return [] if limit <= 0
           transaction do
-            job_ids = select_candidates(queue_relation, limit)
-            lock_candidates(job_ids, process_id)
+            candidates = select_candidates(queue_relation, limit)
+            lock_candidates(candidates, process_id)
           end
         end
         def select_candidates(queue_relation, limit)
-          queue_relation.ordered.limit(limit).non_blocking_lock.pluck(:job_id)
+          queue_relation.ordered.limit(limit).non_blocking_lock.select(:id, :job_id)
         end
-        def lock_candidates(job_ids, process_id)
-          return [] if job_ids.none?
+        def lock_candidates(executions, process_id)
+          return [] if executions.none?
-          SolidQueue::ClaimedExecution.claiming(job_ids, process_id) do |claimed|
-            where(job_id: claimed.pluck(:job_id)).delete_all
+          SolidQueue::ClaimedExecution.claiming(executions.map(&:job_id), process_id) do |claimed_job_ids|
+            ids_to_delete = executions.index_by(&:job_id).values_at(*claimed_job_ids).map(&:id)
+            where(id: ids_to_delete).delete_all
           end
         end
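To make the in-memory filtering step in `lock_candidates` easier to follow, here is a minimal standalone sketch in plain Ruby. It is only an illustration, not the project's code: the `Execution` struct and the sample ids are hypothetical stand-ins for the rows returned by `select(:id, :job_id)`, and `to_h` with a block is used as a plain-Ruby substitute for ActiveSupport's `index_by`.

```ruby
# Hypothetical stand-in for a ready execution row with only the two
# selected columns (id and job_id). Not the real ActiveRecord model.
Execution = Struct.new(:id, :job_id)

# Candidates, as if returned by select(:id, :job_id). Sample values are made up.
executions = [
  Execution.new(101, 464),
  Execution.new(102, 465),
  Execution.new(103, 466)
]

# Assume only these job_ids ended up being claimed.
claimed_job_ids = [464, 466]

# Plain-Ruby equivalent of executions.index_by(&:job_id):
# a hash from job_id to the execution row.
by_job_id = executions.to_h { |execution| [execution.job_id, execution] }

# Map the claimed job_ids back to the primary keys of the ready executions.
ids_to_delete = by_job_id.values_at(*claimed_job_ids).map(&:id)

puts ids_to_delete.inspect # prints [101, 103]
```

With the primary keys in hand, the delete can filter on `id` rather than `job_id` without issuing the extra `pluck(:id)` query from the original change.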
Hi @rosa, thank you for your response.
The solution is good for me. It's more efficient.
Will the fix be included in the next release version?
Yes! Let me add that one and get a new version out.
Thanks!
This is another take on #229, that tries to solve a deadlock like this:
```
*** (1) TRANSACTION:
TRANSACTION 5223, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 5 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 2
MySQL thread id 172, OS thread handle 281471652687808, query id 11099 192.168.0.5 root update
INSERT INTO `solid_queue_ready_executions` (`job_id`, `queue_name`, `priority`, `created_at`) VALUES (469, 'default', 0, '2024-05-21 01:15:11.201125')

*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 12 page no 4 n bits 264 index PRIMARY of table `handson`.`solid_queue_ready_executions` trx id 5223 lock_mode X locks rec but not gap
Record lock, heap no 144 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
...

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 12 page no 6 n bits 264 index index_solid_queue_poll_all of table `handson`.`solid_queue_ready_executions` trx id 5223 lock_mode X insert intention waiting
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;
...

*** (2) TRANSACTION:
TRANSACTION 5227, ACTIVE 0 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 10 lock struct(s), heap size 1128, 23 row lock(s), undo log entries 10
MySQL thread id 177, OS thread handle 281471649517504, query id 11103 192.168.0.4 root updating
DELETE FROM `solid_queue_ready_executions` WHERE `solid_queue_ready_executions`.`job_id` IN (464, 465, 466, 467, 468)

*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 12 page no 6 n bits 264 index index_solid_queue_poll_all of table `handson`.`solid_queue_ready_executions` trx id 5227 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
...

*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 12 page no 4 n bits 264 index PRIMARY of table `handson`.`solid_queue_ready_executions` trx id 5227 lock_mode X waiting
Record lock, heap no 144 PHYSICAL RECORD: n_fields 7; compact format; info bits 0
```
I got a deadlock when running multiple workers/dispatchers with Docker Compose.
I found @rosa's PR and fixed it using a similar approach.
I think this mitigates the deadlock, but it does not completely eliminate it.
The deadlock error is as follows: