Skip to content

ARTEMIS-5376 Include all messages in queue management operations #5593

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

AntonRoskvist
Copy link
Contributor

Currently, some queue management operations act on all messages (including paged and scheduled messages) and some don't. Some operations use their own iterator logic (internally) and some use a common method.

This change aims to improve this by adding some functionality into the 'QueueIterateAction' class and then have additional management operations use it (so that their behavior is consistent with regards to what messages they target and how)

As of now I have left expire and delete reference/references out as I wasn't 100% sure it wont have any adverse effects. If desired, they should be trivial to add as well.

@jbertram
Copy link
Contributor

org.apache.activemq.artemis.tests.integration.server.LVQTest#testChangeReferencePriority is failing due to this change.

@clebertsuconic
Copy link
Contributor

I saw Justin started a run of your PR in one of our CIs..

Just out of coincidence I saw that my Jenkins was stuck, I logged into the POD and I saw this deadlock while running org.apache.activemq.artemis.tests.integration.management.ManagementWithPagingServerTest

Java stack information for the threads listed above:
===================================================
"Thread-1 (ActiveMQ-server-ActiveMQServerImpl::name=localhost)":
        at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterCommit(RefsOperation.java:190)
        - waiting to lock <0x000000009d807300> (a org.apache.activemq.artemis.core.server.impl.QueueImpl)
        at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterCommit(TransactionImpl.java:590)
        - locked <0x000000009cf16870> (a org.apache.activemq.artemis.core.transaction.impl.TransactionImpl)
        at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$2.done(TransactionImpl.java:317)
        at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl.lambda$execute$0(OperationContextImpl.java:369)
        at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$$Lambda$977/0x00007f471c4cd588.run(Unknown Source)
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:59)
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32)
        at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68)
        at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$756/0x00007f471c30f020.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:120)
"Thread-5073":
        at jdk.internal.misc.Unsafe.park([email protected]/Native Method)
        - parking to wait for  <0x000000009d807d78> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park([email protected]/LockSupport.java:211)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:715)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire([email protected]/AbstractQueuedSynchronizer.java:938)
        at java.util.concurrent.locks.ReentrantLock$Sync.lock([email protected]/ReentrantLock.java:153)
        at java.util.concurrent.locks.ReentrantLock.lock([email protected]/ReentrantLock.java:322)
        at org.apache.activemq.artemis.core.server.impl.QueueImpl.iterQueue(QueueImpl.java:2019)
        at org.apache.activemq.artemis.core.server.impl.QueueImpl.copyReference(QueueImpl.java:2510)
        - locked <0x000000009d807300> (a org.apache.activemq.artemis.core.server.impl.QueueImpl)
        at org.apache.activemq.artemis.core.management.impl.QueueControlImpl.copyMessage(QueueControlImpl.java:1380)
        at jdk.internal.reflect.GeneratedMethodAccessor242.invoke(Unknown Source)
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke([email protected]/Method.java:569)
        at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:72)
        at jdk.internal.reflect.GeneratedMethodAccessor241.invoke(Unknown Source)
        at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke([email protected]/DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke([email protected]/Method.java:569)
        at sun.reflect.misc.MethodUtil.invoke([email protected]/MethodUtil.java:262)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2([email protected]/StandardMBeanIntrospector.java:112)
        at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2([email protected]/StandardMBeanIntrospector.java:46)
        at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM([email protected]/MBeanIntrospector.java:237)
        at com.sun.jmx.mbeanserver.PerInterface.invoke([email protected]/PerInterface.java:138)
        at com.sun.jmx.mbeanserver.MBeanSupport.invoke([email protected]/MBeanSupport.java:252)
        at javax.management.StandardMBean.invoke([email protected]/StandardMBean.java:405)
        at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke([email protected]/DefaultMBeanServerInterceptor.java:814)
        at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke([email protected]/JmxMBeanServer.java:802)
        at javax.management.MBeanServerInvocationHandler.invoke([email protected]/MBeanServerInvocationHandler.java:298)
        at jdk.proxy2.$Proxy105.copyMessage(jdk.proxy2/Unknown Source)
        at org.apache.activemq.artemis.tests.integration.management.ManagementWithPagingServerTest$ManagementCopyThread.run(ManagementWithPagingServerTest.java:491)
"Thread-4 (ActiveMQ-PageExecutor-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@4716b1ec)":
        at org.apache.activemq.artemis.core.server.impl.QueueImpl.depage(QueueImpl.java:3188)
        - waiting to lock <0x000000009d807300> (a org.apache.activemq.artemis.core.server.impl.QueueImpl)
        at org.apache.activemq.artemis.core.server.impl.QueueImpl.lambda$scheduleDepage$9(QueueImpl.java:3176)
        at org.apache.activemq.artemis.core.server.impl.QueueImpl$$Lambda$1628/0x00007f471ca38cf0.run(Unknown Source)
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:59)
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:32)
        at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:68)
        at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$756/0x00007f471c30f020.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker([email protected]/ThreadPoolExecutor.java:1136)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run([email protected]/ThreadPoolExecutor.java:635)
        at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:120)

Found 1 deadlock.

at first I thought it was non related to your PR (as I am working hard on paging right now to make sure there are no deadlocks), but as i read through your changes you changed some Iterators to use iterQueue.. and iterQueue itself will have a different locking than the previous one that was using synchronized.

I think this introduces a real issue / deadlock.. you can probably flush this out by running ManagementWithPagingServerTest in a loop with your IDE (Idea has some options for that)

I also suggest rebasing, but the issue I saw it wouldn't be fixed by any rebasing.. just to be on the tip of the branch.

@AntonRoskvist
Copy link
Contributor Author

Thanks for reviewing @jbertram and @clebertsuconic.

I believe I have found the issue for both points and I'm pushing my changes now, though I will make a second check tomorrow as well to make sure I didn't miss anything (running the full test suite takes some time on my work laptop).

@jbertram I believe the addition of a short wait was all that's needed for the LVQ-test, please let me know if you think this is incorrect.

@clebertsuconic Nice catch, with moving the copyReference logic over to use iterQueue I should have also removed synchronization of that method as that's now covered by the iterQueue logic.

Again, I will run additional tests on this and update with what I find. Thanks

@AntonRoskvist
Copy link
Contributor Author

@jbertram @clebertsuconic

Sorry for taking so long to get back to you on this. I believe these changes should cover the cases you noted.

I don't really like the change on how to handle the LVQ case, though at this moment I don't know how to handle it differently while also making use of the QueueIterateAction (refRemoved has to be called before addTail for LVQs to keep the correct current value).

Regardless, this should be working as intended and without locking... if you have any suggestions on a better approach for the LVQ case just let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants