Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Improve Task Cleanup and Immediate Termination in Workflow Stop (#573) #581

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

saikiranAnnam
Copy link
Collaborator

This PR addresses the issue where the "Stop Workflow" button didn't properly clean up the task queue or stop tasks immediately. Key improvements include:

Immediate Task Termination: The end prematurely () method now interrupts the task’s thread (if applicable) and ensures that long-running operations are stopped promptly by checking the task’s status.

Task Queue Cleanup: When a task is stopped, the waiting and running lists are properly cleaned up. Tasks are marked as STOPPED and removed from the queues immediately.

Enhanced Monitoring: The stopMonitor() method ensures tasks on the waiting list are processed and stopped immediately.

These changes ensure that tasks are correctly cleaned up and stopped at the right moment when the stop button is pressed, improving workflow responsiveness and stability.

tasksToRemove.add(thet); // Mark for removal
}
}
waitinglist.removeAll(tasksToRemove); // Remove all tasks at once after iteration
Copy link
Member

@ZihengSun ZihengSun Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we remove all that at once, what is the difference from the previous one exactly?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the previous version of code - Removing During Iteration.
revised code - Removal after Iteration.

Removing During Iteration vs. After Iteration:

Previous Version: tasks are removed from the waitinglist while iterating over it. This can cause problems with the iteration itself, especially if you remove items during the iteration process.

Issue with Direct Removal During Iteration: When you remove a task from the list, the iterator moves forward to the next element, but it skips checking the element that comes immediately after the removed task. This happens because after removal, the next element shifts into the current index position, and the iterator’s next() method skips over it. This can cause missing tasks that should have been processed.

Current Version: The updated code collects tasks to be removed in a separate List tasksToRemove. Once the iteration is complete, the tasks to be removed are all removed in a single operation with waitinglist.removeAll(tasksToRemove).

Advantages: By collecting tasks to be removed first and then performing the removal after the iteration, you avoid modifying the list during iteration, preventing skipping tasks and ensuring that all matching tasks are processed correctly.

executeATask(newtask);
waitinglist.remove(newtask); // Remove task from waiting list
if (newtask.checkShouldPassOrNot()) {
newtask.endPrematurely(); // End the task immediately if it should not proceed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you removed the notifyWaitinglist function, please explain why? how does the queue know that this waiting list is changed for worker to pick up new one?

* operation
*/
private void stopRunningOperations() {
if (this.curstatus == ExecutionStatus.STOPPED) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this function does nothing. Why is it needed?

}

// Ensure the task is not performing any further operations
stopRunningOperations();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does nothing.

Copy link
Collaborator Author

@saikiranAnnam saikiranAnnam Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function doesn't do anything as of now, I just wrote this for an edge case(for the future case) if the processes come under these three cases:

  1. Long-running Loops
  2. I/O Operations: reading from a file or waiting for network data
  3. Timers or Delayed Tasks

But the inside logic can be written for the future case

this.curstatus = ExecutionStatus.STOPPED;

// If the task is running in a separate thread, interrupt the thread
if (Thread.currentThread().isInterrupted()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very weird and risky way to stop current thread. Please explain why it won't kill geoweaver itself.

Copy link
Collaborator Author

@saikiranAnnam saikiranAnnam Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interrupting the Current Thread Only Affects the Task's Thread:

  1. Thread.currentThread().interrupt() - interrupts the currently executing thread, which means it only affects the task that is running in that thread.
  2. Calling interrupt() does not kill or forcefully stop the thread; it merely sets the interrupt flag of the thread.
  3. Geoweaver manages its worker threads in a thread pool. Interrupting one worker thread (through Thread.currentThread().interrupt()) will not bring down the entire system.

It only affects the thread executing the task, not the remaining threads.

Copy link
Member

@ZihengSun ZihengSun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a lot of questions. You need to throughly test this, please design a comprehensive test case, as you are changing the core module of Geoweaver. There are cases that you might haven't think of. Please make sure fixing one bug raises hundreds more.


runninglist.remove(thet); // remove from waiting list
// Stop tasks in running list
synchronized (runninglist) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why making it synchronized?

Copy link
Collaborator Author

@saikiranAnnam saikiranAnnam Nov 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The synchronized (runninglist) block is necessary to ensure that no other thread can modify the runninglist while it is being iterated (i.e when stop operation in progress), preventing concurrency issues and ensuring no data loss. This is crucial in multi-threaded environments where multiple threads might interact with shared data.

@saikiranAnnam
Copy link
Collaborator Author

saikiranAnnam commented Nov 17, 2024

I will be testing this thoroughly with different test cases. I might miss some other cases i have missed off.

@saikiranAnnam saikiranAnnam reopened this Nov 17, 2024
@ZihengSun
Copy link
Member

I tested locally, the issue wasn't fixed. After I click the Stop button in the Weaver, the ongoing tasks are still ongoing. The issue is still there.

@ZihengSun
Copy link
Member

I will be testing this thoroughly with different test cases. I might miss some other cases i have missed off.

you might only test the short lived process. Try some long running tasks and try to stop them from the Weaver stop button while the workflow is running. I don't see the issue is gone. The current fix doesn't work.

@ZihengSun
Copy link
Member

You should learn more about how the Stop button in the process history table works before fixing this issue.

@saikiranAnnam
Copy link
Collaborator Author

saikiranAnnam commented Nov 18, 2024

I will be testing with long processes and also learn more about the stop process in the history table. And will get back on this with a new PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants