Skip to content

feat(ws): Implement workspace start + pause as backend APIs #340

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: notebooks-v2
Choose a base branch
from

Conversation

andyatmiami
Copy link

@andyatmiami andyatmiami commented May 14, 2025

related: #298

  • Added StartWorkspaceHandler + PauseWorkspaceHandler to handle the respective workspace actions
  • Introduced new routes for starting and pausing workspaces in the API.
    • api/v1/workspaces/{namespace}/{name}/actions/start
    • api/v1/workspaces/{namespace}/{name}/actions/pause
  • Created a new PauseStateEnvelope type for successful responses.
  • Added tests for the new APIs, including success and error cases.
  • Updated README/OpenAPI documentation to include the new endpoints.

Functionality tested/verified via generated Swagger client:

image

$ kubectl get workspace
NAME                           STATE
andy-test-direct-with-secret   Running
andy-test-pr-verify            Paused
andy-test-with-secret          Running
andy-test-with-secret-no-op    Running
jupyterlab-workspace           Paused

<interact with /start endpoint in Swagger UI>

$ kubectl get workspace
NAME                           STATE
andy-test-direct-with-secret   Running
andy-test-pr-verify            Running
andy-test-with-secret          Running
andy-test-with-secret-no-op    Running
jupyterlab-workspace           Paused

$ kubectl get events --sort-by='.lastTimestamp'
LAST SEEN   TYPE      REASON                             OBJECT                                     MESSAGE
...
44s         Normal    Scheduled                          pod/ws-andy-test-pr-verify-z5qfg-0         Successfully assigned default/ws-andy-test-pr-verify-z5qfg-0 to kind-control-plane
44s         Normal    SuccessfulCreate                   statefulset/ws-andy-test-pr-verify-z5qfg   create Pod ws-andy-test-pr-verify-z5qfg-0 in StatefulSet ws-andy-test-pr-verify-z5qfg successful
43s         Normal    Pulled                             pod/ws-andy-test-pr-verify-z5qfg-0         Container image "docker.io/kubeflownotebookswg/jupyter-scipy:v1.9.0" already present on machine
43s         Normal    Created                            pod/ws-andy-test-pr-verify-z5qfg-0         Created container: main
43s         Normal    Started                            pod/ws-andy-test-pr-verify-z5qfg-0         Started container main

Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign kimwnasptd for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ederign
Copy link
Member

ederign commented May 14, 2025

/ok-to-test

@andyatmiami andyatmiami force-pushed the feat/workspace-start-api branch 3 times, most recently from fb2e2fb to f22b3d3 Compare May 15, 2025 20:54
Copy link

@harshad16 harshad16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work 💯
Tested with cli:
Screenshot from 2025-05-20 16-50-35

related: kubeflow#298

- Added StartWorkspaceHandler + PauseWorkspaceHandler to handle the respective workspace actions
- Introduced new routes for starting and pausing workspaces in the API.
    - `api/v1/workspaces/{namespace}/{name}/actions/start`
    - `api/v1/workspaces/{namespace}/{name}/actions/pause`
- Created a new PauseStateEnvelope type for successful responses.
- Added tests for the new APIs, including success and error cases.
- Updated README/OpenAPI documentation to include the new endpoints.

Signed-off-by: Andy Stoneberg <[email protected]>
@andyatmiami andyatmiami force-pushed the feat/workspace-start-api branch from 8d5e467 to 3237336 Compare May 21, 2025 18:05
@andyatmiami andyatmiami marked this pull request as ready for review May 21, 2025 18:08
@andyatmiami andyatmiami changed the title feat(ws): Add start workspace functionality to backend API feat(ws): Implement workspace start + pause as backend APIs May 21, 2025
@andyatmiami andyatmiami requested a review from harshad16 May 21, 2025 18:10
Copy link
Member

@ederign ederign left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions inline.

return
}

// Get the workspace to check its paused state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this comment is outdated.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry.. its not doing a "check" here - moreso querying the workspace to return its actual/persisted state.


// Get the workspace to check its paused state
workspace, err := a.repositories.Workspace.GetWorkspace(r.Context(), namespace, workspaceName)
if err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this call, we are not checking anything, are we?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is our intention check this based on the. action?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(and lets not forget about the async especially on start)

Copy link
Author

@andyatmiami andyatmiami May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps the phrase "checking" is misleading here - but we do interact with this queried workspace to return this value:

Does this warrant/justify a "read" operation when the backend could potentially simply "infer" this value? I think its still better to reflect actual state after operation by querying the resource..

But, I don't really follow your comments about "async" here... as mentioned in other comment... setting the .spec.paused attribute IS instantaneous (which is all we are concerned about in this function)

admittedly the State of the workspace resource is subject to eventual consistency - but that isn't (in my view) a focus/concern of this API

Copy link
Member

@ederign ederign May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Andy, thanks for the clarification on the chat. let's fup here!

Copy link
Author

@andyatmiami andyatmiami May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, for transparency here.. I was slightly incorrect in my comments above...

It seems the RawPatch command leveraged in the application code is itself asynchronously applied by k8s... so there is a race condition present here in this version of the code...

At the time we call this GetWorkspace function - we cannot guarantee that the Patch has been applied.. and as such the .spec.paused attribute might not have been updated yet...

I'm honestly not sure how to handle this...

Blocking and waiting seems kinda gross (to me) - but I also feel dirty hard-coding the .data.paused value we return in the PausedState response struct...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely not block it. For the UI, maybe we can use the status to notify user that something is being processed.

I would not hardcode the value, because, for instance, on start, things can fail, and if we hardcode, it could be inconsistent.

Why don't return 202 and something like
{data:
"status": "accepted",
"message": "bla",
"
}

}

// Return 200 OK with pause state
err = a.WriteJSON(w, http.StatusOK, PauseStateEnvelope{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question here. I assume Pause/Start actions are not instantaneous, shall we just return a 202 accepted?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well... setting the spec.paused property itself is instantaneous...

I generally think of 202 meaning the API request itself has not been processed...

whereas this case is "the backend k8s state is now churning to reach a state of eventual consistency"

Copy link
Member

@ederign ederign May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Andy, thanks for the clarification on the chat. let's fup here!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment above: #340 (comment)

202 might be the safer option - but still need to then answer/determine if/how we want to return the paused attribute in the response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Needs Triage
Development

Successfully merging this pull request may close these issues.

3 participants