katib: Update LLM HP tuning guide to clarify tunable fields and fix resource section #4067

Open: 8 commits to be merged into base branch master

SanthoshToorpu

…es config, remove unsupported params

Checklist:

  • You have signed off your commits
  • Ensure you follow best practices from our Contributing guide.
  • You have included screenshots when changing the website style or adding a new page.

Description of your changes:
As per Andrey's recommendation, I made the changes shown in the attached screenshots. This is a minimal change.
image (2)
image (1)
image

Issue

Closes: #2522

Labels

/area katib

/area website


…es config, remove unsupported params

Signed-off-by: SanthoshToorpu <[email protected]>
@google-oss-prow google-oss-prow bot added the area/katib AREA: Kubeflow Katib label Mar 29, 2025

Hi @SanthoshToorpu. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@SanthoshToorpu
Author

@helenxie-bit, please take a look at the changes.

Signed-off-by: SanthoshToorpu <[email protected]>
…v1 docs instead of referring in the llm-hp file

Signed-off-by: SanthoshToorpu <[email protected]>

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gaocegege for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot added size/L and removed size/S labels Mar 30, 2025
Member

@Arhell Arhell left a comment

/ok-to-test

Signed-off-by: SanthoshToorpu <[email protected]>
Signed-off-by: SanthoshToorpu <[email protected]>
@SanthoshToorpu
Author

SanthoshToorpu commented Mar 30, 2025

@andreyvelich, as per @helenxie-bit's instructions and after reading the thread, I fixed 2 of the 3 issues. The one remaining is the removal of the irrelevant params you mentioned.

image

I have just downloaded the Docker image, but can you be more specific about which params should be removed and which should stay?

@SanthoshToorpu SanthoshToorpu changed the title Update Katib LLM HP tuning guide: clarify tunable fields, fix resourc… katib: Update LLM HP tuning guide to clarify tunable fields and fix resource section Mar 30, 2025
Contributor

@helenxie-bit helenxie-bit left a comment

Thank you for this contribution! It helps a lot to make the user guide clearer. Here are a few suggestions:

@@ -165,6 +109,7 @@ In addition to Hugging Face, you can integrate with S3-compatible object storage
from kubeflow.storage_initializer.s3 import S3DatasetParams
```


#### S3DatasetParams
Contributor

I suggest we also move this part to the Training Operator doc and cross-reference it from this doc: https://github.com/kubeflow/website/blob/master/content/en/docs/components/trainer/legacy-v1/user-guides/fine-tuning.md

Author

The S3 dataset params part? I think that has less relevance to the Training Operator. However, if you are referring to the example, I believe we should keep a short snippet here that gives a high-level overview, since most readers will not follow nested links.

I'll commit with the three params removed, but please let me know what you think about the suggestion above.

Contributor

Yes, S3DatasetParams is another way to define an external dataset, just like HuggingfaceDatasetParams. So I think it makes sense to move its explanation into the Training Operator documentation as well.

@helenxie-bit
Contributor

> @andreyvelich, as per @helenxie-bit's instructions and after reading the thread, I fixed 2 of the 3 issues. The one remaining is the removal of the irrelevant params you mentioned.
>
> image
>
> I have just downloaded the Docker image, but can you be more specific about which params should be removed and which should stay?

I think we need to remove the unused parameters in this part: https://www.kubeflow.org/docs/components/katib/user-guides/llm-hp-optimization/#key-parameters-for-llm-hyperparameter-tuning. Can you remove the parameters `objective`, `base_image`, and `parameters`? They are not used when optimizing hyperparameters for LLMs.

@helenxie-bit
Contributor

Ref issue: kubeflow/katib#2522

@SanthoshToorpu
Author

> > @andreyvelich, as per @helenxie-bit's instructions and after reading the thread, I fixed 2 of the 3 issues. The one remaining is the removal of the irrelevant params you mentioned.
> > image
> > I have just downloaded the Docker image, but can you be more specific about which params should be removed and which should stay?
>
> I think we need to remove the unused parameters in this part: https://www.kubeflow.org/docs/components/katib/user-guides/llm-hp-optimization/#key-parameters-for-llm-hyperparameter-tuning. Can you remove the parameters `objective`, `base_image`, and `parameters`? They are not used when optimizing hyperparameters for LLMs.

Okay, so only those three params?

@helenxie-bit
Contributor

> > > @andreyvelich, as per @helenxie-bit's instructions and after reading the thread, I fixed 2 of the 3 issues. The one remaining is the removal of the irrelevant params you mentioned.
> > > image
> > > I have just downloaded the Docker image, but can you be more specific about which params should be removed and which should stay?
> >
> > I think we need to remove the unused parameters in this part: https://www.kubeflow.org/docs/components/katib/user-guides/llm-hp-optimization/#key-parameters-for-llm-hyperparameter-tuning. Can you remove the parameters `objective`, `base_image`, and `parameters`? They are not used when optimizing hyperparameters for LLMs.
>
> Okay, so only those three params?

Yes, all other parameters may be used when optimizing hyperparameters for LLMs.

@SanthoshToorpu
Author

@helenxie-bit: Great! Maybe here it's better to use a different title, since we include S3DatasetParams in this part too.

How about "Dataset and Model Parameter Classes" in the legacy trainer docs?

@helenxie-bit
Contributor

> @helenxie-bit: Great! Maybe here it's better to use a different title, since we include S3DatasetParams in this part too.
>
> How about "Dataset and Model Parameter Classes" in the legacy trainer docs?

That sounds good to me.

@SanthoshToorpu
Author

Please review

@helenxie-bit
Contributor

Thanks for the contribution! LGTM! Please take a look when you have time, @andreyvelich @mahdikhashan.

@mahdikhashan
Member

/assign

Successfully merging this pull request may close these issues.

Adding a community-managed GA code to the website
4 participants