katib: Update LLM HP tuning guide to clarify tunable fields and fix resource section #4067

Open

wants to merge 8 commits into base: master

91 changes: 18 additions & 73 deletions content/en/docs/components/katib/user-guides/llm-hp-optimization.md
@@ -40,81 +40,25 @@ from kubeflow.storage_initializer.hugging_face import (
)
```

#### HuggingFaceModelParams

##### Description

The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models with validation checks.

| **Attribute** | **Type** | **Description** |
| ------------------ | --------------------------------- | ---------------------------------------------------------- |
| `model_uri` | `str` | URI or path to the Hugging Face model (must not be empty). |
| `transformer_type` | `TRANSFORMER_TYPES` | Specifies the model type for various NLP/ML tasks. |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private models on Hugging Face. |
| `num_labels` | `Optional[int]` (default: `None`) | Number of output labels (used for classification tasks). |

###### Supported Transformer Types (`TRANSFORMER_TYPES`)

| **Model Type** | **Task** |
| ------------------------------------ | ------------------------ |
| `AutoModelForSequenceClassification` | Text classification |
| `AutoModelForTokenClassification` | Named entity recognition |
| `AutoModelForQuestionAnswering` | Question answering |
| `AutoModelForCausalLM` | Text generation (causal) |
| `AutoModelForMaskedLM` | Masked language modeling |
| `AutoModelForImageClassification` | Image classification |

##### Example Usage

```python
from transformers import AutoModelForSequenceClassification

from kubeflow.storage_initializer.hugging_face import HuggingFaceModelParams

params = HuggingFaceModelParams(
    model_uri="bert-base-uncased",
    transformer_type=AutoModelForSequenceClassification,
    access_token="huggingface_access_token",
    num_labels=2  # For binary classification
)
```

{{% alert title="Note" color="info" %}}
The detailed descriptions of these parameter classes have been moved to the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingface-parameter-classes). This page provides a brief overview of how to use these classes with Katib for hyperparameter optimization.
{{% /alert %}}

#### HuggingFaceModelParams

The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacemodelparams).

#### HuggingFaceDatasetParams

##### Description

The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face with validation checks.

| **Attribute** | **Type** | **Description** |
| -------------- | --------------------------------- | ------------------------------------------------------------------------- |
| `repo_id` | `str` | Identifier of the dataset repository on Hugging Face (must not be empty). |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private datasets on Hugging Face. |
| `split` | `Optional[str]` (default: `None`) | Dataset split to load (e.g., `"train"`, `"test"`). |

##### Example Usage

```python
from kubeflow.storage_initializer.hugging_face import HuggingFaceDatasetParams


dataset_params = HuggingFaceDatasetParams(
    repo_id="imdb",      # Public dataset repository ID on Hugging Face
    split="train",       # Dataset split to load
    access_token=None    # Not needed for public datasets
)
```
The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacedatasetparams).

#### HuggingFaceTrainerParams

##### Description

The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. It includes the training arguments and LoRA configuration to optimize model training.
The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacetrainerparams).

| **Parameter** | **Type** | **Description** |
| --------------------- | -------------------------------- | ----------------------------------------------------------------------------- |
| `training_parameters` | `transformers.TrainingArguments` | Contains the training arguments like learning rate, epochs, batch size, etc. |
| `lora_config` | `LoraConfig` | LoRA configuration to reduce the number of trainable parameters in the model. |
{{% alert title="Note" color="info" %}}
Currently, only parameters within `training_parameters` and `lora_config` can be tuned using Katib's search API. Other fields are static and cannot be tuned.
{{% /alert %}}
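
For illustration, here is a minimal sketch of marking fields inside `training_parameters` and `lora_config` as tunable with Katib's search API (the specific hyperparameters and ranges below are placeholder assumptions, not recommendations):

```python
import kubeflow.katib as katib
from transformers import TrainingArguments
from peft import LoraConfig
from kubeflow.storage_initializer.hugging_face import HuggingFaceTrainerParams

trainer_params = HuggingFaceTrainerParams(
    training_parameters=TrainingArguments(
        output_dir="results",
        # Fields inside TrainingArguments can be replaced with Katib search spaces.
        learning_rate=katib.search.double(min=1e-5, max=5e-5),
        per_device_train_batch_size=katib.search.categorical([4, 8, 16]),
        num_train_epochs=3,  # static fields keep plain values
    ),
    lora_config=LoraConfig(
        # Fields inside LoraConfig can be tuned the same way.
        r=katib.search.int(min=8, max=32),
        lora_alpha=16,
        lora_dropout=0.1,
    ),
)
```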

###### Katib Search API for Defining Hyperparameter Search Space

@@ -165,6 +109,7 @@ In addition to Hugging Face, you can integrate with S3-compatible object storage
from kubeflow.storage_initializer.s3 import S3DatasetParams
```


#### S3DatasetParams

Contributor:

I suggest we also move this part to the Training Operator doc and cross-reference it from this doc: https://github.com/kubeflow/website/blob/master/content/en/docs/components/trainer/legacy-v1/user-guides/fine-tuning.md

Author:

The S3 dataset params part? I think that has less relevance to the Training Operator. However, if you are referring to the example, I believe we should keep a small snippet that gives a high-level overview, since most people are too lazy to browse nested links.

I'll commit with the three params removed. But please give me a heads-up regarding the above suggestion.

Contributor:

Yes, `S3DatasetParams` is another way to define an external dataset, just like `HuggingFaceDatasetParams`. So I think it makes sense to move its explanation into the Training Operator documentation as well.


##### Description
@@ -243,7 +188,7 @@ In the context of optimizing hyperparameters of large language models (LLMs) like
resources_per_trial=katib.TrainerResources(
    num_workers=1,
    num_procs_per_worker=1,
    resources_per_worker={"gpu": 0, "cpu": 1, "memory": "10G",},
    resources_per_worker={"gpu": 0, "cpu": 1, "memory": "10G"},
)
```

@@ -330,15 +275,15 @@ cl.tune(
    algorithm_name = "random",
    max_trial_count = 10,
    parallel_trial_count = 2,
    resources_per_trial={
        "gpu": "2",
        "cpu": "4",
        "memory": "10G",
    },
    resources_per_trial=katib.TrainerResources(
        num_workers=2,
        num_procs_per_worker=2,
        resources_per_worker={"gpu": 2, "cpu": 4, "memory": "10G"},
    ),
)

cl.wait_for_experiment_condition(name=exp_name)

# Get the best hyperparameters.
print(cl.get_optimal_hyperparameters(exp_name))
```
content/en/docs/components/trainer/legacy-v1/user-guides/fine-tuning.md
@@ -88,6 +88,104 @@ TrainingClient().train(
After you execute `train`, the Training Operator will orchestrate the appropriate PyTorchJob resources
to fine-tune the LLM.
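
You can monitor the resulting job with the same `TrainingClient`. A minimal sketch (the job name below is a placeholder; use the `name` you passed to `train`):

```python
from kubeflow.training import TrainingClient

client = TrainingClient()

# Stream trainer logs for the fine-tuning job created by train().
client.get_job_logs(name="fine-tune-bert", follow=True)
```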

## HuggingFace Parameter Classes

### HuggingFaceModelParams

#### Description

The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models with validation checks.

| **Attribute** | **Type** | **Description** |
| ------------------ | --------------------------------- | ---------------------------------------------------------- |
| `model_uri` | `str` | URI or path to the Hugging Face model (must not be empty). |
| `transformer_type` | `TRANSFORMER_TYPES` | Specifies the model type for various NLP/ML tasks. |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private models on Hugging Face. |
| `num_labels` | `Optional[int]` (default: `None`) | Number of output labels (used for classification tasks). |

##### Supported Transformer Types (`TRANSFORMER_TYPES`)

| **Model Type** | **Task** |
| ------------------------------------ | ------------------------ |
| `AutoModelForSequenceClassification` | Text classification |
| `AutoModelForTokenClassification` | Named entity recognition |
| `AutoModelForQuestionAnswering` | Question answering |
| `AutoModelForCausalLM` | Text generation (causal) |
| `AutoModelForMaskedLM` | Masked language modeling |
| `AutoModelForImageClassification` | Image classification |

#### Example Usage

```python
from transformers import AutoModelForSequenceClassification
from kubeflow.storage_initializer.hugging_face import HuggingFaceModelParams

params = HuggingFaceModelParams(
    model_uri="bert-base-uncased",
    transformer_type=AutoModelForSequenceClassification,
    access_token="huggingface_access_token",
    num_labels=2  # For binary classification
)
```

### HuggingFaceDatasetParams

#### Description

The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face with validation checks.

| **Attribute** | **Type** | **Description** |
| -------------- | --------------------------------- | ------------------------------------------------------------------------- |
| `repo_id` | `str` | Identifier of the dataset repository on Hugging Face (must not be empty). |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private datasets on Hugging Face. |
| `split` | `Optional[str]` (default: `None`) | Dataset split to load (e.g., `"train"`, `"test"`). |

#### Example Usage

```python
from kubeflow.storage_initializer.hugging_face import HuggingFaceDatasetParams

dataset_params = HuggingFaceDatasetParams(
    repo_id="imdb",      # Public dataset repository ID on Hugging Face
    split="train",       # Dataset split to load
    access_token=None    # Not needed for public datasets
)
```

### HuggingFaceTrainerParams

#### Description

The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. It includes the training arguments and LoRA configuration to optimize model training.

| **Parameter** | **Type** | **Description** |
| --------------------- | -------------------------------- | ----------------------------------------------------------------------------- |
| `training_parameters` | `transformers.TrainingArguments` | Contains the training arguments like learning rate, epochs, batch size, etc. |
| `lora_config` | `LoraConfig` | LoRA configuration to reduce the number of trainable parameters in the model. |

#### Example Usage

```python
from transformers import TrainingArguments
from peft import LoraConfig
from kubeflow.storage_initializer.hugging_face import HuggingFaceTrainerParams

trainer_params = HuggingFaceTrainerParams(
    training_parameters=TrainingArguments(
        output_dir="results",
        learning_rate=2e-5,
        num_train_epochs=3,
        per_device_train_batch_size=8,
    ),
    lora_config=LoraConfig(
        r=8,
        lora_alpha=16,
        lora_dropout=0.1,
        bias="none",
    ),
)
```

## Using custom images with Fine-Tuning API

Platform engineers can customize the storage initializer and trainer images by setting the `STORAGE_INITIALIZER_IMAGE` and `TRAINER_TRANSFORMER_IMAGE` environment variables before executing the `train` command.
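
For example, a minimal sketch (the image references below are placeholders for images published in your own registry):

```python
import os

# Placeholder image references; replace them with images you have built and pushed.
os.environ["STORAGE_INITIALIZER_IMAGE"] = "registry.example.com/custom-storage-initializer:latest"
os.environ["TRAINER_TRANSFORMER_IMAGE"] = "registry.example.com/custom-trainer:latest"

# Set these environment variables before calling TrainingClient().train(...)
# so the Training Operator uses the custom images.
```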