katib: Update LLM HP tuning guide to clarify tunable fields and fix resource section #4067

Status: Open — wants to merge 8 commits into `master`
132 changes: 21 additions & 111 deletions content/en/docs/components/katib/user-guides/llm-hp-optimization.md
@@ -40,81 +40,25 @@ from kubeflow.storage_initializer.hugging_face import (
)
```

#### HuggingFaceModelParams

##### Description

The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models with validation checks.

| **Attribute** | **Type** | **Description** |
| ------------------ | --------------------------------- | ---------------------------------------------------------- |
| `model_uri` | `str` | URI or path to the Hugging Face model (must not be empty). |
| `transformer_type` | `TRANSFORMER_TYPES` | Specifies the model type for various NLP/ML tasks. |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private models on Hugging Face. |
| `num_labels` | `Optional[int]` (default: `None`) | Number of output labels (used for classification tasks). |

###### Supported Transformer Types (`TRANSFORMER_TYPES`)

| **Model Type** | **Task** |
| ------------------------------------ | ------------------------ |
| `AutoModelForSequenceClassification` | Text classification |
| `AutoModelForTokenClassification` | Named entity recognition |
| `AutoModelForQuestionAnswering` | Question answering |
| `AutoModelForCausalLM` | Text generation (causal) |
| `AutoModelForMaskedLM` | Masked language modeling |
| `AutoModelForImageClassification` | Image classification |

##### Example Usage

```python
from transformers import AutoModelForSequenceClassification

from kubeflow.storage_initializer.hugging_face import HuggingFaceModelParams
{{% alert title="Note" color="info" %}}
The detailed descriptions of these parameter classes have been moved to the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#dataset-and-model-parameter-classes). This page provides a brief overview of how to use these classes with Katib for hyperparameter optimization.
{{% /alert %}}

#### HuggingFaceModelParams

params = HuggingFaceModelParams(
model_uri="bert-base-uncased",
transformer_type=AutoModelForSequenceClassification,
access_token="huggingface_access_token",
num_labels=2 # For binary classification
)
```
The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacemodelparams).

#### HuggingFaceDatasetParams

##### Description

The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face with validation checks.

| **Attribute** | **Type** | **Description** |
| -------------- | --------------------------------- | ------------------------------------------------------------------------- |
| `repo_id` | `str` | Identifier of the dataset repository on Hugging Face (must not be empty). |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private datasets on Hugging Face. |
| `split` | `Optional[str]` (default: `None`) | Dataset split to load (e.g., `"train"`, `"test"`). |

##### Example Usage

```python
from kubeflow.storage_initializer.hugging_face import HuggingFaceDatasetParams


dataset_params = HuggingFaceDatasetParams(
repo_id="imdb", # Public dataset repository ID on Hugging Face
split="train", # Dataset split to load
access_token=None # Not needed for public datasets
)
```
The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacedatasetparams).

#### HuggingFaceTrainerParams

##### Description

The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. It includes the training arguments and LoRA configuration to optimize model training.
The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacetrainerparams).

| **Parameter** | **Type** | **Description** |
| --------------------- | -------------------------------- | ----------------------------------------------------------------------------- |
| `training_parameters` | `transformers.TrainingArguments` | Contains the training arguments like learning rate, epochs, batch size, etc. |
| `lora_config` | `LoraConfig` | LoRA configuration to reduce the number of trainable parameters in the model. |
{{% alert title="Note" color="info" %}}
Currently, only parameters within `training_parameters` and `lora_config` can be tuned using Katib's search API. Other fields are static and cannot be tuned.
{{% /alert %}}

###### Katib Search API for Defining Hyperparameter Search Space

@@ -159,42 +103,11 @@ trainer_params = HuggingFaceTrainerParams(

### S3-Compatible Object Storage Integration

In addition to Hugging Face, you can integrate with S3-compatible object storage platforms to load datasets. To work with S3, use the `S3DatasetParams` class to define your dataset parameters.

```python
from kubeflow.storage_initializer.s3 import S3DatasetParams
```

#### S3DatasetParams

##### Description
In addition to Hugging Face, you can integrate with S3-compatible object storage platforms to
load datasets. To work with S3, use the `S3DatasetParams` class to define your dataset
parameters.

The `S3DatasetParams` class is used for loading datasets from S3-compatible object storage. The parameters are defined as follows:

| **Parameter** | **Type** | **Description** |
| -------------- | --------------- | ----------------------------------------------------- |
| `endpoint_url` | `str` | URL of the S3-compatible storage service. |
| `bucket_name` | `str` | Name of the S3 bucket containing the dataset. |
| `file_key` | `str` | Key (path) to the dataset file within the bucket. |
| `region_name` | `str`, optional | The AWS region of the S3 bucket (optional). |
| `access_key` | `str`, optional | The access key for authentication with S3 (optional). |
| `secret_key` | `str`, optional | The secret key for authentication with S3 (optional). |

##### Example Usage

```python
from kubeflow.storage_initializer.s3 import S3DatasetParams


s3_params = S3DatasetParams(
endpoint_url="https://s3.amazonaws.com",
bucket_name="my-dataset-bucket",
file_key="datasets/train.csv",
region_name="us-west-2",
access_key="YOUR_ACCESS_KEY",
secret_key="YOUR_SECRET_KEY"
)
```
For loading datasets from S3-compatible object storage, see the [S3DatasetParams documentation](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#s3datasetparams) in the Training Operator fine-tuning guide.

## Optimizing Hyperparameters of Large Language Models

@@ -209,9 +122,6 @@ In the context of optimizing hyperparameters of large language models (LLMs) lik
| `dataset_provider_parameters` | Parameters for the dataset provider, such as dataset configuration. | Optional |
| `trainer_parameters` | Configuration for the trainer, including hyperparameters for model training. | Optional |
| `storage_config` | Configuration for storage, like PVC size and storage class. | Optional |
| `objective` | Objective function for training and optimization. | Optional |
| `base_image` | Base image for executing the objective function. | Optional |
| `parameters` | Hyperparameters for tuning the experiment. | Optional |
| `namespace` | Kubernetes namespace for the experiment. | Optional |
| `env_per_trial` | Environment variables for each trial. | Optional |
| `algorithm_name` | Algorithm used for the hyperparameter search. | Optional |
@@ -243,7 +153,7 @@ In the context of optimizing hyperparameters of large language models (LLMs) lik
resources_per_trial=katib.TrainerResources(
num_workers=1,
num_procs_per_worker=1,
resources_per_worker={"gpu": 0, "cpu": 1, "memory": "10G",},
resources_per_worker={"gpu": 0, "cpu": 1, "memory": "10G"},
)
```

@@ -330,15 +240,15 @@ cl.tune(
algorithm_name = "random",
max_trial_count = 10,
parallel_trial_count = 2,
resources_per_trial={
"gpu": "2",
"cpu": "4",
"memory": "10G",
},
resources_per_trial=katib.TrainerResources(
num_workers=2,
num_procs_per_worker=2,
resources_per_worker={"gpu": 2, "cpu": 4, "memory": "10G"},
),
)

cl.wait_for_experiment_condition(name=exp_name)

# Get the best hyperparameters.
print(cl.get_optimal_hyperparameters(exp_name))
```
@@ -88,6 +88,138 @@ TrainingClient().train(
After you execute `train`, the Training Operator will orchestrate the appropriate PyTorchJob resources
to fine-tune the LLM.

## Dataset and Model Parameter Classes

### HuggingFaceModelParams

#### Description

The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models with validation checks.

| **Attribute** | **Type** | **Description** |
| ------------------ | --------------------------------- | ---------------------------------------------------------- |
| `model_uri` | `str` | URI or path to the Hugging Face model (must not be empty). |
| `transformer_type` | `TRANSFORMER_TYPES` | Specifies the model type for various NLP/ML tasks. |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private models on Hugging Face. |
| `num_labels` | `Optional[int]` (default: `None`) | Number of output labels (used for classification tasks). |

##### Supported Transformer Types (`TRANSFORMER_TYPES`)

| **Model Type** | **Task** |
| ------------------------------------ | ------------------------ |
| `AutoModelForSequenceClassification` | Text classification |
| `AutoModelForTokenClassification` | Named entity recognition |
| `AutoModelForQuestionAnswering` | Question answering |
| `AutoModelForCausalLM` | Text generation (causal) |
| `AutoModelForMaskedLM` | Masked language modeling |
| `AutoModelForImageClassification` | Image classification |

#### Example Usage

```python
from transformers import AutoModelForSequenceClassification
from kubeflow.storage_initializer.hugging_face import HuggingFaceModelParams

params = HuggingFaceModelParams(
model_uri="bert-base-uncased",
transformer_type=AutoModelForSequenceClassification,
access_token="huggingface_access_token",
num_labels=2 # For binary classification
)
```

### HuggingFaceDatasetParams

#### Description

The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face with validation checks.

| **Attribute** | **Type** | **Description** |
| -------------- | --------------------------------- | ------------------------------------------------------------------------- |
| `repo_id` | `str` | Identifier of the dataset repository on Hugging Face (must not be empty). |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private datasets on Hugging Face. |
| `split` | `Optional[str]` (default: `None`) | Dataset split to load (e.g., `"train"`, `"test"`). |

#### Example Usage

```python
from kubeflow.storage_initializer.hugging_face import HuggingFaceDatasetParams

dataset_params = HuggingFaceDatasetParams(
repo_id="imdb", # Public dataset repository ID on Hugging Face
split="train", # Dataset split to load
access_token=None # Not needed for public datasets
)
```

### HuggingFaceTrainerParams

#### Description

The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. It includes the training arguments and LoRA configuration to optimize model training.

| **Parameter** | **Type** | **Description** |
| --------------------- | -------------------------------- | ----------------------------------------------------------------------------- |
| `training_parameters` | `transformers.TrainingArguments` | Contains the training arguments like learning rate, epochs, batch size, etc. |
| `lora_config` | `LoraConfig` | LoRA configuration to reduce the number of trainable parameters in the model. |

#### Example Usage

```python
from transformers import TrainingArguments
from peft import LoraConfig
from kubeflow.storage_initializer.hugging_face import HuggingFaceTrainerParams

trainer_params = HuggingFaceTrainerParams(
training_parameters=TrainingArguments(
output_dir="results",
learning_rate=2e-5,
num_train_epochs=3,
per_device_train_batch_size=8,
),
lora_config=LoraConfig(
r=8,
lora_alpha=16,
lora_dropout=0.1,
bias="none",
),
)
```

### S3DatasetParams

#### Description

The `S3DatasetParams` class is used for loading datasets from S3-compatible object storage. It includes validation checks to ensure proper configuration.

| **Parameter** | **Type** | **Description** |
| -------------- | --------------- | ----------------------------------------------------- |
| `endpoint_url` | `str` | URL of the S3-compatible storage service. |
| `bucket_name` | `str` | Name of the S3 bucket containing the dataset. |
| `file_key` | `str` | Key (path) to the dataset file within the bucket. |
| `region_name` | `str`, optional | The AWS region of the S3 bucket (optional). |
| `access_key` | `str`, optional | The access key for authentication with S3 (optional). |
| `secret_key` | `str`, optional | The secret key for authentication with S3 (optional). |

#### Implementation Details

The `S3DatasetParams` class includes validation checks to ensure required parameters are provided and the endpoint URL is valid. The actual dataset download is handled by the `S3` class which uses boto3 to interact with the S3-compatible storage.

#### Example Usage

```python
from kubeflow.storage_initializer.s3 import S3DatasetParams

s3_params = S3DatasetParams(
endpoint_url="https://s3.amazonaws.com",
bucket_name="my-dataset-bucket",
file_key="datasets/train.csv",
region_name="us-west-2",
access_key="YOUR_ACCESS_KEY",
secret_key="YOUR_SECRET_KEY"
)
```

## Using custom images with the Fine-Tuning API

Platform engineers can customize the storage initializer and trainer images by setting the `STORAGE_INITIALIZER_IMAGE` and `TRAINER_TRANSFORMER_IMAGE` environment variables before executing the `train` command.
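For example, the variables can be exported before launching the script that calls `train`. The image references below are placeholders for images you have published to your own registry:

```shell
# Placeholder image references; point these at your own published images.
export STORAGE_INITIALIZER_IMAGE="registry.example.com/custom-storage-initializer:v1"
export TRAINER_TRANSFORMER_IMAGE="registry.example.com/custom-trainer:v1"
# Then run the Python script that calls TrainingClient().train(...).
```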