katib: Update LLM HP tuning guide to clarify tunable fields and fix resource section #4067

Status: Open — wants to merge 8 commits into `master`
132 changes: 21 additions & 111 deletions content/en/docs/components/katib/user-guides/llm-hp-optimization.md
@@ -40,81 +40,25 @@ from kubeflow.storage_initializer.hugging_face import (
)
```

#### HuggingFaceModelParams

##### Description

The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models with validation checks.

| **Attribute** | **Type** | **Description** |
| ------------------ | --------------------------------- | ---------------------------------------------------------- |
| `model_uri` | `str` | URI or path to the Hugging Face model (must not be empty). |
| `transformer_type` | `TRANSFORMER_TYPES` | Specifies the model type for various NLP/ML tasks. |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private models on Hugging Face. |
| `num_labels` | `Optional[int]` (default: `None`) | Number of output labels (used for classification tasks). |

###### Supported Transformer Types (`TRANSFORMER_TYPES`)

| **Model Type** | **Task** |
| ------------------------------------ | ------------------------ |
| `AutoModelForSequenceClassification` | Text classification |
| `AutoModelForTokenClassification` | Named entity recognition |
| `AutoModelForQuestionAnswering` | Question answering |
| `AutoModelForCausalLM` | Text generation (causal) |
| `AutoModelForMaskedLM` | Masked language modeling |
| `AutoModelForImageClassification` | Image classification |

##### Example Usage

```python
from transformers import AutoModelForSequenceClassification

from kubeflow.storage_initializer.hugging_face import HuggingFaceModelParams
{{% alert title="Note" color="info" %}}
The detailed descriptions of these parameter classes have been moved to the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#dataset-and-model-parameter-classes). This page provides a brief overview of how to use these classes with Katib for hyperparameter optimization.
{{% /alert %}}

#### HuggingFaceModelParams

params = HuggingFaceModelParams(
model_uri="bert-base-uncased",
transformer_type=AutoModelForSequenceClassification,
access_token="huggingface_access_token",
num_labels=2 # For binary classification
)
```
The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacemodelparams).

#### HuggingFaceDatasetParams

##### Description

The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face with validation checks.

| **Attribute** | **Type** | **Description** |
| -------------- | --------------------------------- | ------------------------------------------------------------------------- |
| `repo_id` | `str` | Identifier of the dataset repository on Hugging Face (must not be empty). |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private datasets on Hugging Face. |
| `split` | `Optional[str]` (default: `None`) | Dataset split to load (e.g., `"train"`, `"test"`). |

##### Example Usage

```python
from kubeflow.storage_initializer.hugging_face import HuggingFaceDatasetParams


dataset_params = HuggingFaceDatasetParams(
repo_id="imdb", # Public dataset repository ID on Hugging Face
split="train", # Dataset split to load
access_token=None # Not needed for public datasets
)
```
The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacedatasetparams).

#### HuggingFaceTrainerParams

##### Description

The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. It includes the training arguments and LoRA configuration to optimize model training.
The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. For detailed documentation and examples, see the [Training Operator fine-tuning guide](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#huggingfacetrainerparams).

| **Parameter** | **Type** | **Description** |
| --------------------- | -------------------------------- | ----------------------------------------------------------------------------- |
| `training_parameters` | `transformers.TrainingArguments` | Contains the training arguments like learning rate, epochs, batch size, etc. |
| `lora_config` | `LoraConfig` | LoRA configuration to reduce the number of trainable parameters in the model. |
{{% alert title="Note" color="info" %}}
Currently, only parameters within `training_parameters` and `lora_config` can be tuned using Katib's search API. Other fields are static and cannot be tuned.
{{% /alert %}}

###### Katib Search API for Defining Hyperparameter Search Space

@@ -159,42 +103,11 @@ trainer_params = HuggingFaceTrainerParams(

### S3-Compatible Object Storage Integration

In addition to Hugging Face, you can integrate with S3-compatible object storage platforms to load datasets. To work with S3, use the `S3DatasetParams` class to define your dataset parameters.

```python
from kubeflow.storage_initializer.s3 import S3DatasetParams
```

#### S3DatasetParams

##### Description
In addition to Hugging Face, you can integrate with S3-compatible object storage platforms to
load datasets. To work with S3, use the `S3DatasetParams` class to define your dataset
parameters.

The `S3DatasetParams` class is used for loading datasets from S3-compatible object storage. The parameters are defined as follows:

| **Parameter** | **Type** | **Description** |
| -------------- | --------------- | ----------------------------------------------------- |
| `endpoint_url` | `str` | URL of the S3-compatible storage service. |
| `bucket_name` | `str` | Name of the S3 bucket containing the dataset. |
| `file_key` | `str` | Key (path) to the dataset file within the bucket. |
| `region_name` | `str`, optional | The AWS region of the S3 bucket (optional). |
| `access_key` | `str`, optional | The access key for authentication with S3 (optional). |
| `secret_key` | `str`, optional | The secret key for authentication with S3 (optional). |

##### Example Usage

```python
from kubeflow.storage_initializer.s3 import S3DatasetParams


s3_params = S3DatasetParams(
endpoint_url="https://s3.amazonaws.com",
bucket_name="my-dataset-bucket",
file_key="datasets/train.csv",
region_name="us-west-2",
access_key="YOUR_ACCESS_KEY",
secret_key="YOUR_SECRET_KEY"
)
```
For loading datasets from S3-compatible object storage, see the [S3DatasetParams documentation](/docs/components/trainer/legacy-v1/user-guides/fine-tuning/#s3datasetparams) in the Training Operator fine-tuning guide.

## Optimizing Hyperparameters of Large Language Models

@@ -209,9 +122,6 @@ In the context of optimizing hyperparameters of large language models (LLMs) lik
| `dataset_provider_parameters` | Parameters for the dataset provider, such as dataset configuration. | Optional |
| `trainer_parameters` | Configuration for the trainer, including hyperparameters for model training. | Optional |
| `storage_config` | Configuration for storage, like PVC size and storage class. | Optional |
| `objective` | Objective function for training and optimization. | Optional |
| `base_image` | Base image for executing the objective function. | Optional |
| `parameters` | Hyperparameters for tuning the experiment. | Optional |
| `namespace` | Kubernetes namespace for the experiment. | Optional |
| `env_per_trial` | Environment variables for each trial. | Optional |
| `algorithm_name` | Algorithm used for the hyperparameter search. | Optional |
@@ -243,7 +153,7 @@ In the context of optimizing hyperparameters of large language models (LLMs) lik
resources_per_trial=katib.TrainerResources(
num_workers=1,
num_procs_per_worker=1,
resources_per_worker={"gpu": 0, "cpu": 1, "memory": "10G",},
resources_per_worker={"gpu": 0, "cpu": 1, "memory": "10G"},
)
```

@@ -330,15 +240,15 @@ cl.tune(
algorithm_name = "random",
max_trial_count = 10,
parallel_trial_count = 2,
resources_per_trial={
"gpu": "2",
"cpu": "4",
"memory": "10G",
},
resources_per_trial=katib.TrainerResources(
num_workers=2,
num_procs_per_worker=2,
resources_per_worker={"gpu": 2, "cpu": 4, "memory": "10G"},
),
)

cl.wait_for_experiment_condition(name=exp_name)

# Get the best hyperparameters.
print(cl.get_optimal_hyperparameters(exp_name))
```
@@ -88,6 +88,138 @@ TrainingClient().train(
After you execute `train`, the Training Operator will orchestrate the appropriate PyTorchJob resources
to fine-tune the LLM.

## Dataset and Model Parameter Classes

### HuggingFaceModelParams

#### Description

The `HuggingFaceModelParams` dataclass holds configuration parameters for initializing Hugging Face models with validation checks.

| **Attribute** | **Type** | **Description** |
| ------------------ | --------------------------------- | ---------------------------------------------------------- |
| `model_uri` | `str` | URI or path to the Hugging Face model (must not be empty). |
| `transformer_type` | `TRANSFORMER_TYPES` | Specifies the model type for various NLP/ML tasks. |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private models on Hugging Face. |
| `num_labels` | `Optional[int]` (default: `None`) | Number of output labels (used for classification tasks). |

##### Supported Transformer Types (`TRANSFORMER_TYPES`)

| **Model Type** | **Task** |
| ------------------------------------ | ------------------------ |
| `AutoModelForSequenceClassification` | Text classification |
| `AutoModelForTokenClassification` | Named entity recognition |
| `AutoModelForQuestionAnswering` | Question answering |
| `AutoModelForCausalLM` | Text generation (causal) |
| `AutoModelForMaskedLM` | Masked language modeling |
| `AutoModelForImageClassification` | Image classification |

#### Example Usage

```python
from transformers import AutoModelForSequenceClassification
from kubeflow.storage_initializer.hugging_face import HuggingFaceModelParams

params = HuggingFaceModelParams(
model_uri="bert-base-uncased",
transformer_type=AutoModelForSequenceClassification,
access_token="huggingface_access_token",
num_labels=2 # For binary classification
)
```

### HuggingFaceDatasetParams

#### Description

The `HuggingFaceDatasetParams` class holds configuration parameters for loading datasets from Hugging Face with validation checks.

| **Attribute** | **Type** | **Description** |
| -------------- | --------------------------------- | ------------------------------------------------------------------------- |
| `repo_id` | `str` | Identifier of the dataset repository on Hugging Face (must not be empty). |
| `access_token` | `Optional[str]` (default: `None`) | Token for accessing private datasets on Hugging Face. |
| `split` | `Optional[str]` (default: `None`) | Dataset split to load (e.g., `"train"`, `"test"`). |

#### Example Usage

```python
from kubeflow.storage_initializer.hugging_face import HuggingFaceDatasetParams

dataset_params = HuggingFaceDatasetParams(
repo_id="imdb", # Public dataset repository ID on Hugging Face
split="train", # Dataset split to load
access_token=None # Not needed for public datasets
)
```

### HuggingFaceTrainerParams

#### Description

The `HuggingFaceTrainerParams` class is used to define parameters for the training process in the Hugging Face framework. It includes the training arguments and LoRA configuration to optimize model training.

| **Parameter** | **Type** | **Description** |
| --------------------- | -------------------------------- | ----------------------------------------------------------------------------- |
| `training_parameters` | `transformers.TrainingArguments` | Contains the training arguments like learning rate, epochs, batch size, etc. |
| `lora_config` | `LoraConfig` | LoRA configuration to reduce the number of trainable parameters in the model. |

#### Example Usage

```python
from transformers import TrainingArguments
from peft import LoraConfig
from kubeflow.storage_initializer.hugging_face import HuggingFaceTrainerParams

trainer_params = HuggingFaceTrainerParams(
training_parameters=TrainingArguments(
output_dir="results",
learning_rate=2e-5,
num_train_epochs=3,
per_device_train_batch_size=8,
),
lora_config=LoraConfig(
r=8,
lora_alpha=16,
lora_dropout=0.1,
bias="none",
),
)
```

### S3DatasetParams

#### Description

The `S3DatasetParams` class is used for loading datasets from S3-compatible object storage. It includes validation checks to ensure proper configuration.

| **Parameter** | **Type** | **Description** |
| -------------- | --------------- | ----------------------------------------------------- |
| `endpoint_url` | `str` | URL of the S3-compatible storage service. |
| `bucket_name` | `str` | Name of the S3 bucket containing the dataset. |
| `file_key` | `str` | Key (path) to the dataset file within the bucket. |
| `region_name` | `str`, optional | The AWS region of the S3 bucket (optional). |
| `access_key` | `str`, optional | The access key for authentication with S3 (optional). |
| `secret_key` | `str`, optional | The secret key for authentication with S3 (optional). |

#### Implementation Details

The `S3DatasetParams` class includes validation checks to ensure required parameters are provided and the endpoint URL is valid. The actual dataset download is handled by the `S3` class which uses boto3 to interact with the S3-compatible storage.

#### Example Usage

```python
from kubeflow.storage_initializer.s3 import S3DatasetParams

s3_params = S3DatasetParams(
endpoint_url="https://s3.amazonaws.com",
bucket_name="my-dataset-bucket",
file_key="datasets/train.csv",
region_name="us-west-2",
access_key="YOUR_ACCESS_KEY",
secret_key="YOUR_SECRET_KEY"
)
```

## Using custom images with the Fine-Tuning API

Platform engineers can customize the storage initializer and trainer images by setting the `STORAGE_INITIALIZER_IMAGE` and `TRAINER_TRANSFORMER_IMAGE` environment variables before executing the `train` command.
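For example, the variables can be exported before launching the script that calls `train`. The image references below are placeholders for images you have published to your own registry:

```shell
# Placeholder image references; point these at your own published images.
export STORAGE_INITIALIZER_IMAGE="registry.example.com/custom-storage-initializer:v1"
export TRAINER_TRANSFORMER_IMAGE="registry.example.com/custom-trainer:v1"
# Then run the Python script that calls TrainingClient().train(...).
```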