This repository was archived by the owner on Sep 18, 2024. It is now read-only.

Commit ea90617

docs: refine integrate with jina (#492)

* docs: refine integrate with jina
* docs: tweak words
* docs: refine structure of jina integration
* docs: create 3 tabs
* docs: add volume mount
* docs: upgrade version
* docs: add embed with docarray
* docs: refine comments, add changelog
* chore: bump docarray to 0.13.31
* docs: use mnt, add output
* docs: tweak words and refer to docarray
* docs: print run artifact id to output
* chore: add changelog
* docs: more text descriptions on artifact and zip
* docs: restructure usage
* docs: fix output shape
* docs: fix integration docarray host
* docs: remove label for clip training

1 parent c4d3d9d · commit ea90617

File tree

5 files changed: +108 −35 lines changed


CHANGELOG.md

Lines changed: 4 additions & 0 deletions
```diff
@@ -22,6 +22,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - Bump `jina-hubble-sdk` to 0.8.1. ([#488](https://github.com/jina-ai/finetuner/pull/488))
 
+- Improve integration section in documentation. ([#492](https://github.com/jina-ai/finetuner/pull/492))
+
+- Bump `docarray` to 0.13.31. ([#492](https://github.com/jina-ai/finetuner/pull/492))
+
 ### Fixed
 
 - Use `uri` to represent image content in documentation creating training data code snippet. ([#484](https://github.com/jina-ai/finetuner/pull/484))
```

docs/walkthrough/create-training-data.md

Lines changed: 0 additions & 4 deletions
```diff
@@ -49,12 +49,10 @@ train_da = DocumentArray([
         Document(
             content='pencil skirt slim fit available for sell',
             modality='text',
-            tags={'finetuner_label': 'skirt-1'}
         ),
         Document(
             uri='https://...skirt-1.png',
             modality='image',
-            tags={'finetuner_label': 'skirt-1'}
         ),
     ],
 ),
@@ -63,12 +61,10 @@ train_da = DocumentArray([
         Document(
             content='stripped over-sized shirt for sell',
             modality='text',
-            tags={'finetuner_label': 'shirt-1'}
         ),
         Document(
             uri='https://...shirt-1.png',
             modality='image',
-            tags={'finetuner_label': 'shirt-1'}
         ),
     ],
 ),
```
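The `finetuner_label` tags removed above are unnecessary for CLIP-style training, where each text–image pair is itself the supervision signal. A minimal sketch of the resulting pair structure, using a hypothetical `Doc` dataclass as a stand-in for docarray's `Document` (so the snippet runs without docarray installed):

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for docarray's Document, only to illustrate
# the nesting after the `finetuner_label` tags were dropped.
@dataclass
class Doc:
    content: str = ''
    uri: str = ''
    modality: str = ''
    chunks: list = field(default_factory=list)

# Each training Document holds one text chunk and one image chunk;
# the pairing itself provides the positive match, so no label tags are needed.
pair = Doc(chunks=[
    Doc(content='pencil skirt slim fit available for sell', modality='text'),
    Doc(uri='https://...skirt-1.png', modality='image'),
])

print([c.modality for c in pair.chunks])  # -> ['text', 'image']
```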

docs/walkthrough/integrate-with-jina.md

Lines changed: 99 additions & 28 deletions
`````diff
@@ -1,5 +1,7 @@
+# Integration
+
 (integrate-with-jina)=
-# Integrate with Jina
+## Fine-tuned model as Executor
 
 Once fine-tuning is finished, it's time to actually use the model.
 Finetuner, being part of the Jina ecosystem, provides a convenient way to use tuned models via [Jina Executors](https://docs.jina.ai/fundamentals/executor/).
@@ -8,14 +10,35 @@ We've created the [`FinetunerExecutor`](https://hub.jina.ai/executor/13dzxycc) w
 More specifically, the executor exposes an `/encode` endpoint that embeds [Documents](https://docarray.jina.ai/fundamentals/document/) using the fine-tuned model.
 
 Loading a tuned model is simple! You just need to provide a few parameters under the `uses_with` argument when adding the `FinetunerExecutor` to the [Flow](https://docs.jina.ai/fundamentals/flow/).
+You have three options:
+
+````{tab} Artifact id and token
+```python
+import finetuner
+from jina import Flow
+
+finetuner.login()
 
-````{tab} Python
+token = finetuner.get_token()
+run = finetuner.get_run(
+    experiment_name='YOUR-EXPERIMENT',
+    run_name='YOUR-RUN'
+)
+
+f = Flow().add(
+    uses='jinahub+docker://FinetunerExecutor/v0.9.2',  # use v0.9.2-gpu for the GPU executor.
+    uses_with={'artifact': run.artifact_id, 'token': token},
+)
+```
+````
+````{tab} Locally saved artifact
 ```python
 from jina import Flow
 
 f = Flow().add(
-    uses='jinahub+docker://FinetunerExecutor',
-    uses_with={'artifact': 'model_dir/tuned_model', 'batch_size': 16},
+    uses='jinahub+docker://FinetunerExecutor/v0.9.2',  # use v0.9.2-gpu for the GPU executor.
+    uses_with={'artifact': '/mnt/YOUR-MODEL.zip'},
+    volumes=['/your/local/path/:/mnt']  # mount your model path into the container.
 )
 ```
 ````
@@ -26,19 +49,48 @@ with:
   port: 51000
   protocol: grpc
 executors:
-  - uses: jinahub+docker://FinetunerExecutor
+  - uses: jinahub+docker://FinetunerExecutor/v0.9.2
     with:
-      artifact: 'model_dir/tuned_model'
-      batch_size: 16
+      artifact: 'COPY-YOUR-ARTIFACT-ID-HERE'
+      token: 'COPY-YOUR-TOKEN-HERE'  # or better, set it as an env variable
 ```
 ````
-```{admonition} FinetunerExecutor via source code
-:class: tip
-You can also use the `FinetunerExecutor` via source code by specifying `jinahub://FinetunerExecutor` under the `uses` parameter.
-However, using docker images is recommended.
+
+As you can see, it's super easy!
+If you did not call `save_artifact`,
+you need to provide the `artifact_id` and `token`.
+`FinetunerExecutor` will automatically pull your model from cloud storage into the container.
+
+On the other hand,
+if you saved the artifact locally,
+please mount the zipped artifact into the docker container.
+`FinetunerExecutor` will unzip the artifact and load the model.
+
+You can start your Flow with:
+
+```python
+with f:
+    # in this example, we fine-tuned a BERT model and embed a Document.
+    returned_docs = f.post(
+        on='/encode',
+        inputs=DocumentArray(
+            [
+                Document(
+                    text='some text to encode'
+                )
+            ]
+        )
+    )
+
+    for doc in returned_docs:
+        print(f'Text of the returned document: {doc.text}')
+        print(f'Shape of the embedding: {doc.embedding.shape}')
 ```
 
-As you can see, it's super easy! We just provided the model path and the batch size.
+```console
+Text of the returned document: some text to encode
+Shape of the embedding: (768,)
+```
 
 In order to see what other options you can specify when initializing the executor, please go to the [`FinetunerExecutor`](https://hub.jina.ai/executor/13dzxycc) page and click on `Arguments` on the top-right side.
 
@@ -47,28 +99,47 @@ In order to see what other options you can specify when initializing the executo
 The only required argument is `artifact`. We provide default values for others.
 ```
 
+(integrate-with-docarray)=
+## Embed DocumentArray
 
-## Using `FinetunerExecutor`
-
-Here's a simple code snippet demonstrating the `FinetunerExecutor` usage in the Flow:
+Similarly, you can embed a [DocumentArray](https://docarray.jina.ai/) with the fine-tuned model:
 
+````{tab} Artifact id and token
 ```python
 from docarray import DocumentArray, Document
-from jina import Flow
+import finetuner
 
-f = Flow().add(
-    uses='jinahub+docker://FinetunerExecutor',
-    uses_with={'artifact': 'model_dir/tuned_model', 'batch_size': 16},
+finetuner.login()
+
+token = finetuner.get_token()
+run = finetuner.get_run(
+    experiment_name='YOUR-EXPERIMENT',
+    run_name='YOUR-RUN'
 )
 
-with f:
-    returned_docs = f.post(on='/encode', inputs=DocumentArray([Document(text='hello')]))
+da = DocumentArray([Document(text='some text to encode')])
 
-for doc in returned_docs:
-    print(f'Text of the returned document: {doc.text}')
-    print(f'Shape of the embedding: {doc.embedding.shape}')
+da.post(
+    'jinahub+docker://FinetunerExecutor/v0.9.2/encode',
+    uses_with={'artifact': run.artifact_id, 'token': token},
+)
+```
+````
+````{tab} Locally saved artifact
+```python
+from docarray import DocumentArray, Document
+
+da = DocumentArray([Document(text='some text to encode')])
+
+da.post(
+    'jinahub+docker://FinetunerExecutor/v0.9.2/encode',
+    uses_with={'artifact': '/mnt/YOUR-MODEL.zip'},
+    volumes=['/your/local/path/:/mnt']  # mount your model path into the container.
+)
+```
+````
+
+```console
+Text of the returned document: some text to encode
+Shape of the embedding: (768,)
 ```
-```bash
-Text of the returned document: hello
-Shape of the embedding: (1, 768)
-```
`````
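The `volumes=['/your/local/path/:/mnt']` option in the local-artifact tabs is a standard Docker bind mount: the host directory appears inside the container at `/mnt`, which is why `artifact` points at `/mnt/YOUR-MODEL.zip`. A small sketch of that path mapping (the `container_path` helper is ours, for illustration only):

```python
def container_path(host_path: str, volume: str) -> str:
    """Map a host path through a Docker-style 'host_dir:container_dir' mount."""
    host_dir, container_dir = volume.rsplit(':', 1)
    if not host_path.startswith(host_dir):
        raise ValueError(f'{host_path} is not under the mounted {host_dir}')
    relative = host_path[len(host_dir):].lstrip('/')
    return container_dir.rstrip('/') + '/' + relative

# The zipped artifact on the host becomes this path inside the executor container:
print(container_path('/your/local/path/YOUR-MODEL.zip', '/your/local/path/:/mnt'))
# -> /mnt/YOUR-MODEL.zip
```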

docs/walkthrough/save-model.md

Lines changed: 4 additions & 2 deletions
````diff
@@ -1,5 +1,5 @@
 (retrieve-tuned-model)=
-# Save Model
+# Save Artifact
 
 Perfect!
 Now you have started the fine-tuning job in the cloud.
@@ -17,8 +17,9 @@ experiment = finetuner.get_experiment('finetune-flickr-dataset')
 # connect to the run we created previously.
 run = experiment.get_run('finetune-flickr-dataset-efficientnet-1')
 print(f'Run status: {run.status()}')
+print(f'Run artifact id: {run.artifact_id}')
 print(f'Run logs: {run.logs()}')
-# save the model.
+# save the artifact.
 run.save_artifact('tuned_model')
 ```
@@ -28,6 +29,7 @@ you can see this in the terminal:
 ```bash
 🔐 Successfully login to Jina Ecosystem!
 Run status: FINISHED
+Run artifact id: 62972acb5de25a53fdbfcecc
 Run logs:
 
 Training [2/2] ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 50/50 0:00:00 0:01:08 • loss: 0.050
````
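The snippet above assumes the run has already reached `FINISHED`; in practice you poll `run.status()` before calling `save_artifact`. A runnable sketch of that flow using a hypothetical `MockRun` in place of the real run object (which `experiment.get_run(...)` returns, and which requires a cloud login):

```python
import time

# Hypothetical mock, only to illustrate the poll-then-save flow;
# the real object comes from experiment.get_run(...).
class MockRun:
    def __init__(self):
        self._polls = 0
        self.artifact_id = '62972acb5de25a53fdbfcecc'  # example id from the docs

    def status(self):
        self._polls += 1
        # Pretend the cloud job finishes on the second poll.
        return 'FINISHED' if self._polls >= 2 else 'STARTED'

    def save_artifact(self, directory):
        # The real API downloads the zipped artifact into `directory`.
        return f'{directory}/artifact.zip'

run = MockRun()
while run.status() != 'FINISHED':  # poll until the cloud job completes
    time.sleep(0)                  # real code would sleep for seconds

print(f'Run artifact id: {run.artifact_id}')
print(run.save_artifact('tuned_model'))  # -> tuned_model/artifact.zip
```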

requirements.txt

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-docarray[common]>=0.13.19
+docarray[common]>=0.13.31
 jina-hubble-sdk>=0.8.1
 requests>=2.27.1
 rich>=12.4.4
```
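To check that an environment already satisfies the raised `docarray` floor, a quick sketch (the `meets_floor` helper is ours; it only handles plain dotted versions, not full PEP 440):

```python
def meets_floor(installed: str, floor: str) -> bool:
    """Compare dotted version strings numerically, e.g. 0.13.31 >= 0.13.19."""
    to_tuple = lambda v: tuple(int(part) for part in v.split('.'))
    return to_tuple(installed) >= to_tuple(floor)

print(meets_floor('0.13.31', '0.13.19'))  # -> True: the bumped version meets the old floor
print(meets_floor('0.13.19', '0.13.31'))  # -> False: the old pin fails the new floor
```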
