# Tools

+LMDeploy supports tools for InternLM2, InternLM2.5 and llama3.1 models.
+
## Single Round Invocation

-Currently, LMDeploy supports tools only for InternLM2, InternLM2.5 and llama3.1 models. Please start the service of models before running the following example.
+Please start the service of models before running the following example.

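+For example, a service for one of the supported models can be started with the `api_server` command shown later in this guide (a sketch; the model id is an illustrative choice, and the server listens on 0.0.0.0:23333 by default, matching the `base_url` used below):
+
+```shell
+lmdeploy serve api_server internlm/internlm2_5-7b-chat
+```
+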
```python
from openai import OpenAI
@@ -43,7 +45,7 @@ print(response)

## Multiple Round Invocation

-### InternLM demo
+### InternLM

A complete toolchain invocation process can be demonstrated through the following example.

@@ -149,58 +151,96 @@ ChatCompletion(id='2', choices=[Choice(finish_reason='tool_calls', index=0, logp
16
```

-### Llama3.1 demo
+### Llama 3.1

-```python
-from openai import OpenAI
+Meta states in [Llama 3.1's official user guide](https://llama.meta.com/docs/model-cards-and-prompt-formats/llama3_1) that:

-tools = [
-    {
-        "type": "function",
-        "function": {
-            "name": "get_current_weather",
-            "description": "Get the current weather in a given location",
-            "parameters": {
-                "type": "object",
-                "properties": {
-                    "location": {
-                        "type": "string",
-                        "description": "The city and state, e.g. San Francisco, CA",
-                    },
-                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
-                },
-                "required": ["location"],
-            },
-        }
-    }
-]
-messages = [{"role": "user", "content": "What's the weather like in Boston today?"}]
+```{text}
+There are three built-in tools (brave_search, wolfram_alpha, and code interpreter) that can be turned on using the system prompt:

-client = OpenAI(api_key='YOUR_API_KEY',base_url='http://0.0.0.0:23333/v1')
-model_name = client.models.list().data[0].id
-response = client.chat.completions.create(
-    model=model_name,
-    messages=messages,
-    temperature=0.8,
-    top_p=0.8,
-    stream=False,
-    tools=tools)
-print(response)
-messages += [{"role": "assistant", "content": response.choices[0].message.content}]
-messages += [{"role": "ipython", "content": "Clouds giving way to sun Hi: 76° Tonight: Mainly clear early, then areas of low clouds forming Lo: 56°"}]
-response = client.chat.completions.create(
-    model=model_name,
-    messages=messages,
-    temperature=0.8,
-    top_p=0.8,
-    stream=False,
-    tools=tools)
-print(response)
+1. Brave Search: Tool call to perform web searches.
+2. Wolfram Alpha: Tool call to perform complex mathematical calculations.
+3. Code Interpreter: Enables the model to output python code.
```

-And the outputs would be:
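+For example, a system prompt of the following shape is what turning a built-in tool on looks like (a sketch mirroring the Wolfram Alpha walkthrough below; the choice of brave_search and the user query are illustrative only):
+
+```python
+# Built-in tools are enabled by naming them in the system prompt.
+messages = [
+    {
+        "role": "system",
+        "content": "Environment: ipython\nTools: brave_search\n\nCutting Knowledge Date: December 2023\nToday Date: 23 Jul 2024\n\nYou are a helpful assistant."
+    },
+    {
+        "role": "user",
+        "content": "Search the web for the latest LMDeploy release."
+    }
+]
+# The model is then expected to emit a built-in tool call such as:
+# <|python_tag|>brave_search.call(query="latest LMDeploy release")
+```
+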
+Additionally, it cautions: "**Note:** We recommend using Llama 70B-instruct or Llama 405B-instruct for applications that combine conversation and tool calling. Llama 8B-Instruct can not reliably maintain a conversation alongside tool calling definitions. It can be used for zero-shot tool calling, but tool instructions should be removed for regular conversations between the model and the user."

+Therefore, we utilize [Meta-Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct) to demonstrate how to invoke tool calling through the LMDeploy `api_server`.
+
+On an A100-SXM-80G node, you can start the service with 4-way tensor parallelism (`--tp 4`) as follows:
+
+```shell
+lmdeploy serve api_server /the/path/of/Meta-Llama-3.1-70B-Instruct/model --tp 4
+```
```
-ChatCompletion(id='3', choices=[Choice(finish_reason='tool_calls', index=0, logprobs=None, message=ChatCompletionMessage(content='<function=get_current_weather>{"location": "Boston, MA", "unit": "fahrenheit"}</function>\n\nOutput:\nCurrent Weather in Boston, MA:\nTemperature: 75°F\nHumidity: 60%\nWind Speed: 10 mph\nSky Conditions: Partly Cloudy', role='assistant', function_call=None, tool_calls=[ChatCompletionMessageToolCall(id='0', function=Function(arguments='{"location": "Boston, MA", "unit": "fahrenheit"}', name='get_current_weather'), type='function')]))], created=1721815546, model='llama3.1/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=58, prompt_tokens=349, total_tokens=407))
-ChatCompletion(id='4', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='The current weather in Boston is mostly sunny with a high of 76°F and a low of 56°F tonight.', role='assistant', function_call=None, tool_calls=None))], created=1721815547, model='llama3.1/Meta-Llama-3.1-8B-Instruct', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=36, prompt_tokens=446, total_tokens=482))
+
+For an in-depth understanding of the api_server, please refer to the detailed documentation available [here](./api_server.md).
+
+The following code snippet demonstrates how to utilize the 'Wolfram Alpha' tool. It is assumed that you have already registered on the [Wolfram Alpha](https://www.wolframalpha.com) website and obtained a valid API key to access its services.
+
+```python
+from openai import OpenAI
+import requests
+
+
+def request_llama3_1_service(messages):
+    client = OpenAI(api_key='YOUR_API_KEY',
+                    base_url='http://0.0.0.0:23333/v1')
+    model_name = client.models.list().data[0].id
+    response = client.chat.completions.create(
+        model=model_name,
+        messages=messages,
+        temperature=0.8,
+        top_p=0.8,
+        stream=False)
+    return response.choices[0].message.content
+
+
+# The role of "system" MUST be specified, including the required tools
+messages = [
+    {
+        "role": "system",
+        "content": "Environment: ipython\nTools: wolfram_alpha\n\n Cutting Knowledge Date: December 2023\nToday Date: 23 Jul 2024\n\nYou are a helpful Assistant."  # noqa
+    },
+    {
+        "role": "user",
+        "content": "Can you help me solve this equation: x^3 - 4x^2 + 6x - 24 = 0"  # noqa
+    }
+]
+
+# send request to the api_server of llama3.1-70b and get the response
+# the "assistant_response" is supposed to be:
+# <|python_tag|>wolfram_alpha.call(query="solve x^3 - 4x^2 + 6x - 24 = 0")
+assistant_response = request_llama3_1_service(messages)
+print(assistant_response)
+
+# Call the API of Wolfram Alpha with the query generated by the model
+app_id = 'YOUR-Wolfram-Alpha-API-KEY'
+params = {
+    "input": assistant_response,
+    "appid": app_id,
+    "format": "plaintext",
+    "output": "json",
+}
+
+wolframalpha_response = requests.get(
+    "https://api.wolframalpha.com/v2/query",
+    params=params
+)
+wolframalpha_response = wolframalpha_response.json()
+
+# Append the contents obtained by the model and the wolframalpha's API
+# to "messages", and send it again to the api_server
+messages += [
+    {
+        "role": "assistant",
+        "content": assistant_response
+    },
+    {
+        "role": "ipython",
+        "content": wolframalpha_response
+    }
+]
+
+assistant_response = request_llama3_1_service(messages)
+print(assistant_response)
```
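+
+Note that the example passes `assistant_response` to the Wolfram Alpha API verbatim, including the `<|python_tag|>wolfram_alpha.call(query="...")` wrapper shown in the comments above. If you prefer to submit only the bare query string, a small helper along these lines can strip the wrapper first (a sketch; the function name and regex are illustrative, assuming the response matches the format above):
+
+```python
+import re
+
+
+def extract_wolfram_query(assistant_response: str) -> str:
+    """Return the bare query from '<|python_tag|>wolfram_alpha.call(query="...")'.
+
+    Falls back to the raw response if it does not match that format.
+    """
+    match = re.search(r'wolfram_alpha\.call\(query="(.*?)"\)', assistant_response)
+    return match.group(1) if match else assistant_response
+
+
+# e.g. params["input"] = extract_wolfram_query(assistant_response)
+```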