I recently noticed GenSON, a library for generating JSON schemas dynamically.
I think we should evaluate ways of providing tools for dynamic JSON schema generation, possibly by simply showcasing how to use GenSON or a similar library. If anyone has other ideas, they're very welcome!
Typically, the way we specify schemas in Outlines is to use Pydantic, like so:
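A minimal sketch of this pattern, assuming Pydantic v2. The `CalendarEvent` model and its fields mirror the example used later in this issue and are illustrative only.

```python
from datetime import datetime

from pydantic import BaseModel


class CalendarEvent(BaseModel):
    description: str
    date: datetime


# The model class itself is what would be handed to outlines.generate.json(...),
# as in the full example at the end of this issue. Here we only inspect the
# JSON schema that Pydantic derives from it.
schema = CalendarEvent.model_json_schema()
print(schema)
```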
For well-structured programs, this is a great idea. A simple, clean Pydantic interface enforces good practice: users must provide a Pydantic model that can be used in a type-safe way everywhere in their application.
However, there are many cases where this can be problematic. Pydantic can make it difficult to program with Outlines when the schema must be modified in-place.
My simplest example is this:
class CalendarEvent(BaseModel):
    id: str  # Note the addition of the id field
    description: str
    date: datetime
This ID is stored in my database, or uniquely generated before the model generates the object.
When you give this to the model, it will make up an ID that may not be unique. What I would like to do instead is:
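One way to express this, as a sketch and not necessarily the intended interface: build a plain JSON Schema dict at runtime and pin `id` with a `const` constraint. The helper name `calendar_event_schema` is hypothetical, and whether a given Outlines version honors `const` is an assumption to verify.

```python
import json
import uuid


def calendar_event_schema(event_id: str) -> dict:
    """Return a JSON Schema whose `id` field only admits `event_id`."""
    return {
        "type": "object",
        "properties": {
            # `const` restricts the field to exactly one allowed value.
            "id": {"const": event_id},
            "description": {"type": "string"},
            "date": {"type": "string", "format": "date-time"},
        },
        "required": ["id", "description", "date"],
    }


# The ID comes from the application (a database row, or a fresh UUID),
# not from the language model.
event_id = str(uuid.uuid4())
schema = calendar_event_schema(event_id)

# Serialized form, suitable for anything that accepts a schema string.
schema_str = json.dumps(schema)
```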
This will force the model to use the ID I provide, and I won't have to do any post-generation clean up to enforce a unique UUID.
This is difficult to do, currently. I wrote an example of how to dynamically create Pydantic models, but it is quite clunky and does not have a convenient user interface. I've included an example of this in the detail block at the end of the issue.
Other examples
Here are a few other cases where dynamic schema creation might be a useful user interface feature.
Function calling. Currently we do gnarly regular expressions, or set up mega-function calling objects. Here we can flexibly define functions that the model may choose from during runtime.
General runtime usage. I often run into cases where I need to change the schema in standard control flow, such as changing enums within larger classes.
Flexibility. Working with Pydantic dynamically is a pain in general. Pydantic is great when you have a fixed structure, but often you may wish to provide flexible schemas conditional on a model response. Imagine that two disconnected systems send each other JSON: if you build a schema from the message Alice sends to Bob, Bob can just replicate that schema and kick it back to Alice in a format Alice understands.
Simplicity. You don't always need internal Python objects that you get from Pydantic. I would often be happy with just a dict for throwaways, especially when I don't want to have a gigantic models/ directory packed with tiny Pydantic classes.
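To make the runtime cases above concrete, here is a small sketch that swaps an enum inside a plain dict schema, for example to restrict the model to the functions currently available. The schema layout, the `with_enum` helper, and the function names are all hypothetical.

```python
import copy
import json

# A base schema with an enum that is only known at runtime.
base_schema = {
    "type": "object",
    "properties": {
        "function": {"type": "string", "enum": []},
        "reason": {"type": "string"},
    },
    "required": ["function", "reason"],
}


def with_enum(schema: dict, field: str, choices: list[str]) -> dict:
    """Return a copy of `schema` with the enum of `field` set to `choices`."""
    updated = copy.deepcopy(schema)  # leave the base schema untouched
    updated["properties"][field]["enum"] = list(choices)
    return updated


# The set of callable functions may change between requests.
available = ["search_web", "send_email"]
schema = with_enum(base_schema, "function", available)
schema_str = json.dumps(schema)
```

Because the schema is just a dict, no new class is needed for each variation; the trade-off is losing the type-safe objects Pydantic provides.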
# imports
from typing import Annotated
from annotated_types import Len
import outlines
from transformers import AutoTokenizer
from pydantic import BaseModel, Field, create_model
from rich import print

# Initialize the model
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
model = outlines.models.transformers(
    model_name,
    device="auto",
)

# load tokenizer to apply chat template
tokenizer = AutoTokenizer.from_pretrained(model_name)

def template(prompt):
    templated = tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
    )
    return templated

class Task(BaseModel):
    task: str

def LimitedList(
    max_items: int = 5,
    min_items: int = 0,
):
    # Build a Pydantic model at runtime with a length-constrained list field
    return create_model(
        "LimitedList",
        items=(list[Task], Field(max_length=max_items, min_length=min_items)),
    )

# Create the dynamic model
limited_class = LimitedList(
    max_items=10,
    min_items=9,
)

# Make a list generator function. Takes a prompt and
# returns a list of tasks.
list_generator = outlines.generate.json(
    model,
    limited_class,
)

prompt = f"""I'm building a house. Please provide a list of tasks that need to be completed.

Response format:

{limited_class.model_json_schema()}
"""

# Prompt the model
task_list = list_generator(template(prompt))
for idea in task_list.items:
    print(f"  - {idea.task}")
This PR aims to integrate support for the `genson` package (in
`generate.json`) to enable dynamic JSON schema generation, as
proposed in #1383.