Caption2SceneGraph

A parser tool using large language model to parse the input caption into a scene graph

Overview

Caption2SceneGraph is a tool that converts natural language image descriptions into structured scene graphs using Large Language Models (LLMs). It extracts visual elements, relationships, and attributes from textual descriptions to create a comprehensive scene representation.

Features

Visual fact extraction from captions
Scene graph parsing with entity recognition
Attribute and relationship extraction

Installation

git clone https://github.com/yourusername/Caption2SceneGraph
cd Caption2SceneGraph
pip install -r requirements.txt

Finetuning Dataset

We collect 1k parser results from the deepseek-chat API. We upload the input text and output scene graph paired dataset here: Xinran0906/Text2SG. You can use this dataset to train your own parser.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
prompts		prompts
scripts		scripts
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
batch_inference.py		batch_inference.py
data_browser.py		data_browser.py
demo.py		demo.py
finetune.py		finetune.py
inference.py		inference.py
parser.py		parser.py
simple_inference.py		simple_inference.py
transfer_to_hf_format.py		transfer_to_hf_format.py
upload_model.py		upload_model.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Caption2SceneGraph

Overview

Features

Installation

Finetuning Dataset

About

Uh oh!

Releases

Packages

Languages

License

xin-ran-w/Caption2SceneGraph

Folders and files

Latest commit

History

Repository files navigation

Caption2SceneGraph

Overview

Features

Installation

Finetuning Dataset

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages