LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning
Facial-Emotion-Detection-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224.
Augmented-Waste-Classifier-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224.
SigLIP2 (google/siglip2-base-patch16-224) is the base vision-language encoder model that the fine-tuned classifiers in this topic build on.
Age-Classification-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to predict the age group of a person from an image using the SiglipForImageClassification architecture.
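
Most of the single-label classifiers in this topic share the same inference pattern: load the fine-tuned checkpoint with SiglipForImageClassification, preprocess the image, and take the softmax over the logits. A minimal sketch of that pattern follows; the checkpoint id and image path are placeholders, not the actual repository names.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

model_id = "your-namespace/Age-Classification-SigLIP2"  # placeholder checkpoint id
processor = AutoImageProcessor.from_pretrained(model_id)
model = SiglipForImageClassification.from_pretrained(model_id)
model.eval()

image = Image.open("person.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Single-label task: softmax over the classes, keep the highest-scoring one.
probs = logits.softmax(dim=-1)[0]
predicted = model.config.id2label[int(probs.argmax())]
print(predicted, float(probs.max()))
```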
Human-Action-Recognition is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-class human action recognition. It uses the SiglipForImageClassification architecture to predict human activities from still images.
Anime-Classification-v1.0 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify anime-related images using the SiglipForImageClassification architecture.
x-bot-profile-detection is a SigLIP2-based classification model designed to detect profile authenticity types on social media platforms (such as X/Twitter). It categorizes a profile image into four classes: bot, cyborg, real, or verified. Built on google/siglip2-base-patch16-224.
Watermark-Detection-SigLIP2 is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for binary image classification. It is trained to detect whether an image contains a watermark or not, using the SiglipForImageClassification architecture.
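
For quick binary checks such as watermark or face-mask detection, the high-level transformers pipeline can stand in for the manual processor-and-logits flow. The model id and image path below are placeholders.

```python
from transformers import pipeline

# Placeholder model id; point this at the binary SigLIP2 classifier you want to run.
detector = pipeline("image-classification", model="your-namespace/Watermark-Detection-SigLIP2")

# Returns a list of {"label": ..., "score": ...} dicts, highest score first.
print(detector("photo.jpg"))  # "photo.jpg" is a placeholder image path
```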
Classifies handwritten digits (0-9).
Fashion-Mnist-SigLIP2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images into Fashion-MNIST categories using the SiglipForImageClassification architecture.
Fire-Detection-Siglip2 is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to detect fire, smoke, or normal conditions using the SiglipForImageClassification architecture.
Deepfake vs Real is a dataset designed for image classification, distinguishing between deepfake and real images.
Food-101-93M is a fine-tuned image classification model built on top of google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It is trained to classify food images into one of 101 popular dishes, derived from the Food-101 dataset.
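
As a rough illustration of how such a classifier can be produced, the sketch below fine-tunes the SigLIP2 base checkpoint on Food-101 with the Hugging Face Trainer. The dataset handling follows the standard transformers image-classification recipe; hyperparameters and other settings are illustrative assumptions, not the recipe actually used for Food-101-93M.

```python
import torch
from datasets import load_dataset
from transformers import (AutoImageProcessor, AutoModelForImageClassification,
                          Trainer, TrainingArguments)

dataset = load_dataset("food101")          # columns: "image" (PIL), "label" (int)
labels = dataset["train"].features["label"].names

base = "google/siglip2-base-patch16-224"
processor = AutoImageProcessor.from_pretrained(base)
model = AutoModelForImageClassification.from_pretrained(
    base,
    num_labels=len(labels),                # new 101-way classification head
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
)

def transform(batch):
    # Turn PIL images into the pixel_values tensors the model expects.
    pixel_values = processor(
        images=[img.convert("RGB") for img in batch["image"]],
        return_tensors="pt",
    )["pixel_values"]
    batch["pixel_values"] = list(pixel_values)
    return batch

dataset = dataset.with_transform(transform)

def collate(examples):
    return {
        "pixel_values": torch.stack([ex["pixel_values"] for ex in examples]),
        "labels": torch.tensor([ex["label"] for ex in examples]),
    }

args = TrainingArguments(
    output_dir="siglip2-food101",          # all settings here are illustrative
    per_device_train_batch_size=32,
    learning_rate=5e-5,
    num_train_epochs=3,
    remove_unused_columns=False,           # keep "image" available for the transform
)

trainer = Trainer(
    model=model,
    args=args,
    data_collator=collate,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
```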
IndoorOutdoorNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify images as either Indoor or Outdoor using the SiglipForImageClassification architecture.
Clipart-126-DomainNet is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify clipart images into 126 domain categories using the SiglipForImageClassification architecture.
Multilabel-GeoSceneNet is a vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for multi-label image classification. It is designed to recognize and label multiple geographic or environmental elements in a single image using the SiglipForImageClassification architecture.
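
Multi-label models differ from the single-label classifiers above only at the output: each label gets an independent sigmoid score rather than competing in a softmax. A minimal sketch, again with placeholder checkpoint id, image path, and threshold:

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

model_id = "your-namespace/Multilabel-GeoSceneNet"  # placeholder checkpoint id
processor = AutoImageProcessor.from_pretrained(model_id)
model = SiglipForImageClassification.from_pretrained(model_id)
model.eval()

image = Image.open("scene.jpg").convert("RGB")  # placeholder image path
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0]

scores = torch.sigmoid(logits)  # independent per-label probabilities
predicted = [model.config.id2label[i] for i, s in enumerate(scores) if s > 0.5]
print(predicted)
```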
Rice-Leaf-Disease is an image classification model fine-tuned from google/siglip2-base-patch16-224 for detecting and categorizing diseases in rice leaves. It is built using the SiglipForImageClassification architecture and helps in early identification of plant diseases for better crop management.
Coral-Health is an image classification vision-language encoder model fine-tuned from google/siglip2-base-patch16-224 for a single-label classification task. It is designed to classify coral reef images into two health conditions using the SiglipForImageClassification architecture.
Face-Mask-Detection is a binary image classification model based on google/siglip2-base-patch16-224, trained to detect whether a person is wearing a face mask or not. This model can be used in public health monitoring, access control systems, and workplace compliance enforcement.