hi, i am prithiv! i am a graduate engineer [ug 2024], information technology, gcee, focused on llm training and enhancements and on improving multimodal ai capabilities.
Pinned
- Doc-VLMs-v2-Localization (Public)
  Doc-VLMs-v2-Localization is a demo app for the Camel-Doc-OCR-062825 model, fine-tuned from Qwen2.5-VL-7B-Instruct for advanced document retrieval, extraction, and analysis. It enhances document und…
Python 1
- FineTuning-SigLIP-2 (Public)
  Fine-tuning SigLIP 2 for single- and multi-label image classification: a vision-language encoder model fine-tuned for image classification tasks.
- Qwen2.5-VL-Video-Understanding (Public)
  The Qwen2.5-VL-7B-Instruct model is a multimodal AI model developed by Alibaba Cloud that excels at understanding both text and images. It's a Vision-Language Model (VLM) designed to handle various…
Python 1
- OCR-ReportLab (Public)
  Dedicated Colab notebooks for experimenting with OCR models (Nanonets OCR, Monkey OCR, OCRFlux 3B, Typhoon OCR 3B) on a free-tier T4 GPU.
- Flux-LoRA-DLC (Public)
  Experience the power of the FLUX.1-dev diffusion model combined with a massive collection of 255+ community-created LoRAs! This Gradio application provides an easy-to-use interface to explore diver…