    Repositories list

    • WMAdapter

      Public
      A watermark plugin for latent diffusion models.
      0000 Updated May 28, 2025
    • The official code implementation of the paper "OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data."
      Python
      27810 Updated May 27, 2025
    • A curated list of recent diffusion models for video generation, editing, and various other applications.
      2644.5k01 Updated May 26, 2025
    • ShowUI

      Public
      [CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
      Python
      Apache License 2.0
      851.3k80 Updated May 22, 2025
    • Out-of-the-box (OOTB) GUI Agent for Windows and macOS
      Python
      Apache License 2.0
      1571.6k306 Updated May 21, 2025
    • livecc

      Public
      LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
      Python
      2820741 Updated May 20, 2025
    • DoraCycle

      Public
      [CVPR 2025] DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
      12320 Updated May 13, 2025
    • A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
      617500 Updated May 1, 2025
    • Exo2Ego-V

      Public
      Python
      Apache License 2.0
      04300 Updated Apr 28, 2025
    • Show-o

      Public
      [ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
      Python
      Apache License 2.0
      621.4k432 Updated Apr 28, 2025
    • omg

      Public
      Open Multimodal Gathering workshop @ NUS
      JavaScript
      0000 Updated Apr 28, 2025
    • Code Implementation of "PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data"
      Python
      MIT License
      2639281 Updated Apr 23, 2025
    • VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
      Python
      Apache License 2.0
      47471270 Updated Apr 23, 2025
    • FAR

      Public
      Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
      Python
      MIT License
      921000 Updated Apr 23, 2025
    • ROICtrl

      Public
      Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
      Python
      010820 Updated Apr 16, 2025
    • Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
      Python
      66910 Updated Apr 11, 2025
    • 📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
      2670110 Updated Apr 9, 2025
    • 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
      2655511 Updated Apr 9, 2025
    • Repository of GUI Action Narrator
      JavaScript
      01000 Updated Apr 8, 2025
    • VideoGUI

      Public
      [NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
      JavaScript
      23500 Updated Apr 7, 2025
    • SMS

      Public
      Balanced Image Stylization with Style Matching Score
      12900 Updated Apr 2, 2025
    • Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"
      Python
      MIT License
      35040 Updated Apr 1, 2025
    • Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
      Python
      MIT License
      918230 Updated Apr 1, 2025
    • FQGAN

      Public
      FQGAN: Factorized Visual Tokenization and Generation
      Python
      Other
      25000 Updated Mar 29, 2025
    • MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning
      Python
      2319570 Updated Mar 26, 2025
    • SAM-I2V

      Public
      Apache License 2.0
      0210 Updated Mar 22, 2025
    • LOVA3

      Public
      (NeurIPS 2024) Official PyTorch implementation of LOVA3
      Python
      28500 Updated Mar 21, 2025
    • Python
      66610 Updated Mar 20, 2025
    • [CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation
      Python
      25700 Updated Mar 16, 2025
    • TPDiff

      Public
      TPDiff: Temporal Pyramid Video Diffusion Model
      21910 Updated Mar 13, 2025