LongCat-Image

Introduction

We introduce LongCat-Image-Edit, the image editing version of LongCat-Image. LongCat-Image-Edit supports bilingual (Chinese-English) editing and achieves state-of-the-art performance among open-source image editing models, with leading instruction following, image quality, and visual consistency.

LongCat-Image-Edit model

Key Features

  • 🌟 Superior Precise Editing: LongCat-Image-Edit supports various editing tasks, such as global editing, local editing, text modification, and reference-guided editing. It has strong semantic understanding capabilities and can perform precise editing according to instructions.
  • 🌟 Consistency Preservation: LongCat-Image-Edit has strong consistency preservation capabilities: attributes in non-edited regions, such as layout, texture, color tone, and subject identity, remain invariant unless targeted by the instruction. This is well demonstrated in multi-turn editing.
  • 🌟 Strong Benchmark Performance: LongCat-Image-Edit achieves state-of-the-art (SOTA) performance among open-source image editing models while significantly improving inference efficiency.

🎨 Showcase

LongCat-Image-Edit gallery.

Quick Start

Installation

Clone the repo:

git clone --single-branch --branch main https://github.com/meituan-longcat/LongCat-Image
cd LongCat-Image

Install dependencies:

# create conda environment
conda create -n longcat-image python=3.10
conda activate longcat-image

# install other requirements
pip install -r requirements.txt
python setup.py develop
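After installation, a quick sanity check can confirm that the packages imported by the editing example are visible in the active environment. The sketch below is not part of the repository; `check_packages` and `REQUIRED_PACKAGES` are hypothetical names, and the package list is simply taken from the import statements in the example under "Run Image Editing".

```python
import importlib.util

# Top-level packages imported by the editing example in this README.
# (Assumption: these import names match the installed distributions.)
REQUIRED_PACKAGES = ("torch", "PIL", "transformers", "longcat_image")


def check_packages(names=REQUIRED_PACKAGES):
    """Return {package_name: bool}, True when the package can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

The check only locates the packages without importing them, so it runs quickly and does not need a GPU.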

Run Image Editing

[!CAUTION] Special Handling for Text Rendering

For both Text-to-Image and Image Editing tasks involving text generation, you must enclose the target text within quotes ("").

Reason: The tokenizer applies character-level encoding specifically to content found inside quotes. Failure to use explicit quotation marks will result in a significant degradation of text rendering quality.
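To make the quoting rule concrete, here is a minimal sketch of a prompt helper that always wraps the text to be rendered in ASCII double quotes. `quote_target_text` and `build_text_edit_prompt` are hypothetical helpers, not part of the LongCat-Image API, and the prompt wording is only an illustration. The sketch assumes the target text does not itself contain double quotes.

```python
def quote_target_text(text: str) -> str:
    """Wrap text that should be rendered in double quotes, unless it already is."""
    stripped = text.strip()
    if len(stripped) >= 2 and stripped[0] == '"' and stripped[-1] == '"':
        return stripped  # already quoted, leave unchanged
    return f'"{stripped}"'


def build_text_edit_prompt(old_text: str, new_text: str) -> str:
    """Build an illustrative text-modification prompt with both strings quoted."""
    return (
        f"Change the text {quote_target_text(old_text)} "
        f"to {quote_target_text(new_text)}"
    )


if __name__ == "__main__":
    print(build_text_edit_prompt("OPEN", "CLOSED"))
```

Building prompts through a helper like this keeps the quotes from being dropped by accident when the target text comes from user input or a template.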

import torch
from PIL import Image
from transformers import AutoProcessor
from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImageEditPipeline

device = torch.device('cuda')
checkpoint_dir = './weights/LongCat-Image-Edit'

# Load the text processor and the transformer weights from the checkpoint.
text_processor = AutoProcessor.from_pretrained(checkpoint_dir, subfolder='tokenizer')
transformer = LongCatImageTransformer2DModel.from_pretrained(
    checkpoint_dir,
    subfolder='transformer',
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device)

pipe = LongCatImageEditPipeline.from_pretrained(
    checkpoint_dir,
    transformer=transformer,
    text_processor=text_processor,
)
# pipe.to(device, torch.bfloat16)  # Uncomment for high VRAM devices (faster inference)
pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (requires ~19 GB); slower but prevents OOM

generator = torch.Generator("cpu").manual_seed(43)
img = Image.open('assets/test.png')
prompt = '将猫变成狗'  # "Turn the cat into a dog"
image = pipe(
    img,
    prompt,
    negative_prompt='',
    guidance_scale=4.5,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=generator,
).images[0]
image.save('./edit_example.png')
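Multi-turn editing can be sketched as a loop that feeds each edited image back in as the input of the next instruction. The helper below is an illustration, not part of the repository: `multi_turn_edit` is a hypothetical name, and it assumes only that the pipeline is callable as in the example above (image and prompt as positional arguments, a result object exposing `.images`).

```python
from typing import Any, Callable, List, Sequence


def multi_turn_edit(
    pipe: Callable[..., Any],
    image: Any,
    prompts: Sequence[str],
    **pipe_kwargs: Any,
) -> List[Any]:
    """Apply prompts in order; each turn edits the previous turn's output.

    Returns the intermediate results, one image per prompt.
    """
    results = []
    current = image
    for prompt in prompts:
        # Same call shape as the single-turn example: pipe(img, prompt, ...).images[0]
        current = pipe(current, prompt, **pipe_kwargs).images[0]
        results.append(current)
    return results
```

With the `pipe`, `img`, and `generator` objects from the single-turn example, a call could look like `multi_turn_edit(pipe, img, ['将猫变成狗', 'Add the text "HELLO" to the wall'], negative_prompt='', guidance_scale=4.5, num_inference_steps=50, num_images_per_prompt=1, generator=generator)`. Keeping every intermediate image makes it easy to inspect whether non-edited regions stay consistent from turn to turn.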
