---
license: apache-2.0
base_model:
- DeepGlint-AI/MLCD-Embodied-7B
---

[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcocog)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco-5)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-5?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco-3)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-3?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcocog-1)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcocog-1?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco-8)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-8?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco-4)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-4?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco-9)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-9?p=multi-label-cluster-discrimination-for-visual)
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/multi-label-cluster-discrimination-for-visual/referring-expression-segmentation-on-refcoco)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco?p=multi-label-cluster-discrimination-for-visual)

## RefCOCO Segmentation Evaluation

| Dataset | Split | MLCD-seg-7B | EVF-SAM | GLaMM | VisionLLM v2 | LISA |
| :-- | :-: | :-: | :-: | :-: | :-: | :-: |
| RefCOCO | val | **83.6** | 82.4 | 79.5 | 79.2 | 74.9 |
| RefCOCO | testA | **85.3** | 84.2 | 83.2 | 82.3 | 79.1 |
| RefCOCO | testB | **81.5** | 80.2 | 76.9 | 77.0 | 72.3 |
| RefCOCO+ | val | **79.4** | 76.5 | 72.6 | 68.9 | 65.1 |
| RefCOCO+ | testA | **82.9** | 80.0 | 78.7 | 75.8 | 70.8 |
| RefCOCO+ | testB | **75.6** | 71.9 | 64.6 | 61.8 | 58.1 |
| RefCOCOg | val | **79.7** | 78.2 | 74.2 | 73.3 | 67.9 |
| RefCOCOg | test | **80.5** | 78.3 | 74.9 | 74.8 | 70.6 |

## Evaluation

If you just want to segment a single image from a text prompt, refer to the sample below:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Assuming you have an image named test.jpg
seg_img = Image.open("test.jpg").convert('RGB')
seg_prompt = "Could you provide a segmentation mask for the right giraffe in this image?"
pred_mask = mlcd_seg.seg(seg_img, seg_prompt, tokenizer, force_seg=False)
```
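The card does not document the return type of `seg`. The sketch below assumes `pred_mask` comes back as a binary `torch.Tensor` of shape `(H, W)` at the input image's resolution; the file names `mask.png` and `overlay.png` are arbitrary. Adapt the conversion if your version returns a NumPy array or per-pixel logits instead.

```python
import numpy as np
from PIL import Image

# Assumption: pred_mask is a (H, W) torch tensor with values in {0, 1}
# at the same resolution as seg_img.
mask = pred_mask.detach().cpu().numpy().astype(np.uint8) * 255

# Save the raw binary mask.
Image.fromarray(mask).save("mask.png")

# Blend the mask onto the input image in red for a quick visual check.
overlay = np.array(seg_img).copy()
red = np.array([255, 0, 0], dtype=np.float32)
overlay[mask > 0] = (0.5 * overlay[mask > 0] + 0.5 * red).astype(np.uint8)
Image.fromarray(overlay).save("overlay.png")
```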
If you want to evaluate on a referring-segmentation benchmark (e.g. RefCOCO), pass `force_seg=True` so that the model always returns a mask for scoring:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from PIL import Image

model_path = "DeepGlint-AI/MLCD-Seg"  # or use your local path
mlcd_seg = AutoModel.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True
).cuda()
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)

# Assuming you have an image named test.jpg
seg_img = Image.open("test.jpg").convert('RGB')
seg_prompt = "Could you provide a segmentation mask for the right giraffe in this image?"
pred_mask = mlcd_seg.seg(seg_img, seg_prompt, tokenizer, force_seg=True)
```

## Example output

## Citations

```
@misc{mlcdseg_wukun,
  author = {Wu, Kun and Xie, Yin and Zhou, Xinyu and An, Xiang and Deng, Jiankang and Jie, Yu},
  title = {MLCD-Seg},
  year = {2025},
  url = {https://github.com/deepglint/unicom/tree/main/downstream},
}
```