by AK and the research community

Mar 21

ProtoCLIP: Prototypical Contrastive Language Image Pretraining

Contrastive Language Image Pretraining (CLIP) has received widespread attention because its learned representations transfer well to various downstream tasks. During CLIP training, the InfoNCE objective aligns positive image-text pairs and separates negative ones. We show an underlying representation grouping effect during this process: the InfoNCE objective indirectly groups semantically similar representations together via randomly emerged within-modal anchors. Based on this understanding, we introduce Prototypical Contrastive Language Image Pretraining (ProtoCLIP) to enhance such grouping by boosting its efficiency and increasing its robustness against the modality gap. Specifically, ProtoCLIP sets up prototype-level discrimination between image and text spaces, which efficiently transfers higher-level structural knowledge. Further, Prototypical Back Translation (PBT) is proposed to decouple representation grouping from representation alignment, resulting in effective learning of meaningful representations under a large modality gap. PBT also enables us to introduce additional external teachers with richer prior language knowledge. ProtoCLIP is trained with an online episodic training strategy, which allows it to scale to unlimited amounts of data. We train ProtoCLIP on Conceptual Captions and achieve a +5.81% ImageNet linear probing improvement and a +2.01% ImageNet zero-shot classification improvement. On the larger YFCC-15M dataset, ProtoCLIP matches the performance of CLIP with 33% of the training time. Code is available at https://github.com/megvii-research/protoclip.
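For readers unfamiliar with the InfoNCE objective mentioned in the abstract, the following is a minimal NumPy sketch of a symmetric image-text contrastive loss of the kind commonly used for CLIP-style training. It is not taken from the ProtoCLIP codebase: the function name `info_nce`, the temperature value of 0.07, and the toy embeddings are illustrative assumptions, and the sketch covers only the baseline pair-level objective, not the prototype-level losses or PBT.

```python
import numpy as np


def info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of image_emb and row i of text_emb form the positive pair;
    every other row in the batch serves as a negative.
    The temperature default is an illustrative assumption.
    """
    # L2-normalize so that dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix: logits[i, j] = sim(image i, text j) / T.
    logits = image_emb @ text_emb.T / temperature

    def diagonal_cross_entropy(scores):
        # Numerically stable log-softmax over each row.
        scores = scores - scores.max(axis=1, keepdims=True)
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        # The correct "class" for row i is column i (the positive pair).
        return -np.mean(np.diag(log_prob))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


# Toy usage: perfectly aligned pairs versus deliberately mismatched pairs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
loss_aligned = info_nce(emb, emb)
loss_mismatched = info_nce(emb, np.roll(emb, 1, axis=0))
```

In this toy setup the aligned batch should yield a much smaller loss than the mismatched one, since each positive pair then has the highest similarity in its row and column.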

Evidence for a Massive Protocluster in S255N

S255N is a luminous far-infrared source that contains many indications of active star formation but lacks a prominent near-infrared stellar cluster. We present mid-infrared through radio observations aimed at exploring the evolutionary state of this region. Our observations include 1.3 mm continuum and spectral line data from the Submillimeter Array, VLA 3.6 cm continuum and 1.3 cm water maser data, and multicolor IRAC images from the Spitzer Space Telescope. The cometary morphology of the previously known UCHII region G192.584-0.041 is clearly revealed in our sensitive, multi-configuration 3.6 cm images. The 1.3 mm continuum emission has been resolved into three compact cores, all of which are dominated by dust emission and have radii < 7000 AU. The mass estimates for these cores range from 6 to 35 Msun. The centroid of the brightest dust core (SMA1) is offset by 1.1'' (2800 AU) from the peak of the cometary UCHII region and exhibits the strongest HC3N, CN, and DCN line emission in the region. SMA1 also exhibits compact CH3OH, SiO, and H2CO emission and likely contains a young hot core. We find spatial and kinematic evidence that SMA1 may contain further multiplicity, with one of the components coincident with a newly detected H2O maser. There are no mid-infrared point source counterparts to any of the dust cores, further suggesting an early evolutionary phase for these objects. The dominant mid-infrared emission is a diffuse, broadband component that traces the surface of the cometary UCHII region but is obscured by foreground material on its southern edge. An additional 4.5 micron linear feature emanating to the northeast of SMA1 is aligned with a cluster of methanol masers and likely traces an outflow from a protostar within SMA1. Our observations provide direct evidence that S255N is forming a cluster of intermediate to high-mass stars.
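The abstract quotes an angular offset of 1.1'' alongside a projected separation of 2800 AU. The sketch below shows the small-angle conversion that links the two: by the definition of the parsec, 1 AU seen from 1 pc subtends 1 arcsecond, so separation in AU equals angle in arcseconds times distance in parsecs. The function names are illustrative, and the distance it returns is only what the two rounded figures imply; the abstract itself does not state the adopted distance.

```python
def projected_separation_au(theta_arcsec, distance_pc):
    """Projected separation in AU for a small angle at a given distance.

    Relies on the parsec definition: 1 AU at 1 pc subtends 1 arcsecond.
    """
    return theta_arcsec * distance_pc


def implied_distance_pc(separation_au, theta_arcsec):
    """Distance in parsecs implied by an angle and a projected separation."""
    return separation_au / theta_arcsec


# Figures quoted in the abstract: 1.1 arcsec corresponds to 2800 AU.
# The result is an inference from rounded values, not a stated distance.
distance_pc = implied_distance_pc(2800.0, 1.1)

# Angular radius corresponding to the quoted 7000 AU upper limit
# at that implied distance.
max_core_radius_arcsec = 7000.0 / distance_pc
```

Under these assumptions the implied distance is roughly 2.5 kpc, and the 7000 AU upper limit on the core radii corresponds to an angular radius of a little under 3 arcseconds.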