PinCLIP: Large-scale Foundational Multimodal Representation at Pinterest

This paper introduces PinCLIP, a large-scale foundational multimodal representation model for Pinterest. It employs a novel hybrid Vision Transformer architecture and neighbor alignment objectives to overcome the challenges of integrating vision-language models (VLMs), resulting in significant improvements in multimodal retrieval accuracy, cold-start content distribution, and overall user engagement.

Josh Beal, Eric Kim, Jinfeng Rao + 3 more · 2026-03-05 · 💻 cs

Parallax to Align Them All: An OmniParallax Attention Mechanism for Distributed Multi-View Image Compression

The paper proposes ParaHydra, a novel distributed multi-view image compression framework featuring an OmniParallax Attention Mechanism and a Parallax Multi Information Fusion Module that adaptively aligns and integrates inter-view correlations, enabling it to significantly outperform state-of-the-art multi-view codecs in both bitrate efficiency and computational speed.

Haotian Zhang, Feiyue Long, Yixin Yu + 7 more · 2026-03-05 · 💻 cs

Field imaging framework for morphological characterization of aggregates with computer vision: Algorithms and applications

This dissertation presents a comprehensive field imaging framework that leverages advanced computer vision algorithms, including 2D instance segmentation and an integrated 3D reconstruction-segmentation-completion approach, to overcome the limitations of traditional methods and enable accurate morphological characterization of construction aggregates across diverse field scenarios.

Haohang Huang · 2026-03-05 · 🤖 cs.AI