Abstract: Weakly supervised video anomaly detection aims to locate abnormal activities in untrimmed videos without frame-level supervision. Prior work has used graph convolution ...
Video Dense Caption: PPLLaVA can effectively balance the content, state, and motion of both the foreground and background, while maintaining detail and accuracy. Multi-turn dialogue and reasoning: ...
A Model Context Protocol (MCP) server that provides a "prompts" primitive for managing and serving customizable prompt templates. This server allows you to create, organize, and serve prompt templates ...
Abstract: Pre-trained vision-language models, like CLIP, make few-shot action recognition possible via text prompts. However, teaching scenarios are complex and CLIP has difficulties in understanding ...