A new technique teaches machine-learning models to identify specific actions in long videos without the need for human annotations
MIT News | Machine Learning | May 29, 2024
MIT researchers developed a self-supervised learning approach to spatio-temporal grounding that learns both global and local representations from long, untrimmed videos, without manual annotations or trimming. They also built a benchmark of uncut videos to evaluate such models, and they plan to extend the framework to include audio data, with the aim of automatically detecting misalignments between audio and text cues.
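To make the idea of pairing global and local representations concrete, here is a minimal, hypothetical sketch of a self-supervised objective of this general flavor: a video-level (global) contrastive term plus a text-conditioned, frame-level (local) term, trained only from video-text pairs with no manual labels. This is not the researchers' implementation; the function names, attention scheme, and dimensions are illustrative assumptions.

```python
# Illustrative sketch only (not the authors' code): combine a global,
# video-level contrastive loss with a local, frame-level one so a model can
# learn both what happens in a clip and roughly when it happens, without
# manual annotations. All names and dimensions here are hypothetical.
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings of shape (B, D)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                     # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)   # matching pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def grounding_loss(frame_feats, text_feats, global_weight=1.0, local_weight=1.0):
    """
    frame_feats: (B, T, D) per-frame features from a video encoder
    text_feats:  (B, D)    sentence embeddings from a text encoder
    Global term: mean-pool the frames into one video vector and contrast it with the text.
    Local term:  attend over frames with the text query, so frames that match the
                 narration are pulled toward it (a weak, label-free form of grounding).
    """
    # Global representation: mean-pool over time.
    video_global = frame_feats.mean(dim=1)               # (B, D)
    loss_global = info_nce(video_global, text_feats)

    # Local representation: text-conditioned soft attention over frames.
    attn = torch.softmax(
        torch.einsum('btd,bd->bt',
                     F.normalize(frame_feats, dim=-1),
                     F.normalize(text_feats, dim=-1)) / 0.07, dim=1)   # (B, T)
    video_local = torch.einsum('bt,btd->bd', attn, frame_feats)        # (B, D)
    loss_local = info_nce(video_local, text_feats)

    return global_weight * loss_global + local_weight * loss_local

# Toy usage with random tensors standing in for encoder outputs.
B, T, D = 4, 16, 256
loss = grounding_loss(torch.randn(B, T, D), torch.randn(B, D))
print(loss.item())
```

In a sketch like this, the attention weights over frames are what would later be read off to localize an action in time, even though they were never supervised directly.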