Abstract: Video summarization and captioning condense content by selecting keyframes and generating language descriptions, integrating both visual and textual perspectives. Existing video-and-language ...