Time–frequency recurrent transformer with diversity constraint for dense video captioning

P Li, P Zhang, T Wang, H Xiao - Information Processing & Management, 2023 - Elsevier
Describing a long video using multiple sentences, i.e., dense video captioning, is a very
challenging task. Existing methods neglect the important fact that actions of several
tempos (a.k.a. frequencies) evolve over time in a video, and they do not handle the phrase
repetition issue well. Therefore, we propose a Time-Frequency recurrent Transformer with
Diversity constraint (TFTD) for dense video captioning. Its basic idea is to develop a
time–frequency memory module, which not only stores the history of the past sentences and …